Saturday, September 6, 2014

How to convert a Word .doc file to a nice text file

"We received a Word .doc file from the transcriptionist; how can we use that in our legal software?" Or worse: "We go to trial on Monday and only have a Word .doc file. Sync this!"

Have you ever run into this before?

We all know that exporting a Word .doc file does not always get us what we want for use in our legal software.

Recently a friend came to me with the questions above and asked what I could do.

The following write-up provides two possible solutions.

Use Word's Save As command to save the file as “Plain Text (*.txt)”. During the process, choose “Insert line breaks”; I left the other options at their defaults.

QC this text file really well. Make sure the line numbering is correct: always 1-25, no duplicate line numbers, no missing line numbers, and so on. It is MOST important for any automated process that the data going in is consistent. GIGO (garbage in, garbage out) applies.
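If you'd rather not eyeball every page, a small script can do a first pass of that QC. Here is a minimal Python sketch, not part of any Visionary tool, that assumes each line begins with its line number (optionally indented) followed by a blank, that a page is 25 lines, and that stray blank lines can be skipped. The function name check_line_numbers is made up for illustration.

```python
import re

# Assumption: a numbered line looks like " 7  THE WITNESS: Yes." (optional
# leading blanks, the line number, then a blank or the end of the line).
LINE_NUMBER = re.compile(r"^\s*(\d+)(?:\s|$)")


def check_line_numbers(lines, lines_per_page=25):
    """Return a list of problems found; an empty list means the numbering looks clean."""
    problems = []
    expected = 1
    for idx, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # Assumption: stray blank lines are skipped, not flagged.
        m = LINE_NUMBER.match(line)
        if m is None:
            problems.append(f"file line {idx}: no leading line number")
            continue
        num = int(m.group(1))
        if not 1 <= num <= lines_per_page:
            problems.append(f"file line {idx}: {num} is outside 1-{lines_per_page}")
            continue
        if num != expected:
            # Catches duplicated, skipped, or out-of-order line numbers.
            problems.append(f"file line {idx}: expected {expected}, found {num}")
        # Resynchronize on the number actually seen, wrapping 25 back to 1,
        # so a single slip is not reported on every following line.
        expected = num % lines_per_page + 1
    return problems


if __name__ == "__main__":
    sample = ["1 Q. Good morning.", "2 A. Morning.", "4 Q. State your name."]
    for problem in check_line_numbers(sample):
        print(problem)  # file line 3: expected 3, found 4
```

To check a real file you could pass it the file's lines, e.g. `check_line_numbers(open("transcript.txt").read().splitlines())`, where the file name is just a placeholder.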

So now you have a nice text file, every line with a consistent line number but no page numbers.

Here you have two options.

One, redo the entire line numbering and insert both line numbers and page numbers appropriately.

For this option, we first need to remove all the existing line numbers and blank spaces before the text.

(Most text editors, such as Notepad++ and TextPad, let you select a column of text by holding the Alt key while you highlight. I just tested it in an email and in Word: holding the Alt key while selecting text vertically selected a column of text.)

(You could also use a search and replace process to do this.)
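For the search-and-replace route, the pattern only has to match the digits and blanks at the start of each line. Below is a minimal Python sketch, assuming every line starts with its line number (the QC step above should guarantee that); the name strip_line_numbers is hypothetical. Note that it would also remove digits at the start of a line that has no line number, which is one more reason to QC first.

```python
import re

# Assumption: every line starts with its line number, possibly indented and
# followed by blanks, e.g. " 7  THE WITNESS: Yes."
# re.MULTILINE makes "^" match at the start of every line, not just the file.
LEADING_NUMBER = re.compile(r"^[ \t]*\d+[ \t]*", re.MULTILINE)


def strip_line_numbers(text):
    """Remove each line's leading line number and the blanks around it."""
    return LEADING_NUMBER.sub("", text)


if __name__ == "__main__":
    sample = " 1  Q. Good morning.\n 2  A. Morning.\n10  THE COURT: Be seated."
    print(strip_line_numbers(sample))
```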

Once you have removed all the line numbers and leading spaces, use the Page and Line utility that we include with the Auto-Syncer installation. In the c:\Program Files (x86)\Visionary\Utilities directory is a program called, oddly enough, PageLineUtil.exe.

Launch that program, set the Source File, Destination File, and Page information, and then click the Build New File button.

Voila! You have a nicely formatted text file with line and page numbers.
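The Page and Line utility's exact output format isn't described here, so the following is only a hypothetical Python sketch of the same idea, not the utility itself. It assumes 25 lines per page, a chosen starting page number, a "PAGE  N" header line (borrowed from the regular-expression method), and right-aligned two-digit line numbers; the function name and parameters are made up for illustration.

```python
def add_page_and_line_numbers(lines, first_page=1, lines_per_page=25):
    """Number un-numbered transcript lines 1-25 and start each page with a header.

    Hypothetical output format: a "PAGE  N" header line, then each line as a
    right-aligned two-digit line number, two blanks, and the text.
    """
    numbered = []
    for i, text in enumerate(lines):
        # Which page this line falls on, and its 0-based slot on that page.
        page_offset, position = divmod(i, lines_per_page)
        if position == 0:
            numbered.append(f"PAGE  {first_page + page_offset}")
        numbered.append(f"{position + 1:>2}  {text}")
    return numbered


if __name__ == "__main__":
    body = ["Q. Good morning.", "A. Morning."]
    print("\n".join(add_page_and_line_numbers(body, first_page=12)))
```

Keeping the page size and starting page as parameters mirrors the "Page information" you enter in the utility, whatever its actual fields are.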

Two, use a text editor that supports Regular Expressions.  (I use TextPad, as it supports the “\i” increment parameter.)

Open the text file and use Search/Replace.
Search for (without the quotes) “(\r\n1 )”
The opening and closing parens create a group; the “\r” and “\n” look for the carriage return and line feed characters; the “1 ” (the digit one with a trailing space) matches line number one only (not 10, 11, etc.).
Replace with (without the quotes) “\r\n.\r\nPAGE  \i\r\n.\1”
The “\r\n” sequences create new lines; the “.” and “PAGE  ” add that text as applicable; the “\i” adds an incrementing number; the trailing “\1” adds back the group from the Search, i.e. the stuff within the parens.
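If TextPad isn't handy, the same search and replace can be approximated in Python. Python's re module has no "\i" increment, so this sketch passes a function to re.sub that pulls the next page number from a counter. The pattern and replacement text are copied from the steps above; the function name insert_page_headers and the first_page parameter are hypothetical. Like the TextPad pattern, it only matches a "1 " that follows a CR/LF, so a line 1 at the very start of the file gets no PAGE header.

```python
import itertools
import re

# Same search pattern as the TextPad step: CR/LF followed by line number "1 ".
PAGE_START = re.compile(r"(\r\n1 )")


def insert_page_headers(text, first_page=1):
    """Insert ".", "PAGE  N", "." lines before every line numbered 1."""
    page_numbers = itertools.count(first_page)  # stands in for TextPad's \i

    def replacement(match):
        # Equivalent of the replacement "\r\n.\r\nPAGE  \i\r\n.\1":
        # new line, ".", new line, "PAGE  " plus the next page number,
        # new line, ".", then group 1 (the matched "\r\n1 ") put back.
        return f"\r\n.\r\nPAGE  {next(page_numbers)}\r\n.{match.group(1)}"

    return PAGE_START.sub(replacement, text)


if __name__ == "__main__":
    sample = "24 end of page\r\n25 last line\r\n1 new page\r\n2 next line"
    print(insert_page_headers(sample, first_page=2))
```

To keep the CR/LF pairs intact, read and write the file with `newline=""` (for example `open("transcript.txt", newline="")`, with a placeholder file name); otherwise Python translates them to a bare "\n" on input and the pattern will not match.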


Personally, I would use the search and replace process because I think RegEx are gnarly.

Whichever method you choose, the most important thing in the entire process is that the text file is correctly formatted.

So there you have it: two easy methods for using transcriptionist-supplied files.

As a service to non-Visionary software users, I am including a link to the Page and Line Utility program mentioned above.

All rights reserved, use at your own risk, blah, blah, blah....

Visionary Page and Line Utility.

Drop me a line at if you have questions.

