Monday, April 11, 2011

What about this Image Load File Process, Part One

"We have several bankers boxes of documents that need to be added to our case!"

"I just got a CD (or even worse, a DVD) from the opposing counsel with tens of thousands of images."

"We were just called in to take over from a previous firm and I found a drive full of images."

Whatever the situation, now you have to try to get all these documents into your case and go to trial.

What you need is an Image Load File.

An Image Load File allows you to populate your document database in an automated process, rather than adding one document at a time.

There are many different formats; however, you will find that many so-called "image load files" are not actually Image Load Files.  Rather, they are document load files or, even worse, a simple directory structure.

I am going to be discussing the most basic Image Load File (ILF), called an Opticon-type load file.  Every scan shop should be able to produce this format.

First, let's take a look at a sample file.


Let's pull this apart and understand what is going on here.

Since we are loading a database, each line in this file represents one record in the Image Database.  Therefore there needs to be one line (record) for EVERY page of EVERY document.  Visionary needs to be able to find and display the image file (in this case, the .tif or .jpg files) for each page of each document.

We can see that there seem to be four fields in each record.  Each field is separated by a comma.  (This is generically called a CSV file, for Comma Separated Values.  Obviously, your data CANNOT contain any commas.)  In the case of an ILF, there is NEVER a need for the data to contain commas.
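As a rough sketch of that record layout, here is how one line could be split into its four fields in Python.  The sample line is assembled from the field values quoted in this post, and the names ImageRecord and parse_line are made up for illustration; they are not part of Visionary.

```python
from typing import NamedTuple


class ImageRecord(NamedTuple):
    page_id: str      # field 1: database key / Page ID
    volume: str       # field 2: volume label
    image_path: str   # field 3: path and file name, relative to the volume
    first_page: bool  # field 4: True when the field holds "Y"


def parse_line(line: str) -> ImageRecord:
    """Split one load-file line on commas into its four fields."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, found {len(fields)}: {line!r}")
    page_id, volume, image_path, flag = fields
    return ImageRecord(page_id, volume, image_path, flag == "Y")


# Illustrative line built from the values discussed in this post.
record = parse_line(r"201,Box001,00\01\exh001.001.tif,Y")
print(record)
```

Because the data never contains commas, a plain split on "," is enough here; a line with an empty fourth field still yields four fields, the last one being an empty string.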

Database Key Field or Page ID value

The first field (201) must be unique within each case.  You cannot have two lines with "205" in the first field.  This is the "key" that the database uses to find a specific record or image.  Since it is the key field, every database will have specific constraints on it.

For Visionary, these are the constraints:

  • It must be less than 21 characters
  • It can only consist of:
    • Letters
      • Lower, mixed and upper case are treated the same; exhibit, Exhibit and EXHIBIT are the same.
    • Digits 0 through 9
      • Leading zeros are stripped
    • These characters: "+", ".", "-", and "_"
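If you want to check a load file before handing it to Visionary, the constraints above can be sketched in Python roughly like this.  The function names are hypothetical, and a few details are my assumptions rather than documented Visionary behavior: that "letters" means A through Z, that the length limit applies to the key as written in the file, and that an all-zero key collapses to "0".

```python
import re

# Letters, digits, and "+", ".", "-", "_"; 1 to 20 characters (less than 21).
# Assumption: "letters" means the ASCII letters A-Z and a-z.
_KEY_PATTERN = re.compile(r"[A-Za-z0-9+._-]{1,20}")


def is_valid_key(key: str) -> bool:
    """Return True if the key uses only allowed characters and fits the length limit."""
    return _KEY_PATTERN.fullmatch(key) is not None


def normalize_key(key: str) -> str:
    """Fold case and strip leading zeros so equivalent keys compare equal."""
    stripped = key.upper().lstrip("0")
    return stripped or "0"  # assumption: an all-zero key collapses to "0"


def find_duplicates(keys):
    """Return the keys that collide with an earlier key once normalized."""
    seen, dupes = set(), []
    for key in keys:
        norm = normalize_key(key)
        if norm in seen:
            dupes.append(key)
        seen.add(norm)
    return dupes


print(find_duplicates(["201", "0201", "exhibit", "EXHIBIT", "205"]))
```

Comparing normalized keys matters because two keys that look different in the file, such as "201" and "0201", would end up as the same key once leading zeros are stripped.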

Volume Label

The second field (Box001) is a holdover from the days when hard drives were not large enough to hold all the data, and the CD label was used to find the correct optical disk to use.  Now it refers to a directory within the \vs_data\CaseID\Image directory.  It, obviously, does not need to be unique and will most likely change with each scan job iteration.  There can be multiple different entries in this field, even within one ILF.

Path To and Image File Name

The third field (00\01\exh001.001.tif) is where the actual image is located in relation to the Volume Label.  It does NOT contain drive letter information, such as "c:\".  So within the directory with the Volume Label name, there should be a directory named "00", within that a directory named "01", and in that a file named "exh001.001.tif".
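To make the relationship between the image directory, the Volume Label and the third field concrete, here is a small Python sketch that joins the three.  The function name resolve_image_path is hypothetical, and \vs_data\CaseID\Image is used as the image root exactly as written in this post.

```python
from pathlib import PureWindowsPath


def resolve_image_path(image_root: str, volume: str, relative: str) -> PureWindowsPath:
    """Join the image root, the Volume Label and the relative image path."""
    rel = PureWindowsPath(relative)
    # The third field must be relative: no drive letter and no leading backslash.
    if rel.drive or rel.root:
        raise ValueError(f"image path must be relative, got {relative!r}")
    return PureWindowsPath(image_root) / volume / rel


full = resolve_image_path(r"\vs_data\CaseID\Image", "Box001", r"00\01\exh001.001.tif")
print(full)
```

PureWindowsPath only manipulates the path text, so this runs on any operating system and never touches the disk; checking that the file actually exists would be a separate step.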

First Page Indicator

The fourth field is a "Y" if this record is the first page of a document, otherwise it is left empty.
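Since each line is one page, the "Y" flag is what tells you where one document ends and the next begins.  Here is a minimal Python sketch of that grouping.  The sample lines are made up for illustration (only the first one uses values quoted in this post), and starting a document when the very first line lacks a "Y" is my assumption, not documented behavior.

```python
def group_into_documents(lines):
    """Group page IDs into documents, starting a new one at each "Y" flag."""
    documents = []
    for line in lines:
        page_id, volume, image_path, flag = line.rstrip("\r\n").split(",")
        # Assumption: if the very first line has no "Y", start a document anyway.
        if flag == "Y" or not documents:
            documents.append([])
        documents[-1].append(page_id)
    return documents


# Hypothetical three-page load file: a two-page document, then a one-page document.
sample = [
    r"201,Box001,00\01\exh001.001.tif,Y",
    r"202,Box001,00\01\exh001.002.tif,",
    r"203,Box001,00\01\exh002.001.tif,Y",
]
print(group_into_documents(sample))
```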

There are no other fields.

