A better design for OMR extraction with KTM (Part 2)

Please note: this post is part of a series. If you have not read the previous part yet, please do so first.

Applying OMRSingleChoiceText

Last time I wrote two methods that translate a single choice item into an item within a dictionary. This time we will apply those methods in a demo project to populate an index field. I will also show some design considerations that keep our project maintainable. The figure below shows the general idea:

A sub field of an Advanced Zone Locator contains the results of the single choice group
A dictionary contains the single choice items, including the three different errors that could occur
A function translates the sub field into the item from the dictionary
A script locator applies that function in order to generate one alternative
A field is mapped to that script locator
In validation design, the same dictionary is mapped to the field (restricted combo box items)

The good news: as everything was already present, I just had to put the pieces together. However, there are several ways to deal with multiple questions. My recommendation is to use one Script Locator per group, which is later linked to a field. Do not use the Document_AfterExtract event to hack your results into a field. Why? Because you’d lose a whole bunch of handy features. For example, a field is only populated if the confidence exceeds a certain threshold, and the field is only valid if its confidence is above another threshold. If a locator is linked to a field, all that magic is done automatically. So, if you adjust the confidence of a locator alternative, it might be marked uncertain in validation or not be copied to a field at all. If you used the AfterExtract event, you would have to deal with all of that yourself.
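To make the two thresholds concrete, here is a minimal sketch of that behaviour. It is plain Python, not KTM script, and the function name, the state labels and the threshold values 0.2 and 0.8 are assumptions chosen purely for illustration. In a real project KTM does all of this for you as soon as the locator is linked to a field.

```python
def field_state(confidence: float,
                populate_threshold: float = 0.2,
                valid_threshold: float = 0.8) -> str:
    """Classify a locator alternative by its confidence.

    Both threshold defaults are made-up illustration values.
    """
    # At or below the first threshold the alternative never reaches the field.
    if confidence <= populate_threshold:
        return "not copied to field"
    # Between the two thresholds the field is filled but not yet valid.
    if confidence <= valid_threshold:
        return "copied, needs validation"
    return "copied, valid"


for conf in (0.0, 0.5, 0.95):
    print(conf, "->", field_state(conf))
```

This is exactly the logic you would have to rebuild by hand for every field if you wrote your results into the fields from the AfterExtract event.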

So, I would have one script locator per single choice question on my document. In my case that makes three; all the other questions are multiple choice. The first is about the overall quality of a show, the second about the country of origin, and the third is a simple yes/no answer. The next picture shows my project setup:

There are three index fields, one per group
There are four locators – one per group, the last one to populate the others
There are three dictionaries

The fourth locator (SL_MapAll) is the only one that contains code. It deals with the mapping from the AZL sub field to the dictionary item. Of course you could call ZoneLocator_MapSubFieldAndDictionary in the LocateAlternatives event of each script locator, but that would result in more typing. As I’m quite lazy when it comes to that, I let one locator do the mapping. For that to work, I just need to make sure that each locator, sub field and dictionary combination uses the same name (excluding the prefix SL_, SF_ or Dict_):


' Maps all the subfields from an AZL to script locators and creates alternatives
Private Sub SL_MapAll_LocateAlternatives(ByVal pXDoc As CASCADELib.CscXDocument, ByVal pLocator As CASCADELib.CscXDocField)

   Dim itemName As String
   ' each variable needs its own type; otherwise only the last one in the list is typed
   Dim scriptLocatorPrefix As String, subFieldPrefix As String, dictPrefix As String
   Dim dictionaryName As String
   Dim sourceField As CscXDocSubField
   Dim tmpField As CscXDocField, destinationField As CscXDocField

   scriptLocatorPrefix = "SL_"
   subFieldPrefix = "SF_"
   dictPrefix = "Dict_"
   ' all 3 items must have the same name: for example, there must be
   ' - a Script Locator, SL_Option
   ' - a Sub Field in the AZL called SF_Option
   ' - a Dictionary called Dict_Option

   ' loops through each item given. assumes that script locators, subfields and dictionaries use the same name!
   For Each itemName In Split("ShowQuality,OriginCountry,SpeakerOpinion", ",")
      ' source = the subfield from the AZL, e.g. "0|0|1|0"
      ' destination = the target locator where an alternative will be created
      ' dictionary = the dict which holds the items to be used, e.g. "3: disagree"
      Set sourceField = pXDoc.Locators.ItemByName("AZL").Alternatives(0).SubFields.ItemByName(subFieldPrefix & itemName)
      Set destinationField = pXDoc.Locators.ItemByName(scriptLocatorPrefix & itemName)
      dictionaryName = dictPrefix & itemName
      ' the confidence is re-adjusted inside ZoneLocator_MapSubFieldAndDictionary
      ' so that invalid, uncertain or non-selected items appear red in validation
      Set tmpField = ZoneLocator_MapSubFieldAndDictionary(pXDoc, sourceField, dictionaryName)
      ' create a new alternative and copy the mapped result into it
      destinationField.Alternatives.Create
      Call FieldAlternative_Copy(tmpField.Alternatives(0), destinationField.Alternatives(0))
   Next

End Sub

So, whenever I decide to add a new group, I just have to add the item name (without the prefix) to the following line:


For Each itemName In Split("ShowQuality,OriginCountry,SpeakerOpinion,MyNewItemGoesHere",",")
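The naming convention behind that line can be sketched as follows, again in plain Python rather than KTM script. The helper name `names_for` is hypothetical; the prefixes and the item names are the ones used in the script above.

```python
PREFIXES = {"locator": "SL_", "sub_field": "SF_", "dictionary": "Dict_"}


def names_for(item_name: str) -> dict:
    """Return the three object names that must exist for one group."""
    return {kind: prefix + item_name for kind, prefix in PREFIXES.items()}


# One entry per group; adding a group means adding one name to this list
# and creating the three matching objects in the project.
for item in "ShowQuality,OriginCountry,SpeakerOpinion".split(","):
    print(names_for(item))
```

If one of the three objects is missing or misspelled in the project, the lookup by name fails for that group, so it pays to keep this list and the project in sync.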

One problem remained, however: even invalid input, such as multiple selections in a single choice field, was mapped to one valid dictionary entry, so the user was not alerted to the issue in validation, as the following picture shows:

A few adjustments to the ZoneLocator_MapSubFieldAndDictionary function make sure that these items appear as invalid in validation.
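As a rough illustration of the idea, and not the actual function (which is listed under The Scripts and may well work differently), the mapping could look like this in plain Python. The sub field format "0|0|1|0" and the item "3: disagree" come from the script comments above; the other item texts, the two error entries and the confidence values 1.0 and 0.0 are assumptions. Only two error cases are covered here: nothing selected and more than one selected.

```python
def map_single_choice(marks: str, items: list,
                      none_item: str = "ERR: nothing selected",
                      multiple_item: str = "ERR: multiple selected"):
    """Map an OMR result such as "0|0|1|0" to (dictionary item, confidence).

    The error item texts and the confidence values are assumptions.
    """
    flags = marks.split("|")
    if len(flags) != len(items):
        raise ValueError("number of marks and dictionary items differ")
    selected = [i for i, flag in enumerate(flags) if flag == "1"]
    if not selected:
        return none_item, 0.0       # zero confidence, so it shows up as invalid
    if len(selected) > 1:
        return multiple_item, 0.0   # same for multiple selections
    return items[selected[0]], 1.0  # exactly one mark: a regular item


# Hypothetical dictionary content for one single choice group.
items = ["1: fully agree", "2: agree", "3: disagree", "4: fully disagree"]
print(map_single_choice("0|0|1|0", items))
print(map_single_choice("1|0|1|0", items))
```

The point of the design is that the error entries still come from the dictionary, so the combo box in validation can display them, while their low confidence forces the user to look at the field.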

The Scripts

ZoneLocator_MapSubFieldAndDictionary:

Maps the sub field from the Advanced Zone Locator to an item from a dictionary.

FieldAlternative_Copy:

Copies one FieldAlternative to another, including positions, content and confidence.

SL_MapAll_LocateAlternatives:

Maps all script locators, as described above.

Quipu Blog

Kofax Subject Matter Experts
