Usually, producing VRSed images requires either a document scanner or some dirty tricks with a TWAIN emulator. A few months ago I stumbled upon a very interesting method already built into Kofax Transformation Modules that allows us to binarize electronic documents! We always knew the capability was there: when you use the advanced zone locator on a colour image and switch to the reference image, you can see that some VRS magic has been applied. Still, I was quite surprised that this method is public!
Let’s give it a shot. The method resides in CscImgLib and is called BinarizeWithVRS. It is called on a CscImage object and returns a CscImage as well.
Let’s start with some samples. I downloaded the datasheets for both KTM and Analytics, two native PDF files with lots of text but also some images. Naturally, they can be imported and used in KTM: CscImage does not really care whether you load a TIFF or a PDF, so the conversion will also work on all accepted formats.
So, those are pretty good examples. We’ve got blue on blue, diagrams, screenshots – VRS can shine on them. Here’s the result:
I won’t go into too much detail, as you will know what VRS can do for you. Let’s dig into the more interesting part: how do we call the method? First, I wrote my own BinarizeDocumentWithVRS function that takes a file path (as String) and returns a CscCollection of binarized images, one per page, so multiple pages are possible as well.
Private Function BinarizeDocumentWithVRS(file As String) As CscCollection
    Dim imgOriginal As CscCollection
    Dim imgResult As New CscCollection
    Dim tmpImage As New CscImage
    Dim binarizedImg As CscImage
    Dim i As Integer
    ' first: load all pages into a collection (used here to get the page count)
    Set imgOriginal = LoadImageAllPages(file)
    For i = 0 To imgOriginal.Count - 1
        ' load page i and let VRS binarize it; the result is a new CscImage
        tmpImage.Load(file, i)
        Set binarizedImg = tmpImage.BinarizeWithVRS()
        ' optional clean-up, disabled here
        'binarizedImg.VRS_Despeckle(1000,1000)
        ' 2 = thicken / dilation
        binarizedImg.VRS_Filter(2)
        imgResult.Add(binarizedImg, i)
    Next
    ' now return the collection
    Return imgResult
End Function
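Private Function LoadImageAllPages(file As String) As CscCollection
    Dim i As Integer
    Dim tmpColl As New CscCollection
    Dim tmpImg As CscImage
    ' install the handler before the first Load: an error from Load
    ' (page index past the last page) is what ends the loop
    On Error GoTo EndOfPage
    While True
        ' fresh image per page, so every collection entry is a distinct object
        Set tmpImg = New CscImage
        tmpImg.Load(file, i)
        ' add page to collection
        tmpColl.Add(tmpImg, i)
        i = i + 1
    Wend
EndOfPage:
    Return tmpColl
End Function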
Note that some of these members are, to my surprise, quite well documented in the scripting object reference! For example, VRS_Filter allows you to fine-tune the result by selecting one of the following filter modes:
- 0 = char smooth / strong neighbor
- 1 = thinning / erosion
- 2 = thicken / dilation
- 3 = smooth+clean / opening
- 4 = fill line breaks / closing
- 5 = smooth+clean+preserve / openplus
- 6 = fill breaks + preserve / closeplus
- 7 = light thicken / dilate2x2
- 8 = outline
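To make calls such as VRS_Filter(2) easier to read, the mode numbers above could be given names. This is only a sketch: the constant names are hypothetical and not part of the KTM object model; only the numeric values are taken from the list above.

```vb
' Hypothetical names for the VRS_Filter modes (values taken from the list above)
Private Const FilterCharSmooth As Integer = 0      ' char smooth / strong neighbor
Private Const FilterThin As Integer = 1            ' thinning / erosion
Private Const FilterThicken As Integer = 2         ' thicken / dilation
Private Const FilterSmoothClean As Integer = 3     ' smooth+clean / opening
Private Const FilterFillBreaks As Integer = 4      ' fill line breaks / closing
Private Const FilterOpenPlus As Integer = 5        ' smooth+clean+preserve / openplus
Private Const FilterClosePlus As Integer = 6       ' fill breaks + preserve / closeplus
Private Const FilterLightThicken As Integer = 7    ' light thicken / dilate2x2
Private Const FilterOutline As Integer = 8         ' outline

' Usage inside the binarization loop:
'   binarizedImg.VRS_Filter(FilterThicken)   ' same as binarizedImg.VRS_Filter(2)
```

Declared at the top of the script, these would let you swap filter modes without looking up the numbers each time.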
The next step was to use that method. I decided to put it into the Batch_Open event, so I can run it right out of Project Builder to generate binarized documents from the currently selected document set!
Private Sub Batch_Open(ByVal pXRootFolder As CASCADELib.CscXFolder)
    Dim x As CscCollection
    Dim i As Integer
    Dim outputPath As String
    Dim sourceFile As String
    ' converted images will be saved here
    outputPath = "C:\TIFF\Converted\"
    Debug.Clear
    ' loop over all xdocs
    For i = 0 To pXRootFolder.DocInfos.Count - 1
        ' will work for 1-paged files only
        sourceFile = pXRootFolder.DocInfos(i).XDocument.CDoc.SourceFiles(0).FileName
        Debug.Print(sourceFile & " --> " & outputPath & i & ".tif")
        Set x = BinarizeDocumentWithVRS(sourceFile)
        ' so we only get 1 page back then
        x(1).Save(outputPath & i & ".tif")
    Next
    Debug.Print("done!")
End Sub
There you go, we just turned KTM Project Builder into a virtual, VRS-powered scanner! (Sidenote: yes, I was lazy and only took care of single-page files.)
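If you need multi-page files, one possible extension is to save every page returned by BinarizeDocumentWithVRS as its own TIFF. The following is an untested sketch under two assumptions: SaveAllPages is a hypothetical helper name, and the collection is indexed from 1, as the x(1) call in Batch_Open suggests.

```vb
' Hypothetical helper: saves each binarized page as a separate TIFF.
' Assumption: CscCollection items are indexed from 1 (see x(1) in Batch_Open).
Private Sub SaveAllPages(pages As CscCollection, outputPath As String, docIndex As Integer)
    Dim p As Integer
    For p = 1 To pages.Count
        ' file name pattern: <document index>_<page number>.tif
        pages(p).Save(outputPath & docIndex & "_" & p & ".tif")
    Next
End Sub
```

In Batch_Open, the x(1).Save(...) line would then become SaveAllPages(x, outputPath, i). Note that this still reads only the first source file of each document, SourceFiles(0), just like the original loop.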