Searching for text is an essential feature you will find in almost any application: your favourite web browser, your word processor, and your text editor. Even your CMS allows you to search for documents or text within them – so why is this feature missing in KTM Validation? Well, as long as your locators play along and find the information you were looking for, or you’re only facing one- or two-page documents, you should be fine. But then, consider the following use case: you’re processing loan contracts. The typical contract consists of 80 pages, and you use a trainable group locator and possibly the text content locator to identify essential information such as the borrower, the agent, and the total amount lent.
What if the locators fail to find the borrower? Will you browse for their address and names, page by page? You see, a search function can come in quite handy sometimes. Fear no more, here’s how you could implement one. As always, here’s the teaser showing you what it could look like:
In the example above, I searched for the text “server scheduler”. The search results are displayed in a table, along with the page number. Clicking on a row will take us directly to the page where the text was found. Here’s how it works.
Finding words
Finding words in an xDocument is rather easy. Each representation holds a collection of words (a CscXDocWords object), for example pXDoc.Representations.ItemByName("PDFTEXT").Words, provided you’re working with PDF files that contain text. The following function returns a list of Words objects – that’s right, Words – as one could search for multiple words as shown in the screenshot above (i.e. “server” followed by “scheduler”).
Private Function Words_Search(pXDoc As CscXDocument, searchString As String) As Object
   ' searches a given xdoc for the search string, returns all phrases found
   ' (an ArrayList holding one CscXDocWords object per match)
   Dim i As Long, h As Long
   Dim oWordsFound As Object
   Dim searchStrings() As String
   Dim phraseMatched As Boolean
   Set oWordsFound = CreateObject("System.Collections.ArrayList")
   ' an empty search string would leave us with nothing to compare, so skip the search and return the empty list
   If Len(Trim(searchString)) > 0 Then
      ' it's entirely possible the user enters multiple words, e.g. "john doe" - the simple solution is to search for those words in sequence
      searchStrings = Split(Trim(searchString))
      ' when there are more words, we need to make sure not to get out of range:
      ' the last possible start position is Count - number of search words
      For i = 0 To pXDoc.Words.Count - (UBound(searchStrings) + 1)
         ' feel free to use levenshtein, case-insensitive, etc.
         phraseMatched = False
         ' see if the first word matches
         If UCase(pXDoc.Words(i).Text) = UCase(searchStrings(0)) Then
            phraseMatched = True
            ' now see if the subsequent words match, as well (if there are any)
            For h = 1 To UBound(searchStrings)
               If UCase(pXDoc.Words(i + h).Text) <> UCase(searchStrings(h)) Then
                  ' one mismatch is enough, no need to compare the remaining words
                  phraseMatched = False
                  Exit For
               End If
            Next
         End If
         ' finally, add the phrase if found
         If phraseMatched Then
            oWordsFound.Add(New CscXDocWords)
            For h = 0 To UBound(searchStrings)
               oWordsFound(oWordsFound.Count - 1).Append(pXDoc.Words(i + h))
            Next h
         End If
      Next
   End If
   Return oWordsFound
End Function
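If you'd like to play with the matching logic outside of KTM first, here is a minimal, stand-alone sketch of the same sliding comparison in Python. It is only an illustration, not part of the project: find_phrases and the plain list of strings are my own stand-ins for Words_Search and the xDocument's word collection, and it returns start positions rather than Words objects.

```python
def find_phrases(words, search_string):
    """Return the start index of every position where the phrase occurs.

    `words` is a plain list of strings standing in for the document's
    word collection (an assumption made for this sketch).
    """
    terms = search_string.upper().split()
    if not terms:
        # nothing to search for
        return []
    hits = []
    # the last possible start position is len(words) - len(terms)
    for i in range(len(words) - len(terms) + 1):
        # compare every search term with the word at the same offset;
        # all() stops at the first mismatch
        if all(words[i + h].upper() == terms[h] for h in range(len(terms))):
            hits.append(i)
    return hits


page_words = ["The", "server", "scheduler", "restarts", "the", "Server", "Scheduler"]
print(find_phrases(page_words, "server scheduler"))  # prints [1, 5]
```

As in the script, the comparison ignores case, and a phrase longer than the document simply yields no matches because the range is empty.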
The table
After considering other alternatives, a table seemed perfectly suited for displaying the search results. For example, it provides one with immediate feedback when clicking on a search alternative, and all matches are highlighted in the document viewer. I also tried storing the search alternative in a temporary variable, allowing the user to switch back and forth between the results (something similar to the search function you’re familiar with in your browser), but the major issue here was that a field’s coordinates are only refreshed when one leaves and then re-enters the search field associated with that variable. That’s rubbish. So, I added a table model consisting of two columns: one for the search text (called “Text”), and one for the page where the text was found (called “p.”). Then, I wrote a sub that populates a table field called “SearchResults”:
Private Sub Table_Populate(pXDoc As CscXDocument)
   ' fills the "SearchResults" table field with one row per match held in oSearchWords
   Dim tblField As CscXDocField
   Dim i As Long, h As Long
   Set tblField = pXDoc.Fields.ItemByName("SearchResults")
   ' first, clear the table
   For i = tblField.Table.Rows.Count - 1 To 0 Step -1
      tblField.Table.Rows.Remove(i)
   Next
   tblField.Table.QuickCreate(2, oSearchWords.Count)
   ' set the column names (the names MUST match the names from the table model, otherwise the table won't be propagated)
   tblField.Table.Columns(0).Name = "Text"
   tblField.Table.Columns(1).Name = "p."
   For i = 0 To oSearchWords.Count - 1
      ' there are multiple words possible, so add each of them to the text cell
      For h = 0 To oSearchWords(i).Count - 1
         tblField.Table.Rows(i).Cells(0).AddWordData(oSearchWords(i)(h))
      Next
      ' the page number is taken from the first word of the match (PageIndex is zero-based)
      tblField.Table.Rows(i).Cells(1).Text = CStr(oSearchWords(i)(0).PageIndex + 1)
   Next
End Sub
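To make the row layout a bit more tangible, here is a small stand-alone Python sketch of what ends up in the table. The data structure is an assumption for illustration only: each match is a list of (text, page_index) tuples standing in for the words of one hit, and build_rows is a made-up name, not a KTM call.

```python
def build_rows(matches):
    """Turn matches into (text, page) rows, one row per match.

    Each match is a list of (text, page_index) tuples; this shape is an
    assumption standing in for the words of one search hit.
    """
    rows = []
    for phrase in matches:
        # column "Text": all words of the match, in order
        text = " ".join(word_text for word_text, _ in phrase)
        # column "p.": page of the first word, shown 1-based
        page = str(phrase[0][1] + 1)
        rows.append((text, page))
    return rows


matches = [
    [("server", 0), ("scheduler", 0)],
    [("Server", 11), ("Scheduler", 11)],
]
for row in build_rows(matches):
    print(row)
```

The page is taken from the first word only, so a phrase that happens to break across two pages would be listed under the page where it starts.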
The Search Function
Finally, we can put all the puzzle pieces together: a separate field acts as the search field. Whenever its content is confirmed (i.e. the user hits enter), the search is issued and the table is populated, as long as there are some alternatives, of course. A global variable holds all the alternatives found.
Dim oSearchWords As Object
Private Sub ValidationForm_AfterFieldConfirmed(ByVal pXDoc As CASCADELib.CscXDocument, ByVal pField As CASCADELib.CscXDocField)
   If pField.Name = "Search" Then
      ' the search field was confirmed, so perform the search and keep the results in the global variable
      Set oSearchWords = Words_Search(pXDoc, pField.Text)
      ' colour the field label: green if there are matches, red if the search did not return anything
      If oSearchWords.Count > 0 Then
         ValidationForm.FieldLabels(0).SetForeColor(0, 180, 0)
      Else
         ValidationForm.FieldLabels(0).SetForeColor(180, 0, 0)
      End If
      ' then, list all items in the table (hidden while it is being rebuilt)
      ValidationForm.Tables(0).Visible = False
      Table_Populate(pXDoc)
      ValidationForm.Tables(0).Visible = True
   End If
End Sub
The “Get Me Started Project”
Is available here, enjoy.