What is the most efficient method of matching static text on a form?
Sdurbin
Posts: 13 ✭
When matching static text on a form to use as anchors for data extraction, what is the most efficient method? I know we can use Data Formats with a pattern or Data Types with a pattern. Also, the static text will sometimes occur on the form multiple times and be used in multiple anchors. In that case, is it more efficient to create a single Data Format or Data Type that the multiple anchor Data Types reference? Please let me know if examples are required or if you have any questions.
Best Answer
jclark
Posts: 60 ✭✭✭
I have found that creating extractors that can be referenced is much more efficient, but I would try to break them down to single words as much as possible in the extractor list. For example, rather than having "Amount" and then a new extractor with "Amount Paid" as you have above, I would simply have Amount and Paid as separate extractors and use an ordered array that references each individual extractor to combine multiple words. Referencing takes less time than extracting the same keyword each time you want to use it.
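Grooper's extractor engine isn't shown here, but the principle behind the advice can be illustrated outside Grooper. A minimal Python sketch (the names `WORD_EXTRACTORS` and `ordered_anchor` are hypothetical, not Grooper API): each single-word pattern is compiled once and reused, and multi-word anchors are built by combining the referenced patterns in order, much like an ordered array of referenced extractors.

```python
import re

# Hypothetical single-word "extractors": each pattern is compiled once
# and shared, instead of duplicating the keyword in every anchor.
WORD_EXTRACTORS = {
    "amount": re.compile(r"\bAmount\b"),
    "paid": re.compile(r"\bPaid\b"),
}

def ordered_anchor(words):
    """Combine referenced single-word patterns in order,
    allowing whitespace between them (an 'ordered array' analogy)."""
    return re.compile(r"\s+".join(WORD_EXTRACTORS[w].pattern for w in words))

# "Amount Paid" anchor built from the shared Amount and Paid extractors.
amount_paid = ordered_anchor(["amount", "paid"])
print(bool(amount_paid.search("Total Amount Paid: $102.50")))  # True
```

The design point mirrors the answer above: the single-word patterns exist in exactly one place, so adding another anchor such as "Amount Due" would reuse `amount` rather than re-defining it.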
Answers
Here is an example of where text is repeated in column headers we are trying to identify separately.
In some cases, developers have created a single Data Type they use to identify the header text, then reference that Data Type as needed.
In other cases, developers have created new Data Types each with their own pattern to identify the header text.
It would also be good to know whether extraction results produced during a Classification step are reused in a subsequent Extraction step. When we're running hundreds if not thousands of classifications and extractions in a short period of time, even small amounts of saved processing time add up.
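Whether Grooper itself caches classification-time extraction is the open question here, but the reason reuse matters can be sketched with a simple memoization example in Python (hypothetical, not Grooper code): once an anchor has been located on a page, later lookups for the same text are answered from a cache instead of re-running the pattern.

```python
from functools import lru_cache
import re

AMOUNT = re.compile(r"\bAmount\b")

@lru_cache(maxsize=None)
def find_anchor(page_text: str):
    """Locate the anchor once per distinct page text; repeat calls
    (e.g. classification first, extraction later) hit the cache."""
    m = AMOUNT.search(page_text)
    return m.span() if m else None

page = "Invoice  Amount Paid  Date"
first = find_anchor(page)   # runs the regex
second = find_anchor(page)  # served from the cache
print(first == second)  # True
```

At hundreds or thousands of documents, skipping the second pass over each page is exactly the kind of small per-document saving that accumulates.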
Thanks!