Classification and Extraction Efficiency Revisited
Sdurbin
Posts: 13 ✭
Another couple of quick questions about classification and extraction:
1. Are extraction results used during classification shared across document layouts? In other words, will classification be more efficient if the classification extractors use shared data types whenever possible?
2. Are extraction results used during classification reused during data extract? In other words, will extraction be more efficient if classification and extraction use shared data types whenever possible?
Best Answer
GrooperGuru
Posts: 481 admin
So when each task is performed on a document, it is essentially a discrete workload. The real performance benefits come from caching, but there really isn't anything cached by one activity that is later reused by another. But within the context of extraction, if you create an extractor that is referenced 100 times throughout your model for that document, it isn't going to run the extractor 100 times. It runs once, and the results are cached in memory until the extraction is completed for that activity and Grooper moves on to the next. At that point, the cache is cleared and a new one is created for the next document.
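To make the caching behavior above concrete, here is a minimal Python sketch of the general idea. It is not Grooper's actual implementation or API: `ExtractionCache`, `find_dates`, and `extract_document` are hypothetical names made up for illustration. It only shows how an extractor referenced many times can run once per document, with a fresh cache created for each document.

```python
import re
from typing import Callable, Dict, List, Tuple


class ExtractionCache:
    """Hypothetical per-document cache (illustration only, not Grooper's API)."""

    def __init__(self) -> None:
        self._results: Dict[str, List[str]] = {}
        self.run_count = 0  # how many times an extractor actually executed

    def get(self, name: str, extractor: Callable[[str], List[str]], text: str) -> List[str]:
        # Run the extractor only on the first reference; reuse the result afterwards.
        if name not in self._results:
            self.run_count += 1
            self._results[name] = extractor(text)
        return self._results[name]


def find_dates(text: str) -> List[str]:
    """Hypothetical shared extractor: finds MM-DD-YYYY dates."""
    return re.findall(r"\d{2}-\d{2}-\d{4}", text)


def extract_document(text: str, references: int = 100) -> Tuple[List[str], int]:
    # A new cache is created for each document and discarded when the function returns.
    cache = ExtractionCache()
    results: List[str] = []
    for _ in range(references):  # the same extractor referenced many times in the model
        results = cache.get("date", find_dates, text)
    return results, cache.run_count


results, runs = extract_document("Invoice dated 03-22-2024 and due 04-15-2024")
print(results, runs)
```

In this sketch the extractor is referenced 100 times but executes once, and nothing carries over to the next call of `extract_document`, which mirrors the point that results are not shared between documents or activities.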
On the classification side, I am about 95% sure that there isn't really any compute efficiency to be gained based on the reuse of common extractors. Though when performing rules-based classification, the extraction stops as soon as the first rule hits. So that can often run pretty quickly depending on the overall number of rules and document types.
Matt Harrison
Product Manager
mharrison@bisok.com
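As a rough illustration of the "stops as soon as the first rule hits" behavior mentioned in the answer, here is a small Python sketch of first-match rule evaluation. The rule list, the regex patterns, and the `classify` function are assumptions invented for this example and do not reflect how Grooper defines or evaluates its rules internally.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules: (document type, regex pattern), evaluated in order.
Rule = Tuple[str, str]

RULES: List[Rule] = [
    ("Invoice", r"\binvoice\b"),
    ("Purchase Order", r"\bpurchase order\b"),
    ("Receipt", r"\breceipt\b"),
]


def classify(text: str, rules: List[Rule]) -> Tuple[Optional[str], int]:
    """Return the first matching document type and how many rules were evaluated."""
    evaluated = 0
    for doc_type, pattern in rules:
        evaluated += 1
        if re.search(pattern, text, re.IGNORECASE):
            # First hit wins: the remaining rules are never evaluated.
            return doc_type, evaluated
    return None, evaluated


print(classify("INVOICE #1001 for purchase order 77", RULES))
print(classify("Purchase Order 77", RULES))
```

With first-match evaluation, the cost depends on how far down the list the matching rule sits, which is why the total number of rules and document types affects how quickly it runs.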