Batch Process Creation
What is a Batch Process and why should I care?
Batch processes are the beginning, middle, and end of processing content in Grooper. Batch processes are responsible for taking your content and ingesting, manipulating, arranging, normalizing, and ultimately doing anything meaningful with the content you have.
Batch processes consist of a series of steps that contain activities. Each activity has its own set of properties that can be customized to the content you are processing. A batch process can be as simple as scanning paper documents and exporting them to a common format, or it can be incredibly complex, with multiple routing points based on different conditions.
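The relationship between a batch process, its steps, and their activities can be pictured as a small data structure. The following is only an illustrative sketch in Python, not Grooper's actual object model or API: the class names, the one-activity-per-step simplification, and the property keys ("dpi", "format") are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Activity:
    """Hypothetical stand-in for an activity and its configurable properties."""
    name: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Step:
    """One step in a batch process; this sketch assumes one activity per step."""
    name: str
    activity: Activity


@dataclass
class BatchProcess:
    """An ordered series of steps, run from first to last."""
    name: str
    steps: list[Step] = field(default_factory=list)

    def step_names(self) -> list[str]:
        # Return the step names in the order they would run.
        return [step.name for step in self.steps]


# The simplest case described above: scan paper, then export to a common format.
# The property keys and values here are made up for illustration.
simple_process = BatchProcess(
    name="Scan and Export",
    steps=[
        Step("Scan", Activity("Scan", {"dpi": 300})),
        Step("Export", Activity("Document Export", {"format": "PDF"})),
    ],
)

print(simple_process.step_names())
```

A more complex process would simply have more steps in the list, each with its own activity and property values.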
When you break it down, there are five phases to think about when setting up a Grooper batch process. Thinking ahead to figure out exactly what you want to do with your content will pay off in the long run.
Before any magic can happen, content must somehow make its way into Grooper. This can be done by scanning documents into Grooper, or by setting up an Import Watcher service that continually polls a CMIS-compliant ECM, a file share, an email inbox, or an FTP/SFTP server.
Example Activities: Scan
Once content has been acquired by Grooper, it is important that we set that content up for future success. Depending on how the content was captured, defects could have been introduced that can get in the way of future processing steps. To mitigate any problems, we need to make sure the content is in good shape.
Along with ‘cleaning’ the content, we need to have Grooper ‘read’ the content. This can be done via Optical Character Recognition, or, if the content was born digital, the text can be extracted directly from the file with 100% accuracy.
Example Activities: Image Processing, Full Text OCR, PDF Text Extract
Now with content brought in and prepared for processing, it is important to get it organized. To organize your content, we’ll need to identify it. This is important for future processes because Grooper tags the content with a content type that determines how each subsequent process step will interact with your content.
Example Activities: Classify Folders, Classification Review, Separation
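To illustrate why the content type matters to later steps, here is a minimal sketch of a step that looks up its behavior from the type assigned during classification. It is not Grooper code: the content type names, the field names, and the dictionary-based document are all hypothetical.

```python
# Hypothetical content types and per-type settings; not Grooper's data model.
STEP_SETTINGS = {
    "Invoice": {"fields": ["invoice_number", "total"]},
    "Purchase Order": {"fields": ["po_number", "vendor"]},
}


def fields_to_extract(document: dict) -> list[str]:
    """Pick the fields a later step should look for, based on content type."""
    content_type = document.get("content_type")
    settings = STEP_SETTINGS.get(content_type)
    if settings is None:
        # Unclassified or unknown content: the step has nothing to work with.
        return []
    return settings["fields"]


classified = {"name": "scan_0001", "content_type": "Invoice"}
unclassified = {"name": "scan_0002"}

print(fields_to_extract(classified))
print(fields_to_extract(unclassified))
```

The point of the sketch is that a document with no content type gives later steps nothing to act on, which is why identification comes before extraction.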
Beyond being sorted into neatly siloed groups, each piece of content also contains information that makes it unique. The Collect phase is where this meaningful information is extracted from the content and stored for later use, or put in front of a human operator for validation.
Example Activities: Extraction, Train Lexicon, Data Review
The final phase of batch processing in Grooper is delivery. This could involve taking files that were processed and moving them to an ultimate destination. It could also entail taking all the data that was extracted in the Collect phase and redacting the information from the final output, dumping it to a database, or populating a content management system. It all depends on your business need.
Example Activities: Database Export, Redaction, Document Export