Ignoring certain table cell values (return blank)
We have a table on one of our forms. It looks like this:

As you can see, the second and third columns expect a date in the form MM/YYYY. Unfortunately, many people who fill in the form write something like 'N/A', 'NA' or '-' in the field instead. This causes validation issues because our Grooper configuration expects a date in that field, which is also what our downstream systems want. We export the data directly to those systems using Grooper's XML export.
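To make the validation problem concrete, here is a minimal Python sketch of an MM/YYYY check. It is only an illustration: the pattern and the name `is_mm_yyyy` are my assumptions, not our actual Grooper configuration.

```python
import re

# Assumed pattern for MM/YYYY: month 01-12 followed by a four-digit year.
MM_YYYY = re.compile(r"(0[1-9]|1[0-2])/\d{4}")

def is_mm_yyyy(value: str) -> bool:
    """Return True when the whole (trimmed) cell value is an MM/YYYY date."""
    return MM_YYYY.fullmatch(value.strip()) is not None

# A real date passes; the filler values people write do not.
for cell in ["03/2019", "N/A", "NA", "-"]:
    print(repr(cell), is_mm_yyyy(cell))
```

Any cell for which this kind of check fails is what shows up as a validation issue in review.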
The table is pretty standard: an 'Infer Grid' table that does per-cell OCR using the lines in the table. It is defined as follows.

The From column is defined as follows. I tried to put a value extractor in, but Grooper seems to ignore it when you are doing cell-level OCR.

I could add a new set of 'shadow columns' and hide the original columns. I appended an X after the shadow columns. The shadow columns are just calculated values like If(From="N/A","",From)
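The shadow-column expression above only handles 'N/A'. The same logic extended to all three filler values could be sketched in Python as follows. This is a standalone illustration, not Grooper expression syntax; `FILLERS` and `blank_filler` are hypothetical names, and the "To" column name is an assumption.

```python
# Filler values people write instead of a date on the form.
FILLERS = {"N/A", "NA", "-"}

def blank_filler(value: str) -> str:
    """Return "" for a filler value, otherwise the trimmed cell text."""
    text = value.strip()
    return "" if text.upper() in FILLERS else text

# One table row; "To" is an assumed name for the second date column.
row = {"From": "N/A", "To": "06/2021"}

# Shadow columns carry the original name with an X appended.
cleaned = {name + "X": blank_filler(value) for name, value in row.items()}
print(cleaned)  # {'FromX': '', 'ToX': '06/2021'}
```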

I can hide the original columns so that only the calculated columns show, but I don't like this solution: in Data Review, Grooper does not jump to the extract location in the document because the value is calculated.
Any ideas on how I can replace the 'filler values' with blank values for the OCR read? It would be really easy if I were somehow going through an extractor after the cell-level OCR.

Answers
As I mentioned, any 'phantom columns' I configure that are calculated or derived using extractors are not user friendly: in Data Review, Grooper does not jump to the extract location in the document.