
Ignoring certain table cell values (return blank)

We have a table on one of our forms. It looks like this:



As you can see, the second and third columns expect a date in the form MM/YYYY. Unfortunately, many people who fill in the form write something like 'N/A', 'NA' or '-' in the field. This causes validation issues because our Grooper configuration expects a date in that field, which is also what our downstream systems want. We export the data directly to those systems using Grooper's XML export.

The table is pretty standard. It is defined as follows. This is an 'Infer Grid' table that does per cell OCR using the lines in the table.


The From column is defined as follows. I tried to put a value extractor in, but Grooper seems to ignore it when you are doing the cell level OCR.


I could add a new set of 'shadow columns' and hide the original columns. I appended an X after the shadow columns. The shadow columns are just calculated values like If(From="N/A","",From)
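For illustration only, the shadow-column logic can be sketched outside Grooper in Python. This is not Grooper expression syntax; the names `FILLER_VALUES` and `blank_if_filler` are hypothetical, and the filler list is just the three values mentioned above.

```python
# Filler entries people write instead of a date (taken from the examples above).
FILLER_VALUES = {"N/A", "NA", "-"}


def blank_if_filler(cell_text: str) -> str:
    """Return "" for a known filler entry, otherwise the trimmed cell text."""
    cleaned = cell_text.strip()
    # Compare case-insensitively so 'n/a' and 'N/A' are treated alike.
    if cleaned.upper() in FILLER_VALUES:
        return ""
    return cleaned
```

This mirrors If(From="N/A","",From), extended to cover all three filler values rather than 'N/A' alone.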


I can hide the original columns so that only the calculated columns show, but I don't like this solution: because the value is calculated, Grooper does not jump to the extract location in the document when you are in Data Review.

Any ideas on how I can replace the 'filler values' with blank values for the OCR read? It would be really easy if the cell value somehow went through an extractor after the cell-level OCR.

Answers

  • GrooperGuru Posts: 464 admin
    I would just run a value extractor on the original columns that only finds valid date formats. Essentially, it would return nothing if it can't find a "valid" date. Then set the default value for those columns to N/A.
    Matt Harrison
    Director of Strategy
  • hjanum Posts: 86
    I would love to do that, but with the "Rubberband OCR Profile" feature used on this column, Grooper seems to ignore any Extractor I configure for the column at the top of the column definition. Any idea how to get an extractor involved when doing an "Infer Grid" type table?

    As I mentioned, any 'phantom columns' I configure that are calculated or derived using extractors are not user friendly, because in Data Review Grooper does not jump to the extract location in the document.
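The idea from the first answer, an extractor that only finds valid dates and returns nothing otherwise, can be sketched in Python as follows. This is a minimal illustration of the matching logic, not a Grooper configuration: `VALID_DATE` and `extract_date` are hypothetical names, and whether Grooper applies such an extractor after the per-cell OCR is exactly the open question in this thread.

```python
import re

# MM/YYYY: month 01-12, a slash, then a four-digit year.
VALID_DATE = re.compile(r"\b(0[1-9]|1[0-2])/(\d{4})\b")


def extract_date(cell_text: str) -> str:
    """Return the first MM/YYYY date found in the cell text, or "" if none."""
    match = VALID_DATE.search(cell_text)
    return match.group(0) if match else ""
```

With this approach, entries such as 'N/A', 'NA' or '-' need no special handling: they simply fail to match and come back blank. The pattern uses only constructs common to most regular expression flavours (word boundaries, alternation, digit classes).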