Ignoring certain table cell values (return blank)

hjanum · February 2021

We have a table on one of our forms. It looks like this:

Image: https://us.v-cdn.net/6030453/uploads/editor/3i/nsxjk1mhu5w8.png

As you can see, the second and third columns expect a date in the form MM/YYYY. Unfortunately, many people who fill in the form write something like 'N/A', 'NA' or '-' in the field. This causes validation issues, because our Grooper configuration expects a date in that field. A date is also what our downstream systems want. We export the data directly to those systems using Grooper's XML export.

The table is pretty standard. It is defined as follows. This is an 'Infer Grid' table that does per cell OCR using the lines in the table.

The From column is defined as follows. I tried to put a value extractor in, but Grooper seems to ignore it when you are doing the cell level OCR.

Image: https://us.v-cdn.net/6030453/uploads/editor/o1/yaqj2ixazejt.png

I could add a new set of 'shadow columns' and hide the original columns. I appended an X to the names of the shadow columns. The shadow columns are just calculated values like If(From="N/A","",From)
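To spell out the rule those shadow columns apply, here is a minimal sketch in Python rather than Grooper's expression syntax. The name `blank_filler` and the `FILLER_VALUES` set are hypothetical and only illustrate the logic; they are not Grooper features.

```python
# Hypothetical illustration of the shadow-column rule; this is not Grooper code.
FILLER_VALUES = {"N/A", "NA", "-"}  # the filler entries people type instead of a date


def blank_filler(cell_text: str) -> str:
    """Return an empty string for filler entries, otherwise the trimmed cell text."""
    cleaned = cell_text.strip()
    # Compare case-insensitively so 'n/a' and 'N/A' are treated the same.
    if cleaned.upper() in FILLER_VALUES:
        return ""
    return cleaned
```

Unlike the single comparison in the expression above, this version covers all three filler spellings in one place.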

Image: https://us.v-cdn.net/6030453/uploads/editor/of/qjlsm3tbts5g.png

I can hide the original columns so that only the calculated columns show, but I don't like this solution: in Data Review, Grooper does not jump to the extract location in the document, because the value is calculated.

Any ideas on how I can replace the 'filler values' with blank values for the OCR read? It would be really easy if I was somehow going through an extractor after the cell level OCR.

GrooperGuru · February 2021

I would just run a value extractor on the original columns that only finds valid date formats. Essentially, it would return nothing if it can't find a "valid" date. Then set the default value for those columns to N/A.
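Roughly, the matching logic I have in mind looks like this, sketched in Python. The pattern and the name `extract_date` are assumptions for illustration; the exact pattern syntax in a Grooper value extractor may differ.

```python
import re

# Assumed definition of a "valid" date: two-digit month 01-12, a slash, a four-digit year.
MM_YYYY = re.compile(r"\b(0[1-9]|1[0-2])/(\d{4})\b")


def extract_date(cell_text: str) -> str:
    """Return the first MM/YYYY date found in the cell text, or "" if there is none."""
    match = MM_YYYY.search(cell_text)
    return match.group(0) if match else ""
```

Anything like 'N/A', 'NA' or '-' simply fails to match, so the extractor returns nothing for those cells.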

hjanum · February 2021

I would love to do that, but with the "Rubberband OCR Profile" feature used on this column, Grooper seems to ignore any Extractor I configure for the column at the top of the column definition. Any idea how to get an extractor involved when doing an "Infer Grid" type table?

As I mentioned, any 'phantom columns' I configure that are calculated or derived using extractors are not user friendly, because in Data Review Grooper does not jump to the extract location in the document.
