Best Way To Extract Horizontal and Vertical for Single Field?

What is the best way to set up an extractor for a field when the label or feature is sometimes to the left of the value and, on other forms, above the value?

Is there a way to set up an ordered array or something similar that pulls the value based on the label above it, but uses a label to the left if one is found there instead?

In my project, providers change their form layouts frequently, but the labels/features or label options are usually consistent (e.g. First Name, FName, or FN all represent the field First_Name).

Answers

  • GrooperGuru Posts: 465 admin
    edited February 2018
    There are two ways to accomplish this.

    Option 1

    The first is to use a Field Class with its context scope set to Zonal. Once it is set to Zonal, configure two Context Zones: one above the value and one to its left.

    Option 2

    The second is to use a Data Type with the Ordered Array collation method. The Array Layout should be configured to accept both Horizontal and Vertical configurations.

    The Data Type will have two child formats or data types. The first is for the label, which, as you stated, will be on top or to the left. The second is for the value itself, which will be below or to the right.


    The final problem is that your output will now contain both the label and the value, and you only want the value in your field. In my example, I specifically named the child extractors "Label" and "Value". When you configure your Data Field to use the Ordered Array extractor, there is also a property to specify a Sub-Element Name. As long as you set this to the exact name of the pattern/data type for the value, that will be the only information placed into the field at run-time.

    Matt Harrison
    Director of Strategy
  • GrooperGuru Posts: 465 admin
    Both approaches can work well, but I find the Field Class approach easier to configure. While it is not extremely common, OCR sometimes places two adjacent words on different rows of text even though they belong on the same line. When that happens, the horizontal array method breaks down, because it relies on the label and the value actually being on the same line of OCR text. The Field Class is based on coordinates, measured in inches, so it does not suffer the same breakdown when OCR makes a mistake in line synthesis.
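
The difference between the two approaches discussed above can be illustrated outside of Grooper with a small Python sketch. This is not Grooper's API (Grooper is configured through its own design interface); every name here (`Segment`, `in_zone`, `find_value_by_zone`, `find_value_same_line`) and every threshold is a hypothetical stand-in. It assumes OCR output is available as text segments with coordinates in inches plus an OCR-assigned line number, and it uses the label aliases from the question.

```python
import re
from dataclasses import dataclass

# Label aliases from the question; all mean First_Name.
LABEL_RE = re.compile(r"first name|fname|fn", re.IGNORECASE)

@dataclass
class Segment:
    """Hypothetical OCR text segment; coordinates are in inches."""
    text: str
    left: float
    top: float
    width: float
    height: float
    line: int = 0  # line number assigned by OCR line synthesis

def is_label(seg):
    return LABEL_RE.fullmatch(seg.text) is not None

def in_zone(label, value, max_left=2.0, max_above=0.5, tolerance=0.1):
    """True if `label` lies in a zone to the left of or above `value`.

    The thresholds are invented for illustration, not Grooper defaults.
    """
    gap_left = value.left - (label.left + label.width)
    same_row = abs(label.top - value.top) <= tolerance
    gap_above = value.top - (label.top + label.height)
    same_col = abs(label.left - value.left) <= tolerance
    left_of = same_row and 0 <= gap_left <= max_left
    above = same_col and 0 <= gap_above <= max_above
    return left_of or above

def find_value_by_zone(segments):
    """Coordinate-based lookup: ignores OCR line numbers entirely."""
    labels = [s for s in segments if is_label(s)]
    for seg in segments:
        if not is_label(seg) and any(in_zone(lb, seg) for lb in labels):
            return seg.text
    return None

def find_value_same_line(segments):
    """Line-based lookup: label and value must share an OCR line."""
    labels = [s for s in segments if is_label(s)]
    for seg in segments:
        if not is_label(seg) and any(
            lb.line == seg.line and lb.left < seg.left for lb in labels
        ):
            return seg.text
    return None

# Form A: label to the left, but OCR split the two words across lines 3 and 4.
form_a = [
    Segment("Invoice", 5.0, 0.20, 0.7, 0.15, line=1),
    Segment("FName", 1.0, 1.00, 0.6, 0.15, line=3),
    Segment("Alice", 1.8, 1.03, 0.5, 0.15, line=4),
]

# Form B: label directly above the value.
form_b = [
    Segment("First Name", 1.0, 2.00, 0.9, 0.15, line=7),
    Segment("Bob", 1.0, 2.25, 0.4, 0.15, line=8),
]

print(find_value_by_zone(form_a))    # coordinate-based lookup on form A
print(find_value_same_line(form_a))  # line-based lookup on form A
print(find_value_by_zone(form_b))    # coordinate-based lookup on form B
```

In this sketch the zone check finds the value on both layouts, while the line-based check misses the value on form A because the label and value carry different OCR line numbers despite sitting at nearly the same height. That mirrors the reasoning given for preferring the coordinate-based Field Class approach.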