Extract table data when a few lines of text fall between the table header and the first data row

rameshramesh Posts: 6
edited June 19 in The Astronauts (Q&A)
Hi,
I am having difficulty extracting data with a Table section Key-Value Pair list when a few lines of text appear between the header section and the first data row.
1. The header row appears only once per page.
2. There is some text between the header and the first set of data rows.
3. There is also some text before each subsequent set of data rows.

Like below:

-----------------------------------------------
HCol1    HCol2    HCol3    HCol4    
-----------------------------------------------
Line 1
Line 2
Line 3

Data1    Data2    Data3    Data4    
Data1    Data2    Data3    Data4    
Data1    Data2    Data3    Data4    
Data1    Data2    Data3    Data4    

-----------------------------------------------
Footer Text
-----------------------------------------------

other text 1
other text 2
other text3

Data1    Data2    Data3    Data4    
Data1    Data2    Data3    Data4    
Data1    Data2    Data3    Data4    
Data1    Data2    Data3    Data4    

-----------------------------------------------
Footer Text
-----------------------------------------------

Answers

  • jclarkjclark Posts: 43 ✭✭
    Would it be possible to use a section that spans the area from the headers to the last footer but excludes the headers and footers themselves? You could use a pattern-based approach to exclude the header and footer parts of the section, then use Row Match for multiple tables, one for each different line of data. I have attached screenshots of a setup that may resemble what you describe, with multiple tables within a single section. Let us know if this is helpful or if you need more information on how this might work with your documents.
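For anyone who wants to see the idea outside of Grooper, here is a minimal Python sketch of "bound the section, drop headers and footers by pattern, then row match what is left". It is not how Grooper implements Row Match. The patterns (`RULE`, `HEADER`, `FOOTER`, `ROW`) and function names are hypothetical and fit only the sample layout from the question.

```python
import re

SAMPLE = """\
-----------------------------------------------
HCol1    HCol2    HCol3    HCol4
-----------------------------------------------
Line 1
Line 2

Data1    Data2    Data3    Data4
Data1    Data2    Data3    Data4

-----------------------------------------------
Footer Text
-----------------------------------------------

other text 1

Data1    Data2    Data3    Data4

-----------------------------------------------
Footer Text
-----------------------------------------------
"""

# Hypothetical patterns that fit this sample only.
RULE = re.compile(r"^-{5,}\s*$")            # horizontal rule lines
HEADER = re.compile(r"^HCol1\s+HCol2")      # the single header row
FOOTER = re.compile(r"^Footer Text\s*$")    # footer under each table
# A data row: exactly four whitespace-separated "Data..." tokens.
ROW = re.compile(r"^(Data\S*)\s+(Data\S*)\s+(Data\S*)\s+(Data\S*)\s*$")


def section_lines(text):
    """Lines after the header and before the last footer, minus rules/footers."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if HEADER.match(line)) + 1
    end = max(i for i, line in enumerate(lines) if FOOTER.match(line))
    return [
        line for line in lines[start:end]
        if not RULE.match(line) and not FOOTER.match(line)
    ]


def extract_rows(text):
    """Keep only section lines that match the row pattern."""
    rows = []
    for line in section_lines(text):
        match = ROW.match(line)
        if match:
            rows.append(list(match.groups()))
    return rows


for row in extract_rows(SAMPLE):
    print(row)
```

The free text between the header and the data ("Line 1", "other text 1") drops out simply because it does not match the row pattern. Real column patterns (dates, amounts, codes) would replace the `Data\S*` placeholders.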






  • rameshramesh Posts: 6

    Hi Clark,

    Thank you for the answer.

    As you said, I can group the Claim and Service Line sections. (The Service Line group section is a subset of the Claim group section.)
    I can also group each service line of data individually as a row.

    I cannot use location-based extraction because the data sometimes shifts position when the image is scanned.
    I tried Key-Value Pair extraction, using the previous column's value as the key for the next column's value with Horizontal flow enabled. However, in some rows the values of random columns are missing; only the first column's value is always present in every data row. (The data table contains 10 columns.)

    Sample data below (sorry, I cannot attach a sample image for your reference):

    -----------------------------------------------
    HCol1    HCol2    HCol3    HCol4    
    -----------------------------------------------
    Line 1
    Line 2
    Line 3

    Data1    Data2               Data4    
    Data1                            Data4    
    Data1    Data2    Data3             
    Data1    Data2               Data4    
    -----------------------------------------------
    Footer Text
    -----------------------------------------------

    other text 1
    other text 2
    other text  3

    Data1                  Data3    Data4    
    Data1    Data2    Data3             
    Data1    Data2                 Data4    
    Data1                 Data3        
    -----------------------------------------------
    Footer Text
    -----------------------------------------------
  • jclarkjclark Posts: 43 ✭✭
    Another option, which may work better for your situation, is to use a Key Value List as shown in the screenshots below. Set your table extract method to Header-Value, and set each data column's Header Extractor to the header value above the data you want to collect. Then set a field class with a Value Extractor that matches the column data you want to extract, and set up context zones that limit the area where the Value Extractor looks for the data. I am attaching screenshots as an example.
    Please let us know if this method resolves your issue.
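To illustrate why a header-based approach copes with missing cells, here is a rough Python sketch that assigns each value to the header it sits under, using character offsets. This is only an analogy for the Header-Value idea, not Grooper's implementation. The function names are hypothetical, and it assumes single-word values and text that keeps its horizontal alignment.

```python
import re

TOKEN = re.compile(r"\S+")


def column_starts(header_line):
    """Return (header name, start offset) for every header token."""
    return [(m.group(), m.start()) for m in TOKEN.finditer(header_line)]


def assign_to_columns(header_line, data_line):
    """Place each value under the header whose start offset is closest.

    Columns with no value on this line stay None, so a missing cell does
    not shift the remaining values to the left.
    """
    columns = column_starts(header_line)
    row = {name: None for name, _ in columns}
    for m in TOKEN.finditer(data_line):
        name, _ = min(columns, key=lambda col: abs(col[1] - m.start()))
        row[name] = m.group()
    return row


header = "HCol1    HCol2    HCol3    HCol4"
sample_rows = [
    "Data1    Data2             Data4",
    "Data1                      Data4",
    "Data1    Data2    Data3",
]
for line in sample_rows:
    print(assign_to_columns(header, line))
```

Because the match is "nearest header start" rather than an exact position, small shifts from scanning are tolerated as long as a value stays closer to its own header than to a neighbouring one. If two tokens land under the same header, the later one overwrites the earlier one in this sketch.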




  • rameshramesh Posts: 6
    edited June 24
    Hi Clark,

    Thank you for the help. I tried the instructions above, but I am still unable to extract the data.

    I am attaching a sample image; I hope it helps you understand the format.

       1. The header appears only once, at the beginning of the page. It does not repeat for each table.
       2. A page can contain multiple tables (as shown in the picture).
       3. In each table row, only the 1st, 2nd, and 3rd columns always have values. The other columns may be empty (values are missing at random).
       4. Sometimes a row wraps onto a second line.
       5. The distance between the page border and the start of the row differs from image to image.
       6. I am using an IP profile to clean up the table lines and the background color of the rows.
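Points 3 to 5 can be sketched in Python as a way to think about the problem, independent of Grooper. The sketch assumes the lines have already been narrowed to the table body, that the first column always has a value (point 3), and that the header shifts with the page margin together with the rows (point 5). `merge_wrapped_rows` and its `tolerance` parameter are hypothetical names.

```python
import re

TOKEN = re.compile(r"\S+")


def merge_wrapped_rows(lines, first_col_start, tolerance=2):
    """Group physical lines into logical rows.

    A line starts a new row only when its first token begins within
    `tolerance` characters of `first_col_start`. Any other non-blank line
    is treated as the wrapped second line of the previous row.
    """
    rows = []
    for line in lines:
        m = TOKEN.search(line)
        if m is None:
            continue  # skip blank lines
        starts_row = abs(m.start() - first_col_start) <= tolerance
        if starts_row or not rows:
            rows.append([line])
        else:
            rows[-1].append(line)
    return rows


# The left margin is measured from the header on the same image, so a
# different margin on another scan does not matter.
header = "   HCol1    HCol2    HCol3    HCol4"
body = [
    "   Data1    Data2    Data3    Data4",
    "                     more3",
    "   Data1    Data2    Data3",
]
first_col_start = TOKEN.search(header).start()
for row in merge_wrapped_rows(body, first_col_start):
    print(row)
```

Here the second physical line has nothing under the first column, so it is attached to the row above it instead of being read as a new row.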


  • jclarkjclark Posts: 43 ✭✭
    What version of Grooper are you currently using?
    I do have examples of this Highmark Blueshield format for testing on our side. I know that our development team was working on issues with these types of formats for a future version update, but I am not sure of its current status. I will look at the examples I have to see whether I can give you a solution for this type of format with the current version of Grooper.
  • rameshramesh Posts: 6
    Thanks Clark, we are using the Grooper Design Studio version: 2.72.0022
  • jclarkjclark Posts: 43 ✭✭
    Hello,
    We are currently working on a Grooper Wiki Article to give detailed instructions for this format. We expect it to be completed early next week and will give an update when it is finished and ready for viewing.
  • rameshramesh Posts: 6
    Thank you. I will wait for your update.
  • jclarkjclark Posts: 43 ✭✭
    Hello,
    Please see the Grooper Wiki Article that should help explain the data collection issue you had asked about above.
    https://wiki.grooper.com/index.php?title=Row_Match_(Table_Extract_Method)#Use_Cases:_Deep_Dive
    Please let us know if you have any questions or comments.
  • rameshramesh Posts: 6
    Thank you for the detailed explanation.
    I will try this approach to extract the data.