Header Value Table Extraction End of Table

Is it possible to define a pattern that marks the end of a table? I have an invoice that shows all the line items, and the invoice also contains the packing slip, which duplicates all of those line items. Is there a way to tell the table to stop retrieving rows once it recognizes a certain pattern?


  • vimalavimala Posts: 17
    @henryma You can use the Footer Extractor to indicate the end of the table. The Footer Extractor is an optional extractor that matches content immediately below the last row of information in the table.

  • henrymahenryma Posts: 56
    I tried using the footer extractor; however, the last row of information doesn't have anything unique that I can match to indicate the end of the table. So I want to use the Total Invoice Amount as the indicator, but it doesn't seem to pick that up, since it is not part of the table information. If the table rows continue for four pages, the Total Invoice will show at the end of page four or could end up on page 5. However, it is not part of the header column values.
  • vimalavimala Posts: 17
    @henryma can you please attach the document for reference?
  • GrooperGuruGrooperGuru Posts: 417 admin
    Yes, if you make a footer extractor that finds the label for the Total Invoice Amount, and that label comes AFTER the last line item you want to capture, that should work. We may need to see a screenshot to further assist.
    Matt Harrison
    Director of Strategy
  • henrymahenryma Posts: 56
    I tried adding the footer, but for some reason it doesn't stop there.

    I tried using Item Subtotal and then Invoice Amount.

    Part of OCR:

    Sort Seq: Order Confirmation<\r><\n>
    Item Material Description Qty UM Unit Price Disc% Net Price Net Total<\r><\n>
    10 1205163 SHIM FLAT SHEET 12 IN WIDTH 24 IN LENGTH 0.031 IN THK 302 1 EA 635.45 /PAI 36.20% 255.83 255.83<\r><\n>
    STEEL PRECISION BRAND 22993<\r><\n>
    20 1062772 SCREW CONCRETE STEEL 1/4 IN - 1-1/4 IN SLOTTED HEX TAPCON 24315 2 EA 0.40 /EA 0.40 0.80<\r><\n>
    Customer Mat. No. 000000000100025554<\r><\n>
    Item Subtotal.................<\r><\n>
    Invoice Amount................<\r><\n>

  • henrymahenryma Posts: 56
    The packing slip is a replica of the same items and looks like an invoice. It is included in the same PDF, but always comes after the original invoice.
  • vimalavimala Posts: 17
    @henryma You can use this pattern: "[\n]Item\s*Subtotal[^\f]+". Here "\f" is the form feed character, which indicates the end of the page.
    Hope this will help you!
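To see what this pattern actually captures, it can be tried outside Grooper. The following is only a sketch in plain Python, assuming the OCR text uses a form feed between pages; the `sample` text is hypothetical and modeled on the OCR excerpt above. It shows that the pattern matches from "Item Subtotal" up to the next page break, and that it matches once per copy of the invoice.

```python
import re

# Pattern from the post above: a newline, the "Item Subtotal" label,
# then everything that is not a form feed (i.e. the rest of the page).
FOOTER_PATTERN = re.compile(r"[\n]Item\s*Subtotal[^\f]+")

# Hypothetical two-page text: the invoice, a form feed, then the packing slip.
sample = (
    "Item Material Description Qty\r\n"
    "10 1205163 SHIM FLAT SHEET\r\n"
    "Item Subtotal.................\r\n"
    "Invoice Amount................\r\n"
    "\f"
    "Item Material Description Qty\r\n"
    "10 1205163 SHIM FLAT SHEET\r\n"
    "Item Subtotal.................\r\n"
)

# One match per copy of the line items, each stopping at the page break.
matches = [m.group(0) for m in FOOTER_PATTERN.finditer(sample)]
for number, text in enumerate(matches, start=1):
    print(f"footer match {number}: {text!r}")
```

Because the label appears in both the invoice and the packing slip, a footer pattern alone finds an end marker for each copy rather than a single end for the whole document.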
  • henrymahenryma Posts: 56
    The pattern retrieved all the data from that point to the end of the table; however, it did not stop the table extraction. It still went on and extracted the duplicate invoice line details several pages later.
  • vimalavimala Posts: 17
    edited June 3
    @henryma can you please attach the document along with your requirements? I can work on it and will try to give you a proper solution.
  • henrymahenryma Posts: 56
    I see what the issue is, but I'm not sure how to resolve it. The footer works, but the problem is that the duplicate invoice in the PDF causes it to create a new table again. If I use line 230 as a footer on one invoice, then both tables show up from line 10 to 220. My problem now is that I need to exclude the other pages from the table extraction, so that it doesn't create two tables for me.
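One way to think about excluding the duplicate pages is as a pre-processing step: cut the text at the end of the page that holds the first "Invoice Amount" label, and run table extraction only on what remains. The sketch below is plain Python, not a Grooper feature. The function name `keep_first_invoice` is hypothetical, and it assumes pages are separated by form feeds and that the packing slip always follows the original invoice, as described earlier in the thread.

```python
import re

# "Invoice Amount" at the start of the text, or right after a newline or form feed.
END_LABEL = re.compile(r"(?:^|(?<=[\n\f]))Invoice Amount")


def keep_first_invoice(text: str) -> str:
    """Return the text up to the end of the page containing the first label.

    Hypothetical helper: everything after that page (for example the
    duplicate packing slip) is dropped. Text without the label is
    returned unchanged.
    """
    match = END_LABEL.search(text)
    if match is None:
        return text
    # The next form feed after the label marks the end of that page.
    page_end = text.find("\f", match.end())
    return text if page_end == -1 else text[:page_end]
```

Calling `keep_first_invoice(ocr_text)` on the full OCR text would leave only the original invoice pages, so a single table is built from them. Whether the same effect is better achieved with page-level settings inside Grooper itself is not settled in this thread.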