
Incorrect text recognition issue

Hi all,
While processing PDF or Excel files, the recognized text is sometimes wrong, for example 'S' becomes '5' or 'G' becomes '6'. Is there any technique to correct this so the extracted text comes out right without these mismatches? Please see the screenshot below and help us.
Thanks in advance.

Answers

    tgarnett Posts: 76 ✭✭✭
    Hi @Prasadchitikela,

    First, I just want to check: since you're dealing with PDFs, do these files already have native text? If you can highlight text in the original PDF, it has native text, and you can use Native Text Extraction instead of an OCR Profile. Native text is read directly from the PDF rather than recognized from the image, so you avoid OCR misreads like 'S' vs '5' entirely.



    If we do need to rely on OCR, there are many ways to improve the results, but they will never be perfect. A good place to start is the IP Profile attached to your OCR Profile. Open that IP Profile and make sure the text on the black & white image it produces looks clean. Your image already looks clean, though, so I don't expect you'll find many issues there. I've attached a generic OCR Profile of mine; you can import it and see if it gets better results.

    If we can't get OCR to read the letters properly, you can still fix this at Extract. A pattern looking for 3 digits, a hyphen, 2 digits and a letter will not match 111-345, because its last character was read as a digit. If you turn on Fuzzy Matching with weightings applied, however, it will correct the value.

    In the example below, my pattern is \d{3}-\d{2}[A-Z]. By applying the Fuzzy Match Weightings lexicon, it gets the correct result with 96% confidence.
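    Grooper's fuzzy matching is configured in the product, not written as code, so what follows is only a rough Python sketch of the general idea behind weighted fuzzy matching. It is not Grooper's implementation. It assumes each position of \d{3}-\d{2}[A-Z] is treated as its own single-character class, and that a small hypothetical table (CONFUSION_COSTS) gives common OCR confusions such as 5/S and 6/G a low cost. The names fuzzy_match and CONFUSION_COSTS, the 0.2 weights, and the confidence formula are all made up for illustration, so the numbers will not match Grooper's 96%.

```python
import re

# HYPOTHETICAL confusion weights: (character OCR produced, character it may really be) -> cost.
# Illustrative values only; NOT taken from Grooper's Fuzzy Match Weightings lexicon.
CONFUSION_COSTS = {
    ("5", "S"): 0.2, ("S", "5"): 0.2,
    ("6", "G"): 0.2, ("G", "6"): 0.2,
    ("0", "O"): 0.2, ("O", "0"): 0.2,
    ("1", "I"): 0.2, ("I", "1"): 0.2,
}

# The pattern \d{3}-\d{2}[A-Z] written as one single-character regex per position.
PATTERN = [r"\d"] * 3 + [r"-"] + [r"\d"] * 2 + [r"[A-Z]"]


def fuzzy_match(text, pattern=PATTERN, costs=CONFUSION_COSTS, min_confidence=0.9):
    """Return (corrected_text, confidence), or None if no acceptable match.

    Only same-length substitutions are considered (no inserted or dropped characters).
    """
    if len(text) != len(pattern):
        return None
    corrected = []
    total_cost = 0.0
    for ch, char_class in zip(text, pattern):
        if re.fullmatch(char_class, ch):
            corrected.append(ch)  # character already fits this position
            continue
        # Weighted substitutions that would make this position fit the pattern.
        options = [
            (cost, target)
            for (read, target), cost in costs.items()
            if read == ch and re.fullmatch(char_class, target)
        ]
        if not options:
            return None  # no known confusion can repair this character
        cost, target = min(options)  # cheapest repair wins
        corrected.append(target)
        total_cost += cost
    # Assumed scoring: start at 1.0 and subtract the average cost per position.
    confidence = 1.0 - total_cost / len(pattern)
    if confidence < min_confidence:
        return None
    return "".join(corrected), confidence


if __name__ == "__main__":
    result = fuzzy_match("111-345")
    print(result[0], round(result[1], 2))  # 111-34S 0.97
```

    With these assumed weights, fuzzy_match("111-345") returns the corrected value 111-34S with a confidence of about 0.97, while 11A-345 returns None because no listed confusion turns 'A' into a digit. Treating each position as a fixed character class keeps the sketch to simple substitutions; a real fuzzy matcher would also have to handle inserted or missing characters.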


