
OCR issue with Color format

Is there a reason why color causes issues with OCR? Even though the font is readable, OCR has trouble reading it. If I binarize the image, it works fine; however, there are times when an image has highlighting over the text. I can still read that text in the image, but binarizing causes problems. Why is it that the color shading and font color leave the text readable to a human, yet OCR can't read it?

Example of an original image value that OCR has trouble reading

But if I binarize it, OCR reads it



The problem with binarizing is that any highlighting becomes an issue.


Answers

  • GrooperGuru Posts: 481 admin
    Thought I should shed some light on this.

    Any time you pass a color or grayscale image to the OCR engine and your OCR's IP Profile does not perform binarization, the OCR engine will apply "Simple" thresholding to the image behind the scenes. Simple thresholding is very likely to destroy highlighted text, as you've observed in your second screenshot. The only way to prevent this is to perform Binarization in an IP Profile. Simple or Auto binarization will very likely still have issues, but if you try Adaptive or Dynamic Thresholding, the black text within the yellow zone should come forward in a highly legible way.
    Matt Harrison
    Product Manager
    mharrison@bisok.com
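
To illustrate the difference described in the answer above, here is a minimal pure-Python sketch comparing a single global cutoff with a local-mean threshold. It is not Grooper's implementation, and it does not use any Grooper API. The helper names (`make_sample`, `simple_threshold`, `adaptive_threshold`), the gray levels (255 for white paper, 100 for a shaded zone, 0 and 20 for text), the cutoff of 128, and the window radius and offset are all assumptions chosen for illustration.

```python
def make_sample(height=7, width=20):
    """Build a tiny synthetic grayscale page (hypothetical values).

    Columns 0-7 are white paper (255); columns 8-19 are a shaded zone (100).
    A vertical text stroke sits at column 3 (value 0) and column 14 (value 20).
    """
    img = []
    for y in range(height):
        row = []
        for x in range(width):
            shaded = x >= 8
            on_stroke = 1 <= y <= 5 and x in (3, 14)
            if on_stroke:
                row.append(20 if shaded else 0)
            else:
                row.append(100 if shaded else 255)
        img.append(row)
    return img


def simple_threshold(img, cutoff=128):
    """Global thresholding: one cutoff for the whole image."""
    return [[0 if p < cutoff else 255 for p in row] for row in img]


def adaptive_threshold(img, radius=2, offset=10):
    """Local thresholding: compare each pixel to the mean of its neighborhood.

    A pixel becomes black only if it is darker than its local mean by more
    than `offset`. The window is clipped at the image borders.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        ys = range(max(0, y - radius), min(h, y + radius + 1))
        row = []
        for x in range(w):
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            mean = total / (len(ys) * len(xs))
            row.append(0 if img[y][x] < mean - offset else 255)
        out.append(row)
    return out


def render(img):
    """Render a binary image as text: '#' for black, '.' for white."""
    return "\n".join("".join("#" if p == 0 else "." for p in row) for row in img)


if __name__ == "__main__":
    page = make_sample()
    print("Simple (global) threshold:")
    print(render(simple_threshold(page)))
    print()
    print("Adaptive (local mean) threshold:")
    print(render(adaptive_threshold(page)))
```

With these assumed values, the global cutoff turns the entire shaded zone black, so the stroke at column 14 merges with its background. The local-mean version keeps that stroke black and its neighbors white, because each pixel is judged against its own surroundings rather than one page-wide cutoff. It does leave a dark edge where the shaded zone meets the white paper, which is a typical side effect of local thresholding.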