
OCR issue with Color format

Is there a reason why color causes issues with OCR? Even though the font is readable, OCR has trouble reading it. If I binarize the image, it works fine; however, there are times when the image has highlighting over the text. I can still read that text in the image, but binarizing it causes problems. Why can OCR not read text that the color shading and font color leave perfectly readable to a human?

Example of an original image value that OCR has trouble reading:

But if I binarize it, OCR reads it:



The problem with binarizing is that highlighting over the text becomes an issue.


Answers

  • GrooperGuru Posts: 481 admin
    Thought I should shed some light on this.

    Any time you pass a color or grayscale image to the OCR engine and your OCR's IP Profile does not perform binarization, the OCR engine applies "Simple" thresholding to the image behind the scenes. Simple thresholding is very likely to destroy highlighted text, as you observed in your second screenshot. The only way to prevent this is to perform Binarization in an IP Profile yourself. Simple or Auto binarization will very likely have the same problem, but if you try Adaptive or Dynamic Thresholding, the black text within the yellow zone should come through legibly.
    Matt Harrison
    Product Manager
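To make the difference between a single global cutoff and a locally adaptive one concrete, here is a minimal sketch in plain Python. It is a generic illustration, not Grooper's implementation: the function names, the tiny synthetic page, the gray levels (255 for paper, 110 for the shaded zone, 20 for text), the cutoff of 128, and the window radius and offset are all assumptions chosen for the example. It also assumes the shaded zone converts to a gray level below the fixed cutoff, which is the case where a global threshold swallows the text.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 = black, 255 = white


def make_sample() -> Image:
    """Hypothetical 7x20 page: white paper on the left, a shaded zone on the right."""
    height, width = 7, 20
    img = [[255 if c < 10 else 110 for c in range(width)] for _ in range(height)]
    # Dark vertical "text" strokes, two on white paper and two inside the shaded zone.
    for r in range(1, 6):
        for c in (3, 6, 13, 16):
            img[r][c] = 20
    return img


def simple_threshold(img: Image, cutoff: int = 128) -> List[List[bool]]:
    """Global threshold: every pixel darker than one fixed cutoff counts as ink."""
    return [[px < cutoff for px in row] for row in img]


def adaptive_threshold(img: Image, radius: int = 2, offset: float = 60) -> List[List[bool]]:
    """Local-mean threshold: a pixel is ink only if it is clearly darker
    than the average of its own neighborhood (window clipped at the borders)."""
    height, width = len(img), len(img[0])
    out = []
    for r in range(height):
        row_out = []
        for c in range(width):
            r0, r1 = max(0, r - radius), min(height, r + radius + 1)
            c0, c1 = max(0, c - radius), min(width, c + radius + 1)
            window = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            local_mean = sum(window) / len(window)
            row_out.append(img[r][c] < local_mean - offset)
        out.append(row_out)
    return out


def render(mask: List[List[bool]]) -> List[str]:
    """Draw ink as '#' and background as '.' for easy comparison."""
    return ["".join("#" if ink else "." for ink in row) for row in mask]


if __name__ == "__main__":
    page = make_sample()
    print("Simple (global cutoff):")
    print("\n".join(render(simple_threshold(page))))
    print("Adaptive (local mean):")
    print("\n".join(render(adaptive_threshold(page))))
```

On this synthetic page the global cutoff turns the whole shaded zone into ink, so the two strokes inside it disappear into a solid block, while the local-mean version keeps all four strokes and leaves the shaded background white. The offset matters: a value that is too small would start marking the edge of the shaded zone as ink, so on real scans the window size and offset need tuning against sample documents.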