OCR issue with Color format
Is there a reason why color causes issues with OCR? Even though the font is readable, OCR has trouble reading it. If I binarize the image, it works fine; however, there are times when the image has highlighting over the text. I can still read the text in the image, but binarizing causes problems. Why can't OCR read text that the color shading and font color leave perfectly readable to a human?
Example of an original image value that OCR has trouble reading:




But if I binarize it, OCR reads it:

The problem with binarizing is that any highlighting then becomes an issue:


Answers
Any time you pass a color or grayscale image to the OCR engine and your OCR's IP Profile does not perform binarization, the OCR engine applies "Simple" thresholding to the image behind the scenes. Simple thresholding is very likely to destroy highlighted text, as you observed in your second screenshot. The only way to prevent this is to perform binarization yourself in an IP Profile. Simple or Auto binarization will very likely have issues, but if you try Adaptive or Dynamic Thresholding, the black text within the yellow zone should come forward in a highly legible way.
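To illustrate why a single global cutoff can wipe out text inside a shaded zone while a locally computed cutoff keeps it, here is a minimal pure-Python sketch. It is not Grooper's implementation: the function names, the cutoff of 128, the window radius, the offset, and the gray level of 100 standing in for a highlighted zone are all assumptions made for the example. It only shows the general idea of global versus local-mean thresholding on a tiny synthetic patch.

```python
# Generic illustration only; this is NOT how Grooper implements its
# "Simple" or "Adaptive" methods. All names and values are hypothetical.

def make_patch(background, size=7, ink=20):
    """Build a size x size grayscale patch with one dark vertical stroke."""
    patch = [[background] * size for _ in range(size)]
    for y in range(1, size - 1):
        patch[y][size // 2] = ink  # the "text" stroke
    return patch

def simple_threshold(img, cutoff=128):
    """Global threshold: one fixed cutoff for every pixel (assumed 128)."""
    return [[0 if p < cutoff else 255 for p in row] for row in img]

def adaptive_mean_threshold(img, radius=2, offset=10):
    """Local threshold: a pixel is ink only if it is darker than the
    mean of its neighborhood by more than `offset`."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [img[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            row.append(0 if img[y][x] < local_mean - offset else 255)
        out.append(row)
    return out

def render(img):
    """Render a binarized patch as text: '#' for black, '.' for white."""
    return "\n".join("".join("#" if p == 0 else "." for p in row) for row in img)

if __name__ == "__main__":
    shaded = make_patch(background=100)  # stand-in for a highlighted zone
    print("Global cutoff on shaded patch:")
    print(render(simple_threshold(shaded)))
    print("Local-mean cutoff on shaded patch:")
    print(render(adaptive_mean_threshold(shaded)))
```

On the shaded patch, the global cutoff turns the background and the stroke black together, so the stroke disappears. The local-mean version compares each pixel with its own surroundings, so the stroke stays distinct from the shading around it. Real scans are noisier than this patch, so the window size and offset would need tuning per document set.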
Product Manager