Is there a way to edit text that was recognized incorrectly?

Aston · May 2021

Grooper added a space to recognized text. Is there a way to delete that space?

The number on the document was 111.12, but it was recognized as "111.1 2".

RandoCalrisian · May 2021

Short answer is, not easily, no.
What is the significance of that number? What are you doing with it?
Can I assume you're extracting it somehow? If so, fuzzy logic would easily remove that space.

Aston · May 2021

It's part of an ordered array that looks for a number with two decimal places, so the extra space kept it from picking up the line. I changed the "Native Text Extraction" setting in the recognize step, and it was then able to read the number correctly.

Thanks

RandoCalrisian · May 2021

Can I assume this was an electronic document that was being OCRed, and that Native Text Extraction is now giving you the proper text instead of the erroneous OCR result?
If you'd like a solution involving Fuzzy Logic, please let me know.

Aston · May 2021

Yes. Some of the electronic documents we have been getting recently were created with a lot of the text rendered as images or something, so we're just OCRing everything right now.

Would the fuzzy logic be set up in the extractor by using fuzzyregex?

RandoCalrisian · May 2021

Yes.
Your expression might look something like this (keep in mind, you can't use unbounded quantifiers such as * or + with Fuzzy on):

[0-9]{3}[.][0-9]{2}

From there (assuming you're on Grooper 2021), set the Fuzzy Matching property to Enabled. Then, in your Weightings property, you could use the following:

Delete( )=0.25

This would make deleting a space inexpensive so you could keep your Minimum Similarity high.
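
To illustrate why a cheaper space deletion helps, here is a minimal Python sketch of a weighted edit distance. It is not Grooper's implementation: the function names, the `delete_costs` parameter, and the similarity formula (1 minus cost divided by the longer string's length) are all assumptions made for this example. It compares the OCR result "111.1 2" against the intended value "111.12".

```python
import re


def weighted_edit_cost(source, target, delete_costs=None):
    """Cost of turning source into target.

    Every edit costs 1.0, except deleting a character listed in
    delete_costs (hypothetical parameter), which uses that cost instead.
    """
    delete_costs = delete_costs or {}
    rows, cols = len(source) + 1, len(target) + 1
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = cost[i - 1][0] + delete_costs.get(source[i - 1], 1.0)
    for j in range(1, cols):
        cost[0][j] = cost[0][j - 1] + 1.0
    for i in range(1, rows):
        for j in range(1, cols):
            delete = cost[i - 1][j] + delete_costs.get(source[i - 1], 1.0)
            insert = cost[i][j - 1] + 1.0
            same = source[i - 1] == target[j - 1]
            swap = cost[i - 1][j - 1] + (0.0 if same else 1.0)
            cost[i][j] = min(delete, insert, swap)
    return cost[-1][-1]


def similarity(source, target, delete_costs=None):
    """Assumed similarity score: 1 - cost / length of the longer string."""
    longest = max(len(source), len(target), 1)
    return 1.0 - weighted_edit_cost(source, target, delete_costs) / longest


ocr_text = "111.1 2"   # what OCR produced
intended = "111.12"    # what the strict pattern expects

# The intended value matches the strict pattern; the OCR text does not.
pattern = r"[0-9]{3}[.][0-9]{2}"
strict_intended = re.fullmatch(pattern, intended) is not None
strict_ocr = re.fullmatch(pattern, ocr_text) is not None

default_score = similarity(ocr_text, intended)
weighted_score = similarity(ocr_text, intended, {" ": 0.25})
print(strict_intended, strict_ocr)
print(round(default_score, 3), round(weighted_score, 3))
```

In this sketch, the stray space costs a full edit by default but only a quarter of an edit with the weighting, so the weighted score stays closer to 1. That is the general idea behind keeping Minimum Similarity high while still tolerating the space.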
