
Is there a way to edit text that was recognized incorrectly?

Grooper added a space to recognized text. Is there a way to delete that space?

The number on the document was 111.12, but it was recognized as 111.1 2.

Best Answer

  • RandoCalrisian Posts: 195 admin
    Answer ✓
    The short answer is no, not easily.
    What is the significance of that number? What are you doing with it?
    Can I assume you're extracting it somehow? If so, fuzzy logic would easily remove that space.
    Randall Kinard
    rkinard@bisok.com

Answers

  • Aston Posts: 17
    It's part of an ordered array, so the extractor was looking for a number with two decimal places and wasn't picking up the line. I changed the "Native Text Extraction" setting in the Recognize step, and it was then able to read the number correctly.

    Thanks
  • RandoCalrisian Posts: 195 admin
    Can I assume this was an electronic document that was being OCRed, and the Native Text Extraction is now giving you the proper text instead of the error OCR made?
    If you'd like a solution involving Fuzzy Logic please let me know.

  • Aston Posts: 17
    Yes. Some of the electronic documents we have been receiving recently were created with much of the text stored as images or something similar, so we're just OCRing everything right now.

    Would the fuzzy logic be set up in the extractor by using fuzzyregex?
  • RandoCalrisian Posts: 195 admin
    Yes.
    Your expression might look something like this (keep in mind that you can't use infinite quantifiers with Fuzzy on):
    [0-9]{3}[.][0-9]{2}

    From there (assuming you're on version 2021), set the Fuzzy Matching property to Enabled. Then, in your Weightings property, you could use the following:
    Delete( )=0.25

    This would make deleting a space inexpensive, so you could keep your Minimum Similarity high.
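
To illustrate why a cheap space deletion lets Minimum Similarity stay high, here is a minimal Python sketch of a weighted edit distance. It is not Grooper's matching engine: the function names, the default cost of 1.0 per edit, and the way similarity is normalized by string length are all assumptions made for illustration. Only the pattern and the 0.25 weight for deleting a space come from the reply above.

```python
import re

# The strict pattern from the thread: three digits, a dot, two digits.
PATTERN = re.compile(r"[0-9]{3}[.][0-9]{2}")


def weighted_edit_cost(source, target, delete_cost=None, default=1.0):
    """Cost of turning `source` (OCR text) into `target` with weighted edits.

    `delete_cost` maps a character to the cost of deleting it from `source`.
    Every other edit costs `default`. Hypothetical helper, not a Grooper API.
    """
    delete_cost = delete_cost or {}
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + delete_cost.get(source[i - 1], default)
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + default
    for i in range(1, rows):
        for j in range(1, cols):
            substitute = 0.0 if source[i - 1] == target[j - 1] else default
            dist[i][j] = min(
                dist[i - 1][j] + delete_cost.get(source[i - 1], default),  # delete
                dist[i][j - 1] + default,                                  # insert
                dist[i - 1][j - 1] + substitute,                           # keep or swap
            )
    return dist[-1][-1]


def similarity(source, target, delete_cost=None):
    """Assumed normalization: 1 minus cost divided by the longer length."""
    longest = max(len(source), len(target)) or 1
    return 1.0 - weighted_edit_cost(source, target, delete_cost) / longest


if __name__ == "__main__":
    ocr_text, wanted = "111.1 2", "111.12"
    print(PATTERN.search(ocr_text))                               # strict regex finds nothing
    print(round(similarity(ocr_text, wanted), 3))                 # 0.857
    print(round(similarity(ocr_text, wanted, {" ": 0.25}), 3))    # 0.964
```

Under these assumptions, the stray space drops the unweighted score to about 86%, while weighting the deletion at 0.25 keeps it above 96%, so a strict similarity threshold can still accept the value.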
