Is there a way to edit text that was recognized incorrectly?

Grooper added a space to the recognized text. Is there a way to delete that space?

The number on the document was 111.12, but it was recognized as "111.1 2".

Best Answer

  • RandoCalrisian (Posts: 195, admin)
    Answer ✓
    Short answer: no, not easily.
    What is the significance of that number? What are you doing with it?
    Can I assume you're extracting it somehow? If so, fuzzy logic would easily remove that space.
    Randall Kinard
    rkinard@bisok.com

Answers

  • Aston (Posts: 17)
    It's part of an ordered array, so the extractor was looking for a number with two decimal places and wasn't picking up the line. I changed the "Native Text Extraction" setting in the recognize step, and it was able to read the number correctly.

    Thanks
  • RandoCalrisian (Posts: 195, admin)
    Can I assume this was an electronic document that was being OCRed, and that Native Text Extraction is now giving you the proper text instead of the OCR error?
    If you'd like a solution involving Fuzzy Logic, please let me know.
    Randall Kinard
    rkinard@bisok.com

  • Aston (Posts: 17)
    Yes. Some of the electronic documents we have been receiving recently were created with a lot of the text stored as images or something similar, so we're just OCRing everything right now.

    Would the fuzzy logic be set up in the extractor by using fuzzyregex?
  • RandoCalrisian (Posts: 195, admin)
    Yes.
    Your expression might look something like this (keep in mind that you can't use infinite quantifiers with Fuzzy on):
    [0-9]{3}[.][0-9]{2}

    From there (assuming you're on version 2021), the Fuzzy Matching property would be Enabled. Then, in your Weightings property, you could use the following:
    Delete( )=0.25

    This would make deleting a space inexpensive, so you could keep your Minimum Similarity high.
    Randall Kinard
    rkinard@bisok.com
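
For anyone who wants to see the idea behind the cheap space deletion outside of Grooper, here is a minimal Python sketch. It is not Grooper's fuzzy matching engine and does not use any Grooper API. The names `fuzzy_find`, `SPACE_DELETE_COST`, and `min_similarity`, as well as the similarity formula (1 minus total edit cost divided by match length), are assumptions made for illustration only. The only edit it allows is deleting a space, mirroring the `Delete( )=0.25` weighting above.

```python
import re

# Strict pattern from the thread: three digits, a literal dot, two digits.
PATTERN = re.compile(r"[0-9]{3}[.][0-9]{2}")

# Assumed cost of deleting one space, mirroring "Delete( )=0.25".
SPACE_DELETE_COST = 0.25

# Length of a strict match such as "111.12".
TARGET_LEN = 6


def fuzzy_find(text, min_similarity=0.9):
    """Return (cleaned_match, similarity) pairs for spans of `text` that
    match PATTERN once embedded spaces are deleted.

    Similarity here is an illustrative formula, not Grooper's:
    1 - (total deletion cost / TARGET_LEN).
    """
    results = []
    for start in range(len(text)):
        if text[start] == " ":
            continue  # a candidate never starts on a space
        kept = []
        cost = 0.0
        for ch in text[start:]:
            if ch == " ":
                cost += SPACE_DELETE_COST  # pay to delete the space
                continue
            kept.append(ch)
            if len(kept) == TARGET_LEN:
                break
        candidate = "".join(kept)
        if PATTERN.fullmatch(candidate):
            similarity = 1.0 - cost / TARGET_LEN
            if similarity >= min_similarity:
                results.append((candidate, similarity))
    return results


if __name__ == "__main__":
    line = "Total 111.1 2 due"
    print(PATTERN.search(line))  # the strict pattern finds nothing
    print(fuzzy_find(line))      # the space-tolerant search recovers 111.12
```

Because one space deletion costs only 0.25 against a six-character match, the similarity stays well above a threshold of 0.9, which is the point of making that edit inexpensive: the threshold can stay high enough to reject other kinds of mismatch.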
