Dealing with whitespace in data types

tgarnett Posts: 76
How would you approach creating your Data Types when you keep running into
extra whitespace in the extracted text?

For instance, I am creating my address Data Type and I am getting a lot of values like

P. O. Box 4 35

where there is a space between the 4 and the 3.

Answers

  • GrooperGuru Posts: 481 admin
    There are two schools of thought here. The first is to use text pre-processing on the format/pattern (remove control characters -> spaces). This effectively removes every space character from the document before the pattern runs. This technique is useful for numbers, but it causes issues with multi-word values like the one you describe here.

    The "better" approach (starting in 2.6) would be to leverage Fuzzy RegEx mode on the format/pattern. The pattern would end up being something like:
    P[.] O[.] Box \d{1,4}

    As long as the Fuzzy Match percentage is reasonable, this technique should work well and will remove the additional space in the output.
    Matt Harrison
    Product Manager
    mharrison@bisok.com
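
To make the trade-off between the two approaches concrete, here is a minimal sketch in plain Python. It is not Grooper's pre-processing or its Fuzzy RegEx mode: the sample text, the `TOLERANT_PO_BOX` pattern, and the `extract_po_box` helper are all assumptions made up for illustration, and the second approach only approximates fuzzy matching by allowing stray whitespace between digits.

```python
import re

# Hypothetical OCR output with a stray space inside the box number.
ocr_text = "P. O. Box 4 35 Oklahoma City"

# Approach 1: strip every whitespace character before matching.
# The digits are repaired, but word boundaries are lost for everything else.
stripped = re.sub(r"\s+", "", ocr_text)

# Approach 2 (a plain-regex stand-in for fuzzy matching, not Grooper's
# algorithm): tolerate optional whitespace after each digit, then
# normalize only the captured number.
TOLERANT_PO_BOX = re.compile(r"P\.\s*O\.\s*Box\s*((?:\d\s?){1,4})")


def extract_po_box(text):
    """Return a normalized 'P. O. Box NNN' string, or None if not found."""
    match = TOLERANT_PO_BOX.search(text)
    if match is None:
        return None
    digits = re.sub(r"\s", "", match.group(1))
    return f"P. O. Box {digits}"


if __name__ == "__main__":
    print(stripped)
    print(extract_po_box(ocr_text))
```

With the first approach the whole line collapses into a single token, which is why it suits isolated numbers better than addresses. The tolerant pattern keeps the surrounding words intact, but it would also absorb digits that follow the box number after a space, so it is only a rough illustration of the idea.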
  • RandoCalrisian Posts: 195 admin
    edited January 2018
    I can add to this: to get a better percentage with the Fuzzy Match method, throw in a LookAhead and LookBehind (if possible) to make the string longer and, as a result, make the Fuzzy Confidence higher (since Fuzzy RegEx works on the entire pattern, including the LookAhead and LookBehind).
    Randall Kinard
    rkinard@bisok.com

  • GrooperGuru Posts: 481 admin
    Definitely agree with you here, especially when dealing with short values like currency amounts. Strings of 3-5 characters don't give fuzzy percentages enough headroom to be practical.
    Matt Harrison
    Product Manager
    mharrison@bisok.com
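
The point about short strings can be checked with a little arithmetic. The sketch below assumes a similarity score of 1 minus (edit distance / length of the longer string); that formula is an assumption chosen for illustration and is not taken from Grooper's documentation. The `levenshtein` and `similarity` helpers and the sample strings are likewise hypothetical. The surrounding context that a LookAhead or LookBehind would contribute is simulated here by simply comparing longer strings.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(expected, actual):
    """Assumed score: 1 - edits / length of the longer string."""
    longest = max(len(expected), len(actual))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(expected, actual) / longest


# The same single stray space, with and without surrounding context.
short_score = similarity("12.50", "12 .50")
long_score = similarity("Total Due: 12.50 USD", "Total Due: 12 .50 USD")

if __name__ == "__main__":
    print(f"short value:  {short_score:.3f}")
    print(f"with context: {long_score:.3f}")
```

Under this assumed formula, one extra space costs the 5-character value about a sixth of its score, while the same error inside a 20-character string costs under 5%. That is the headroom the longer pattern buys.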