
The Data Field is extracting a different value than the Data Type it is referencing.

I have an extractor that finds a currency value using the FuzzyRegEx mode and the pattern \$(\*|\*\*|\*\*\*|\*\*\*\*|\*\*\*\*\*|\*\*\*\*\*\*)?(\d{1,3}(,|\.| ))?\d{1,3}(,|\.| )\d{2} with no look-around or output format parameters.

This extractor is part of a key-value pair. The value extractor gets the correct value, as does the KVP-level data type, but when I run the extraction at the Data Field level it highlights the same correct value but returns only part of it, making it wrong (pic 4).
Kyle Souza
Data Wizard
P&P Oil & Gas Solutions

Best Answer

  • kylesouza Posts: 147 ✭✭
    Accepted Answer
    I still think there is a problem with the extractors for the dev team to look into, but for now I have found a solution that is working for me: I changed the value pattern to include groups and set an output format.

    Kyle Souza
    Data Wizard
    P&P Oil & Gas Solutions

Answers

  • dearner Posts: 206 admin
    Kyle - out of curiosity, can you post the value type settings for those extractors (KV - Check Amount, Value, and Currency New), including any formatting specifiers?  I'm curious if it's getting clipped somehow to only two digits before the decimal.
  • kylesouza Posts: 147 ✭✭
    Amount Being Paid


    KV - Check Amount


    Value


    Currency New


    Pattern

    Kyle Souza
    Data Wizard
    P&P Oil & Gas Solutions
  • dearner Posts: 206 admin
    edited January 2020
    Yeah, this is interesting.  So it looks like what's happening here is:
    • Your extractor (pre-fix) is picking up "$211 52" for that value, which makes sense; a line runs right through the decimal point, which is probably getting removed during IP, so the extractor sees a space.
    • That gets put through the .NET string formatting with a specifier of c2, defined on your field (there's some reference documentation on it at https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings). 
    • For some reason - and this is something we might want to look into - that is turning it into $52.00.  My guess is it only sees the 52 (which would format as $52.00 with a c2 format specifier); or it sees $211 and 52 separately, and only returns the latter string formatted.
    • That value populates the Grooper field.
    The best practice is the solution you've already found - to explicitly group the dollars and cents, and control the output formatting using the extractor.  This ensures consistent behavior and (again, as you've discovered) good normalization and formatting.
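
The grouping approach described above can be sketched roughly as follows. This is only an illustration in Python's `re` module, not Grooper's .NET regex engine or its output format settings; the group names `dollars` and `cents` and the helper `normalize_amount` are hypothetical, and `\*{0,6}` is used as shorthand for the optional run of one to six asterisks in the original pattern.

```python
import re

# Hypothetical pattern modeled on the one in the question: a dollar sign,
# up to six check-protection asterisks, an optional thousands group,
# the remaining dollar digits, one separator, and two cent digits.
AMOUNT = re.compile(
    r"\$\*{0,6}"
    r"(?P<dollars>(?:\d{1,3}[,. ])?\d{1,3})"
    r"[,. ]"
    r"(?P<cents>\d{2})"
)

def normalize_amount(text):
    """Return the first amount in text as '$d,ddd.cc', or None if absent."""
    match = AMOUNT.search(text)
    if match is None:
        return None
    # Drop any separator inside the dollars group before rebuilding the value.
    dollars = re.sub(r"[,. ]", "", match.group("dollars"))
    return f"${int(dollars):,}.{match.group('cents')}"

print(normalize_amount("$211 52"))
```

Because the dollars and cents are captured separately, the output is rebuilt from the captured digits, so a separator misread as a space does not affect the result.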
  • RandoCalrisian Posts: 182 mod
    I'd like to chime in.
    First off, I'm not really sure what is going on with those asterisks * and/or pipes |, but I suspect it's being done to handle a situation similar to the one listed here:
    https://xchange.grooper.com/discussion/comment/1933#Comment_1933

    Second, the ...
    (,|\.| )
    could be handled more cleanly with a character set...
    [,. ]
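
As a quick check that the two forms accept the same separators, here is a small sketch using Python's `re` module (an assumption for illustration; Grooper itself uses the .NET regex engine, though both forms behave the same way there for single characters). The variable names are hypothetical.

```python
import re

# The alternation from the original pattern and the suggested character set.
alternation = re.compile(r"(,|\.| )")
char_set = re.compile(r"[,. ]")

# Both should accept a comma, period, or space, and reject anything else.
samples = [",", ".", " ", "-", "a", "5"]
results = {
    s: (bool(alternation.fullmatch(s)), bool(char_set.fullmatch(s)))
    for s in samples
}

for sample, (alt_ok, set_ok) in results.items():
    print(repr(sample), alt_ok, set_ok)
```

The character set is shorter, and the period needs no escaping inside the brackets.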

    But even this part could be cleaned up, and your overall solution made more robust, by using Fuzzy RegEx.


    Finally, the way the Format Specifier is functioning is correct. It will truncate that space and anything before it. While your workaround is getting you there, I wanted to show you an easier solution that will work more reliably, and hopefully get you thinking more about Fuzzy RegEx and its uses.
    Randall Kinard
