The Data Field is extracting a different value than the Data Type it is referencing.
kylesouza
Posts: 156 ✭✭✭
I have an extractor that finds a currency value using the FuzzyRegEx mode and the pattern
\$(\*|\*\*|\*\*\*|\*\*\*\*|\*\*\*\*\*|\*\*\*\*\*\*)?(\d{1,3}(,|\.| ))?\d{1,3}(,|\.| )\d{2}
with no look-around or output format parameters.
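To make it easier to see what the quoted pattern accepts, here is a minimal sketch that runs it against a few strings. It uses Python's `re` module as a stand-in (Grooper runs on the .NET regex engine, and fuzzy matching is not modeled here), and the sample strings and the helper name `find_amount` are made up for illustration.

```python
import re

# The pattern exactly as quoted in the post, split across two literals for readability.
PATTERN = re.compile(
    r"\$(\*|\*\*|\*\*\*|\*\*\*\*|\*\*\*\*\*|\*\*\*\*\*\*)?"
    r"(\d{1,3}(,|\.| ))?\d{1,3}(,|\.| )\d{2}"
)

def find_amount(text):
    """Return the first matched amount in text, or None (hypothetical helper)."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

# Illustrative inputs only; they are not taken from the poster's documents.
for sample in ["$211.52", "$211 52", "$***1,211.52", "$211"]:
    print(repr(sample), "->", find_amount(sample))
```

Because the separator before the cents may be a comma, a period, or a space, a string such as "$211 52" is a valid match for this pattern.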
This extractor is part of a key-value pair,
the value extractor gets the correct value, as does the KVP level data type,
but when I run the extraction on the Data Field level it is highlighting the same correct value,
but returning only part of it - making it wrong (pic 4).
Kyle Souza
Data Wizard
P&P Oil & Gas Solutions
Best Answer
kylesouza Posts: 156 ✭✭✭
I still think there is a problem with the extractors for the dev team to look into, but for now I have found a solution that is working for me: I changed the value pattern to include groups and set an output format.
Kyle Souza
Data Wizard
P&P Oil & Gas Solutions
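The fix described above (capture groups plus an explicit output format) can be sketched outside Grooper as follows. This is only an illustration in Python: the group names `dollars` and `cents`, the function `normalize`, and the rewritten pattern are assumptions, not the poster's actual extractor settings or Grooper's output-format syntax.

```python
import re

# Assumed rewrite of the value pattern with named groups. The separators use a
# character set and the optional asterisk run is written as \*{0,6}.
GROUPED = re.compile(
    r"\$\*{0,6}(?P<dollars>(?:\d{1,3}[,. ])?\d{1,3})[,. ](?P<cents>\d{2})"
)

def normalize(text):
    """Rebuild the amount as $<dollars>.<cents>, whatever separator was read."""
    m = GROUPED.search(text)
    if m is None:
        return None
    # Drop any thousands separator that was captured inside the dollars group.
    dollars = re.sub(r"[,. ]", "", m.group("dollars"))
    return f"${dollars}.{m.group('cents')}"

print(normalize("$211 52"))
print(normalize("$**1,211.52"))
```

Since the output is assembled from the groups, a space misread in place of the decimal point no longer reaches the field's own formatting step.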
Answers
KV - Check Amount
Value
Currency New
Pattern
- Your extractor (pre-fix) is picking up "$211 52" for that value, which makes sense: the line runs right through the decimal point and is probably being removed during image processing (IP), so the extractor sees a space instead.
- That value is put through .NET string formatting with the c2 specifier defined on your field (there's some reference documentation on it at https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings).
- For some reason, and this is something we might want to look into, that turns it into $52.00. My guess is that it either sees only the 52 (which would format as $52.00 with a c2 format specifier), or it sees $211 and 52 separately and returns only the latter, formatted.
- That value populates the Grooper field.
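The guess in the steps above can be made concrete with a small sketch. This is not Grooper's or .NET's actual code: `format_c2` is a rough Python stand-in for the c2 specifier under a US-style culture, and `guess_last_token` models the hypothesis that only the text after the last space gets parsed.

```python
def format_c2(value):
    """Rough stand-in for a c2-style format: currency with two decimals."""
    return f"${value:,.2f}"

def guess_last_token(raw):
    """Model of the guess: parse only what follows the last space (assumption)."""
    token = raw.replace("$", "").split(" ")[-1]
    return format_c2(float(token))

print(guess_last_token("$211 52"))   # extractor output with the decimal lost
print(guess_last_token("$211.52"))   # extractor output with the decimal intact
```

Under that assumption the misread value "$211 52" comes out as $52.00, which matches what the field showed, while a clean "$211.52" passes through unchanged.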
The best practice is the solution you've already found: explicitly group the dollars and cents, and control the output formatting using the extractor. This ensures consistent behavior and (again, as you've discovered) good normalization and formatting.
First off, I'm not really sure what is going on with those asterisks (*) and/or pipes (|), but I suspect it's being done for a situation similar to the one listed here:
https://xchange.grooper.com/discussion/comment/1933#Comment_1933
Second, the (,|\.| ) could be handled more cleanly with a character set: [,. ]
But even this part could be cleaned up, and your overall solution made more robust, by using Fuzzy RegEx.
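As a quick check of the character-set suggestion, the sketch below compares the original pattern with a shortened one on a few made-up strings, using Python's `re` as a stand-in for the .NET engine. Writing the asterisk alternation as `\*{1,6}` is an extra simplification assumed here; it was not proposed in the thread.

```python
import re

# Pattern as originally posted.
ORIGINAL = re.compile(
    r"\$(\*|\*\*|\*\*\*|\*\*\*\*|\*\*\*\*\*|\*\*\*\*\*\*)?"
    r"(\d{1,3}(,|\.| ))?\d{1,3}(,|\.| )\d{2}"
)
# Same pattern with [,. ] for the separators and \*{1,6} for the asterisks.
SHORT = re.compile(r"\$(\*{1,6})?(\d{1,3}[,. ])?\d{1,3}[,. ]\d{2}")

def matched_text(pattern, text):
    """Return the matched substring, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

def same_result(text):
    """True when both patterns match the same substring of text."""
    return matched_text(ORIGINAL, text) == matched_text(SHORT, text)

# Illustrative inputs only.
samples = ["$211.52", "$211 52", "$******1,211.52", "$211"]
print(all(same_result(s) for s in samples))
```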
Finally, the way the Format Specifier is functioning is correct. It will truncate that space and anything before it. While your workaround is getting you there, I felt it necessary to show you an easier solution that will work more often, and hopefully get you thinking more about Fuzzy RegEx and its uses.
rkinard@bisok.com