Database validation/correction
We have some forms with fields whose values depend on each other.
Take the following example:
Loan Number: 123486
First Name: John
Last Name: Smith
We have one example where Grooper read the loan number above as "123456". According to the character-level OCR data, the "5" that was misread from the "8" had 99% confidence. It was in fact the most confident OCR character in the field.
I now find myself looking for ways to validate the data. We have a repository that I can use to validate the data -- most of the time. The interesting thing in this case is that we did in fact have a "Loan Number" with the value "123456", so a straight lookup would have validated the number as good. The process should therefore look something like this:
Validate that the Loan Number exists.
If it does, check the name associated with the loan. If the exact name values that we read off the form exist in the repository, then all is good.
If they do not, use fuzzy string matching: if the name values are within a 90% match (or some other limit) of the repository values, correct the names to the repository values.
If the Loan Number does not exist, try to find either a close Loan Number match in the repository or a close match on the name fields, and correct the loan number/name accordingly.
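The steps above could be sketched roughly as follows. This is only an illustration in Python, not Grooper script code: the `REPOSITORY` dictionary, the `validate` function and the status strings are all hypothetical names, and `difflib.SequenceMatcher` from the standard library stands in for whatever fuzzy matcher ends up being used. It also assumes that when the loan number exists but the name is not even a fuzzy match, the logic falls through to a closest-record search, which the steps do not spell out.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory stand-in for the repository: loan number -> (first, last).
REPOSITORY = {
    "123486": ("John", "Smith"),
    "123456": ("Mary", "Jones"),  # a different loan that really exists
}


def similarity(a, b):
    """0..1 similarity from difflib (a stand-in, not Grooper's fuzzy matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def validate(loan, first, last, repo=REPOSITORY, threshold=0.9):
    """Return (loan, first, last, status) for one set of values read from a form."""
    if loan in repo:
        repo_first, repo_last = repo[loan]
        if (first, last) == (repo_first, repo_last):
            return loan, first, last, "exact"
        if similarity(f"{first}|{last}", f"{repo_first}|{repo_last}") >= threshold:
            return loan, repo_first, repo_last, "name corrected"
    # Assumption: reached when the loan number is unknown, or when it exists
    # but belongs to a clearly different name. Search every record for the
    # closest match on the combined loan number and name.
    read = f"{loan}|{first}|{last}"
    best_loan, (best_first, best_last) = max(
        repo.items(),
        key=lambda item: similarity(read, f"{item[0]}|{item[1][0]}|{item[1][1]}"),
    )
    if similarity(read, f"{best_loan}|{best_first}|{best_last}") >= threshold:
        return best_loan, best_first, best_last, "record corrected"
    return loan, first, last, "needs review"
```

With this sample data, `validate("123456", "John", "Smith")` returns `("123486", "John", "Smith", "record corrected")`: the misread number exists, but its name does not match, so the closest full record wins. Anything below the threshold is left unchanged and flagged for review rather than corrected.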
One way to do this all at once might be to retrieve every Loan Number, First Name and Last Name from the repository as one concatenated string per record, and pick the record that is the closest fuzzy match to the string read from the form. The repository is not huge, so this approach seems feasible.
Repository: 123486|John|Smith
Read from Form: 123456|Johh|Smith
The two strings differ by 2 edits over 17 characters, which is a Levenshtein similarity of 88.24%, so they are a relatively close match and probably what we are looking for. In our case the name was actually read 100% correctly, so the match would have been 94.12% (1 edit over 17 characters).
https://www.cuelogic.com/blog/the-levenshtein-algorithm
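To check the percentages above, here is a minimal Python sketch of the calculation. It assumes similarity is defined as 1 minus the edit distance divided by the length of the longer string, which is what reproduces the 88.24% and 94.12% figures. The function names and the sample record list are made up for illustration and have nothing to do with Grooper's own API.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity as 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def closest_record(read, records):
    """Return the repository string most similar to the string read from the form."""
    return max(records, key=lambda record: similarity(read, record))


# Hypothetical repository contents, one concatenated string per record.
records = ["123486|John|Smith", "123456|Mary|Jones"]

print(f"{similarity('123486|John|Smith', '123456|Johh|Smith'):.2%}")  # 88.24%
print(f"{similarity('123486|John|Smith', '123456|John|Smith'):.2%}")  # 94.12%
print(closest_record("123456|John|Smith", records))  # 123486|John|Smith
```

Comparing against every record is a linear scan with a quadratic-time distance per pair, which should be fine for a small repository but would need indexing or pre-filtering for a large one.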
Questions:
- If I define the DB connection in Infrastructure->Data Connections, can I access it from within my Grooper custom validation/correction script, or do I need to establish the connection in code from my script?
- From within my custom lookup/validation script, can I call the Grooper fuzzy matching to see whether the string "John" is a close match to "Johh", rather than implementing the algorithm above myself?
- How can I define 'custom libraries' (i.e. .NET assemblies) that I can call in my validation/correction code? I would want them to be distributed automatically with my Grooper configuration. I tried adding one under Infrastructure->Object Libraries, but those do not seem to be in scope in my validation script.