
Database validation/correction

hjanum Posts: 110 ✭✭
We have some forms with fields whose values depend on each other.

Take the following example:
Loan Number: 123486
First Name: John
Last Name: Smith

We have one example where the loan number above was read by Grooper as "123456". When I look at the OCR character results the "8" that was read as a "5" was 99% confident according to the character OCR data. The "5" was in fact the most confident OCR character in the field.

I now find myself looking for ways to validate the data. We have a repository that I can use to validate the data, most of the time. The interesting thing in this case is that we did in fact have a "Loan Number" with the value "123456", so a straight lookup would have validated the number as good. The process should therefore look something like this:

Validate that the Loan Number exists.
If it does, check the name associated with the loan. If the exact name values that we read off the form exist in the repository, then all is good.
If they do not, use fuzzy string matching: if the name values are within a 90% match (or some other limit), correct the names to the repository values.
If the Loan Number does not exist, try to find either a close Loan Number match in the repository or a close match on the name fields, and correct the loan number/name accordingly.
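
The steps above can be sketched roughly as follows. This is only an illustration in Python, not Grooper scripting code: the `validate` function, the dictionary-based repository, and the "Jane Doe" record are hypothetical, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever fuzzy matcher is actually used. It also assumes that when the loan number exists but the names are nowhere near a match, the process falls through to the closest-record search, which the steps above do not spell out.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (difflib ratio, not Levenshtein)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def validate(read_loan, read_name, repo, threshold=0.9):
    """read_name is (first, last); repo maps loan number -> (first, last).

    Returns (loan_number, name, status).
    """
    read_full = " ".join(read_name)
    rejected = None

    if read_loan in repo:
        repo_name = repo[read_loan]
        if repo_name == read_name:
            return read_loan, read_name, "validated"
        if similarity(" ".join(repo_name), read_full) >= threshold:
            return read_loan, repo_name, "name corrected"
        # Assumption: the loan number exists but belongs to someone else,
        # so skip this record and look for a closer one.
        rejected = read_loan

    best, best_score = None, 0.0
    for loan, name in repo.items():
        if loan == rejected:
            continue
        # Either a close loan number or a close name counts as a candidate.
        score = max(similarity(loan, read_loan),
                    similarity(" ".join(name), read_full))
        if score > best_score:
            best, best_score = (loan, name), score

    if best is not None and best_score >= threshold:
        return best[0], best[1], "corrected from closest record"
    return read_loan, read_name, "needs review"


# Hypothetical repository: 123456 really exists, but for a different person.
repo = {"123456": ("Jane", "Doe"), "123486": ("John", "Smith")}
print(validate("123456", ("John", "Smith"), repo))
```

The threshold likely needs tuning per field: with this measure a single wrong character in a four-letter name such as "Johh" scores only 0.75 against "John", which is why the sketch compares the full name rather than each part separately.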

One way to do this all at once might be to retrieve all concatenated Loan Numbers, First Name, Last Name from the repository and find the closest fuzzy match in the DB. The repository is not huge, so this approach seems feasible.

Repository: 123486|John|Smith
Read from Form: 123456|Johh|Smith

The two strings have a Levenshtein distance of 2 (two substituted characters out of 17), which works out to a similarity of 88.24%, so they are a relatively close match and probably what we are looking for. In our case the name was actually read 100% correctly, so the distance would have been 1 and the match 94.12%.
https://www.cuelogic.com/blog/the-levenshtein-algorithm
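
For reference, here is a minimal Python sketch of the concatenated-key comparison described above. The helper names (`levenshtein`, `similarity`, `closest_record`) and the "123456|Jane|Doe" record are made up for illustration, and dividing the edit distance by the length of the longer string is an assumption; it is the normalization that reproduces the 88.24% and 94.12% figures.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for completely different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def closest_record(read_key, repository_keys):
    """Return (best_key, similarity) for the closest repository entry."""
    best = max(repository_keys, key=lambda k: similarity(read_key, k))
    return best, similarity(read_key, best)


repository = ["123456|Jane|Doe", "123486|John|Smith"]  # first entry is hypothetical

key, score = closest_record("123456|Johh|Smith", repository)
print(key, f"{score:.2%}")  # 123486|John|Smith 88.24%

key, score = closest_record("123456|John|Smith", repository)
print(key, f"{score:.2%}")  # 123486|John|Smith 94.12%
```

Scanning every key like this is linear in the size of the repository, which should be acceptable for a small one but would need an index or a pre-filter for a large table.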


Questions:
  1. If I define the DB connection in Infrastructure->Data Connections, can I access it from within my Grooper custom validation/correction script, or do I need to establish the connection in code from my script?
  2. Can I call the Grooper fuzzy matching from within my custom lookup/validation script to see if the string "John" is a close match to "Johh", rather than implementing the algorithm above myself?
  3. How can I define 'custom libraries' (i.e. .NET assemblies) that I can call from my validation/correction code? I would want them to be automatically distributed with my Grooper configuration. I tried adding one under Infrastructure->Object Libraries, but those do not seem to be in scope in my validation script.

Comments

    GrooperGuru Posts: 481 admin
    There is a new feature coming to Grooper that allows for fuzzy database lookups involving multi-field matching. From the early prototype I have seen, it appears to be designed specifically to solve this kind of problem. Have you been spying on our dev team? LOL
    Matt Harrison
    Product Manager
    mharrison@bisok.com