Field Class Extracted Value spans pages

strotelli · January 2018

I remember you showing us how to do this while we were in training in Oklahoma City, but I can't quite figure out the specifics:

I want to select multiple extracted values of a Field Class (e.g., Legal Description) that cross over to another page, then right-click 'combine' so that they are considered one value and trained that way. Any help you could provide on this would be greatly appreciated.

p.s. Once I right-click combine extracted values, then train as positive on the combined value, it then spits the value back out into its parts. Does this sound correct?
--
Thanks,

GrooperGuru · January 2018

There are a few different things necessary to do this. Sorry in advance for the length of what I'm about to type, but this is a pretty involved topic for newcomers. In the next few months, I plan to put out a video that covers this in depth.

Let's start with the actual training process. So, you've set your value data type to find paragraphs. Then you've set the Field Class to examine the contents of each paragraph to determine its meaning and provide a confidence score. In the Field Class tester, there is a sort of "built-in" assumption that when you choose a candidate and train it as a positive candidate, all of the other candidates you did not choose must be bad. Therefore, every unselected candidate receives a negative weighting when the "Good" candidate is trained. This is a problem when there are actually multiple correct answers on a document, as is sometimes the case in your multi-paragraph or page-spanning scenarios. So in the tester you should always select ALL correct choices using shift+click or ctrl+click and train them at the same time.

Once training is complete, you need to pay close attention to the scores of your "correct" candidates. You'll ultimately need to set the Minimum Confidence in the Field Class to a value that is high enough to remove EVERY candidate that is not what you plan to keep. At that point, you would have only the good paragraphs showing in the candidate list.
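To make the threshold idea concrete, here is a rough Python sketch. It is not Grooper code or a Grooper API; the `Candidate` class, the `confidence_window` function, and the scores are all made up for illustration. It just shows the reasoning: the Minimum Confidence has to sit above the best-scoring bad candidate and at or below the worst-scoring good one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical stand-in for one paragraph in the candidate list."""
    text: str
    confidence: float
    is_correct: bool  # True if this is one of the paragraphs you want to keep


def confidence_window(candidates: List[Candidate]) -> Optional[Tuple[float, float]]:
    """Return (highest_bad, lowest_good) scores, or None if they overlap.

    A usable Minimum Confidence is any value greater than highest_bad
    and no greater than lowest_good.
    """
    good = [c.confidence for c in candidates if c.is_correct]
    bad = [c.confidence for c in candidates if not c.is_correct]
    if not good:
        return None  # nothing to keep, so no threshold can help
    lowest_good = min(good)
    highest_bad = max(bad, default=0.0)
    if highest_bad >= lowest_good:
        return None  # a bad candidate outscores a good one: more training needed
    return highest_bad, lowest_good


# Made-up scores for a legal description that spans two pages.
candidates = [
    Candidate("Lot 4, Block 2,", 0.84, True),
    Candidate("of the Smith Addition.", 0.91, True),
    Candidate("Return to: County Clerk", 0.40, False),
    Candidate("Page 1 of 2", 0.15, False),
]
window = confidence_window(candidates)
print(window)
```

If the function returns None, no single threshold separates the good paragraphs from the rest, which usually means going back and training more examples before touching the Minimum Confidence.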

Next, you will set the collation method to Combine instead of Individual. This will force Grooper to combine all remaining candidates together as a single result... even if they are on different pages or are not contiguous on the document.

The final setting (presumably) is to change your Instance Ranking from Confidence to Order. This ensures that the paragraphs are combined in order as they appear on the document, rather than based on their confidence scores. Without changing this, you could end up with a very unpredictable order to the paragraphs.
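As a rough illustration of why the ranking matters, here is a small Python sketch of the filter, sort, and combine steps. Again, this is not Grooper code: the `Paragraph` class, the `combine` function, the `page` and `top` fields, and the sample values are assumptions made up for the example. It only shows how sorting by position versus sorting by confidence changes the combined text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paragraph:
    """Hypothetical candidate: its text, where it sits, and its score."""
    text: str
    page: int          # 1-based page number (assumed)
    top: float         # distance from the top of the page (assumed)
    confidence: float


def combine(paragraphs: List[Paragraph], min_confidence: float,
            ranking: str = "order") -> str:
    """Drop low-confidence candidates, sort the rest, and join them.

    ranking="order" sorts by position in the document;
    any other value sorts by confidence, highest first.
    """
    kept = [p for p in paragraphs if p.confidence >= min_confidence]
    if ranking == "order":
        kept.sort(key=lambda p: (p.page, p.top))
    else:
        kept.sort(key=lambda p: p.confidence, reverse=True)
    return " ".join(p.text for p in kept)


# Made-up legal description that starts at the bottom of page 1
# and finishes at the top of page 2.
paragraphs = [
    Paragraph("of the Smith Addition.", page=2, top=50.0, confidence=0.91),
    Paragraph("Return to: County Clerk", page=1, top=100.0, confidence=0.30),
    Paragraph("Lot 4, Block 2,", page=1, top=700.0, confidence=0.84),
]

by_order = combine(paragraphs, min_confidence=0.5, ranking="order")
by_confidence = combine(paragraphs, min_confidence=0.5, ranking="confidence")
print(by_order)       # Lot 4, Block 2, of the Smith Addition.
print(by_confidence)  # of the Smith Addition. Lot 4, Block 2,
```

In this made-up case the second half of the description scores higher than the first half, so ranking by confidence puts the pieces together backwards, while ranking by order reads correctly.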

Again, stay tuned for that upcoming video.

RandoCalrisian · January 2018

@strotelli, do you have an example document you could share?
