I am having trouble exporting to a text-searchable PDF.
kylesouza
Posts: 156 ✭✭✭
I have set up a very simple process to OCR some PDFs:
I want the outputted PDF to be text searchable, so I have these settings:
But when I look at the exported document there is no selectable text. (see attached)
The Recognize process is generating text.
What am I doing wrong?
Kyle Souza
Data Wizard
P&P Oil & Gas Solutions
Best Answer
GrooperGuru Posts: 481 admin
Try also setting PDF Page Source to Image.
Matt Harrison
Product Manager
mharrison@bisok.com
Answers
To fix this, you can do one of two things: you can turn on "prefer child versions" on the PDF options, or you can run a content action -> clear content at folder level 1 to remove the imported PDF version and turn the folder into a simple container.
Let me know if this gets you where you need to go once you've tried it!
The "Clear Content" option fixed 1/3 of the issue (not the missing 1/3 from above), but causes two other issues.
Data Wizard
P&P Oil & Gas Solutions
What additional problems does running "clear content" create?
The Clear Content option creates only one document even though it processes two (out of three) of them. The output file also has no name, because I use the native file name to name the output file. The two output files are probably overwriting each other, and one file is erroring.
Data Wizard
P&P Oil & Gas Solutions
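The suspected overwrite can be illustrated outside Grooper. This is only a sketch in plain Python, not Grooper code or a Grooper expression: the function name, the ".pdf" extension, and the sample batch are all hypothetical. It assumes that a document with no native file name ends up with an empty base name at export, so every such document maps to the same output file.

```python
from collections import defaultdict


def find_name_collisions(native_names, extension=".pdf"):
    """Group documents by the output file name they would be exported to.

    native_names: one native file name per document; None or "" stands for
    a document that has no native name (the assumed effect of clearing
    content). Returns only the output names shared by more than one document.
    """
    by_output = defaultdict(list)
    for index, name in enumerate(native_names):
        # An empty or missing native name leaves only the extension.
        output_name = (name or "") + extension
        by_output[output_name].append(index)
    return {out: docs for out, docs in by_output.items() if len(docs) > 1}


# Hypothetical batch: two documents lost their native name, one kept it.
collisions = find_name_collisions([None, None, "page3"])
print(collisions)  # {'.pdf': [0, 1]}
```

If that assumption holds, naming the output from a value that is still present and unique for each document would avoid the collision.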
On the single file that's exporting incorrectly when you have "prefer child versions" turned on: does it behave differently when you run just one file through at a time? Same question for the "clear content" method: if you clear the content right after you split, does the OCR generate correctly? If so, then it's just a question of changing your filename expression to reference the link.
results in two files exporting with searchable text, and the same one without it.
I can see the text in Grooper for it though:
Running just the "problem page" through by itself has the same results.
-------------------
Using "Clear Content" has the same end result:
But the output file has no searchable text.
Data Wizard
P&P Oil & Gas Solutions