PDF scripting vulnerabilities
[Deleted User]
Posts: 0 admin
Received the question below from a client...
Do you guys have any experience dealing with cross-site scripting vulnerabilities with PDFs? I would assume Grooper negates this by re-rendering all docs and converting them to a different file type. But we wondered if you guys have ever done a project where someone could upload a PDF to a public web portal before Grooper or a similar capture app would import it. In that scenario, a malicious user could attack the public web portal with a malicious PDF and use it as an entry point to other things.
We were just brainstorming and don't have any problem we were trying to solve, but our security folks look to BIS as the document experts and wondered what your thoughts on this might be. I'm not looking for anyone to spend hours thinking about this, but if you had any best practices or quick thoughts on the topic, we would love to hear them.
Best Answer
GrooperGuru Posts: 481 admin
This is a great question. You are correct in your assumptions.
The simple answer is that Grooper can help reduce these types of risks. Grooper is known to produce exceptionally good OCR results. For ingested files that are originally electronic documents, such as text-based PDF, Word, and Excel files, Grooper does not need to perform traditional OCR; it directly extracts 100% accurate text from the original document. For image-based files such as .tif and .jpg, Grooper uses several of our patent-pending features to produce OCR results better than any other platform we have seen. Better text results in better classification and extraction of data and the potential for fully automated processing.
Because of this, we often recommend routing all files through Grooper before they go to the long-term content repository. This captures data from both types of files in a single process and produces high-quality, normalized, full-text-searchable PDF documents in the ECM system. An added benefit of this approach is that malicious links in electronic files can be eliminated, because Grooper re-renders the file as a text-behind PDF without hyperlinks.
Hopefully this addresses your team's concerns. Let me know if any other questions come up. Always happy to assist. Thanks.
-Matt
Matt Harrison
Product Manager
mharrison@bisok.com
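For the portal-side scenario raised in the question (a PDF uploaded to a public web portal before any capture application sees it), one common pre-screening idea is to look for PDF name tokens associated with active content before the file is accepted. The sketch below is only an illustration of that idea and is not how Grooper works internally. The function name `find_active_content` and the `RISKY_NAMES` watch list are hypothetical, and a raw byte scan like this cannot see inside compressed object streams or encrypted files, so it should be treated as a first-pass triage step rather than a complete defense.

```python
import re
from typing import Dict

# Hypothetical watch list (an assumption, not an exhaustive or official list):
# PDF name tokens often associated with scripts, auto-run actions, links,
# and embedded files.
RISKY_NAMES = ("/JavaScript", "/JS", "/OpenAction", "/AA",
               "/Launch", "/URI", "/EmbeddedFile")

# PDF names may hide characters as #xx hex escapes, e.g. /J#61vaScript.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def _decode_name_escapes(data: bytes) -> bytes:
    """Replace every #xx hex escape with the byte it encodes."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def find_active_content(pdf_bytes: bytes) -> Dict[str, int]:
    """Count occurrences of each watched name in the raw PDF bytes.

    Returns a dict of name -> count containing only names that were found.
    An empty dict means nothing on the watch list was seen in the
    uncompressed portion of the file; it does not prove the file is safe.
    """
    data = _decode_name_escapes(pdf_bytes)
    found: Dict[str, int] = {}
    for name in RISKY_NAMES:
        # Require that the name is not followed by a letter or digit,
        # so /JS does not match a longer name such as /JSomething.
        pattern = re.compile(re.escape(name.encode("ascii")) + rb"(?![A-Za-z0-9])")
        count = len(pattern.findall(data))
        if count:
            found[name] = count
    return found
```

A portal could call `find_active_content(uploaded_bytes)` on each upload and quarantine or reject any file for which the result is non-empty, then still route accepted files through the re-rendering step described above.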