PDF Editor for repairing book scan OCR?

zabadoh · 10 months ago

PDF Editor for repairing book scan OCR?

ChickenBoo · 10 months ago

If you want to host it locally, Stirling PDF can be run in Docker, and its OCR goes through a library that uses Tesseract. It has a bunch of other handy PDF operations, too. I keep it around for the two times a year I need to merge, split, or decrypt PDFs.

https://github.com/Frooodle/Stirling-PDF/blob/main/HowToUseOCR.md

It can run OCR straight on a PDF and handle multiple files at a time.
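
If you'd rather script the same kind of batch job outside the web UI, here is a rough sketch. It assumes the Tesseract-based library in question is OCRmyPDF and that its `ocrmypdf` command-line tool is installed and on your PATH. The helper names and the `scans` / `ocr_out` folders are made up for illustration and are not anything Stirling PDF ships.

```python
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Build the ocrmypdf command line for a single PDF (hypothetical helper)."""
    # -l selects the Tesseract language pack.
    # --skip-text leaves pages that already have a text layer untouched.
    return ["ocrmypdf", "-l", language, "--skip-text", str(src), str(dst)]


def ocr_folder(in_dir: Path, out_dir: Path, language: str = "eng") -> list[Path]:
    """Run OCR on every PDF in in_dir and write the results to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / src.name
        # check=True raises CalledProcessError if ocrmypdf fails on a file.
        subprocess.run(build_ocr_command(src, dst, language), check=True)
        written.append(dst)
    return written


# Example call (folder names are assumptions):
# ocr_folder(Path("scans"), Path("ocr_out"))
```

For a book scan whose existing OCR layer is bad, you would probably want to swap `--skip-text` for `--redo-ocr` or `--force-ocr`, but check the OCRmyPDF docs before trusting that on your files.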

sibloure · 10 months ago

This is amazing. I did not realize it existed. Thank you for sharing.