Dear @linux and @academicchatter folks:

Can you suggest libre/open-source tools for extracting text and images from scientific PDF documents?

P.S.: I’m on a Linux machine, so I’d like something terminal-friendly, if possible!

  • Shihali · 6 months ago

    gImageReader is a graphical front-end to Tesseract, the open-source OCR program, so it might be just what you’re looking for. By default it doesn’t add the OCR’d text to the PDF, but you can enable that.
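Since the question asks for something terminal-friendly, note that Tesseract itself is a command-line program. The sketch below shows one way to drive it from a script, assuming the `tesseract` binary is on your PATH and that each PDF page has already been converted to an image (Tesseract reads images such as PNG or TIFF, not PDFs; the conversion step is not shown). The function names `tesseract_command` and `ocr_page` are hypothetical helpers written for this example. By default `tesseract IMAGE OUTBASE` writes plain text to `OUTBASE.txt`; appending the `pdf` config asks it to write `OUTBASE.pdf`, which contains the image with a text layer.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def tesseract_command(image: Path, out_base: Path, searchable_pdf: bool = False) -> list[str]:
    """Build a Tesseract CLI invocation for one page image (hypothetical helper)."""
    # Base form: tesseract IMAGE OUTBASE -> writes OUTBASE.txt
    cmd = ["tesseract", str(image), str(out_base)]
    if searchable_pdf:
        # The "pdf" config makes Tesseract write OUTBASE.pdf (image + text layer)
        cmd.append("pdf")
    return cmd


def ocr_page(image: Path, out_base: Path, searchable_pdf: bool = False) -> None:
    """Run Tesseract on one page image; assumes the binary is installed."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error
    subprocess.run(tesseract_command(image, out_base, searchable_pdf), check=True)
```

For example, `ocr_page(Path("page-01.png"), Path("page-01"))` would produce `page-01.txt`, while passing `searchable_pdf=True` would produce `page-01.pdf` instead. Keeping command construction separate from execution makes it easy to print or log the exact command before running it.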