Creating a Searchable PDF File with OCRed Text

Patent librarian and all-around good guy Jim Miller lamented the lack of OCR text in a Spanish-language Espacenet patent. I’ve used OCR to produce text files that became a searchable online database, but I hadn’t tried producing searchable PDFs. I offered to try to help anyway. In googling around, I found the June 4th comment at the bottom of this page that says that Tesseract, an open-source OCR program, can produce PDF files with embedded text.
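To make that last point concrete, here is a minimal sketch of how one might ask Tesseract for a searchable PDF from Python. It assumes Tesseract is installed and on the PATH, that the Spanish language data (`spa`) is available, and that the patent page has already been saved as an image. The file names `page.png` and `patent` and both function names are made up for illustration; they are not from the comment mentioned above.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, output_base, lang="spa"):
    """Build the Tesseract command line for searchable PDF output."""
    # The trailing "pdf" is the Tesseract config name that selects PDF output.
    # Tesseract adds the ".pdf" extension to output_base on its own.
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def make_searchable_pdf(image_path, output_base, lang="spa"):
    """Run Tesseract and return the path of the PDF it should have written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_cmd(image_path, output_base, lang), check=True)
    return output_base + ".pdf"


if __name__ == "__main__":
    # Hypothetical file names; replace with a real scanned page image.
    print(build_tesseract_cmd("page.png", "patent"))
```

Calling `make_searchable_pdf("page.png", "patent")` would, under those assumptions, leave a `patent.pdf` in which the page image is overlaid with selectable, searchable text.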