OCRing USAMark Tiffs

OCR (Optical Character Recognition) is software that attempts to work out what text an image contains, generally by reading an image file and producing a text file. On the patent office’s website there are a couple of date-based differences in what data is available. For patents, there isn’t a lot of data associated …

OCRed Plant Patents

Bulk data became a thing a few years ago, so I downloaded the USAMark trademark zip files, all 43,000 of them. I then OCRed a little over a million of the registration certificates in those zip files and put the resulting text into a searchable database. Then I did the same thing for plant patents, which began …