OCR Trademark Searches

Russ


Search 1,011,906 OCR'ed registration certificates, from registration number 1 (issued 1870-10-25) to 3,752,366 (issued 2010-02-23).
These are predominantly registration certificates that I don't think are in TESS, plus the additional ones described below.

Important Update: My service provider wanted to double my fees for hosting nearly three gigabytes of OCR'ed data. To avoid that, I removed most of the data. What remains here are the 17,192 trademarks that aren't otherwise online; they aren't in TESS or TSDR. The original data set is here (another site of mine).

Inclusion words - must be present
Exclusion words - must not be present
Favorable words - rows containing these words will appear before rows without these words when sorted by relevance.

Sort by: registration number / relevance
Limit to not in TESS

Or return just registration number




The input boxes above can contain quoted strings, e.g. 'Stanley Works' or "plumb bobs". The results are the trademarks in which the quoted string matches exactly. Enter both quoted strings to search for the "plumb bobs" belonging to 'Stanley Works'.
Putting 'Stanley Works' in the exclusion box instead would show other companies' "plumb bobs".
Exact-phrase matching can be problematic due to OCR errors. See the notes below on minimum word lengths, stop words, etc.
The searches are case insensitive (e.g. a search for Marshmallow gives the same results as a search for marshmallow or marshMallow).
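As a rough illustration of how the three boxes could combine, here is a minimal Python sketch that turns them into a single MySQL boolean-mode full-text expression. That this site uses boolean mode, and the names parse_box and build_boolean_query, are assumptions made for the example; this is not the site's actual code.

```python
import re

# Matches a single-quoted phrase, a double-quoted phrase, or a bare word.
_TOKEN = re.compile(r"""'([^']+)'|"([^"]+)"|(\S+)""")


def parse_box(text):
    """Split one input box into (term, is_phrase) pairs, accepting either quote style."""
    terms = []
    for single, double, bare in _TOKEN.findall(text):
        phrase = single or double
        if phrase:
            terms.append((phrase, True))
        else:
            terms.append((bare, False))
    return terms


def build_boolean_query(inclusion="", exclusion="", favorable=""):
    """Build a boolean-mode search string (hypothetical mapping of the three boxes).

    '+' marks a term that must be present, '-' one that must be absent,
    and a term with no prefix is optional but raises the relevance score.
    """
    parts = []
    for prefix, box in (("+", inclusion), ("-", exclusion), ("", favorable)):
        for term, is_phrase in parse_box(box):
            # MySQL boolean mode expects exact phrases in double quotes.
            body = f'"{term}"' if is_phrase else term
            parts.append(prefix + body)
    return " ".join(parts)


# Other companies' "plumb bobs": phrase required, 'Stanley Works' excluded.
query = build_boolean_query(inclusion='"plumb bobs"', exclusion="'Stanley Works'")
print(query)
```

The resulting string would be passed as a parameter to something like MATCH(ocr_text) AGAINST (%s IN BOOLEAN MODE), where ocr_text is a made-up column name.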
Tip: the favorable words box can contain special punctuation characters, as explained here. Note that words of three or fewer characters are not indexed, and neither are the words on this list of stopwords, so searching for either will not match anything in the database. Additionally, words appearing in 50% or more of the OCR'ed registration certificates are not indexed, so words like Registered or phrases like 'Patent Office' are not searchable. These are limitations of MySQL, the underlying free database used here.
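The indexing rules above can be checked before running a search. The following Python sketch assumes a minimum word length of four characters and uses a tiny made-up stopword sample (not the full list linked above); the function names and the toy documents are hypothetical.

```python
# Illustrative subset only; the real stopword list is much longer.
SAMPLE_STOPWORDS = frozenset({"about", "their", "which", "would"})
MIN_WORD_LEN = 4  # words of three or fewer characters are not indexed


def is_searchable(word, stopwords=SAMPLE_STOPWORDS, min_len=MIN_WORD_LEN):
    """Return True if the word is long enough and is not a stopword (case insensitive)."""
    w = word.lower()
    return len(w) >= min_len and w not in stopwords


def too_common(word, documents, threshold=0.5):
    """Return True if the word appears in at least `threshold` of the documents."""
    w = word.lower()
    hits = sum(1 for doc in documents if w in doc.lower().split())
    return bool(documents) and hits / len(documents) >= threshold


# Toy stand-ins for OCR'ed certificate text.
docs = [
    "Registered in the Patent Office",
    "registered mark",
    "plumb bobs",
]

for term in ("bob", "bobs", "Which", "Registered"):
    usable = is_searchable(term) and not too_common(term, docs)
    print(term, "searchable" if usable else "not searchable")
```

A search form could run a check like this on each entered word and warn the user instead of silently returning no rows.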

I added ten thousand registration certificates that aren't on uspto.gov at all (not in TESS/TSDR).
I've also OCR'ed the nearly 4,000 TIFFs that are empty files (file size of zero bytes) on the usamark DVDs; all but two were in TSDR.
I've OCR'ed the ~600,000 registrations I do not believe are in TESS, and I've continued to OCR more of the registrations that are most likely in TESS.