7 posts
Scott
Last seen: 03/17/2021 - 15:43
Joined: 03/10/2021 - 11:36
Patent citations

Hi.  I'm brand new to this dataset.

I'm trying to sketch out in my mind how one would go about generating the data in the PatentsView files from the weekly/monthly files published by USPTO, and I've run into something that's puzzling me.

When I do ...

cat uspatentcitation.tsv | head

... on the current (12/05/2020) version of the file, I get a citation to a 1934 patent for patent id D809697.  The ID of the 1934 patent is 1963218.  The citation even has data on the assignee of the 1934 patent.  None of this information is in the relevant patent grant record  in ipg180206.xml.  Since the patent grant data ( https://developer.uspto.gov/product/patent-grant-full-text-dataxml ) only goes back to 1976, it seems like I would not be able to fully construct this record from the electronic patent grant records.  Does anyone know how this record would have been constructed in the uspatentcitation file?

More generally, if anyone has any good suggestions as to how I can quickly get up to speed on the topic of reconstructing the PatentsView files from raw data, I'd be most grateful.

Thanks for any guidance.

Russ
Last seen: 03/21/2024 - 09:05
Joined: 11/14/2017 - 22:15
OPENSOURCE PROJECTS

Hi Scott,

The API is open source, as is the database/bulk file builder.  The bulk data comes in two or three distinct formats.  This page explains how to run the parsers for them: https://github.com/PatentsView/PatentsView-DB/wiki/Instructions-for-Running-1976-2001-and-2002-2004-Parsers

You are right that pre-1976 patents themselves are not in the bulk XML files or the PatentsView database, but the reference data is in the XML.  In ipg180206.xml (named for the issue date of the referencing patent), the XML for D0809697 has the reference data that winds up in uspatentcitation.tsv.

<us-references-cited>
<us-citation>
<patcit num="00001">
<document-id>
<country>US</country>
<doc-number>1963218</doc-number>
<kind>A</kind>
<name>Wakefield</name>  (patent holder, not the assignee FO Wakefield Co)
<date>19340600</date>
</document-id>
</patcit>
<category>cited by examiner</category>

The reference data is not exactly clean, but it is there.  I've run across all kinds of errors: typos in the inventor's name, and even in the patent number being referenced.  Also note that the issue date of the referenced patent is not complete: it's missing the day portion (the last two digits are 00).  The actual issue date of 1963218 is June 19, 1934, while uspatentcitation.tsv has 1934-06-01.  Somewhere in the PatentsView loading process the day must get set to 01.
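
To make the date point concrete, here is a minimal sketch of pulling the citation fields out of a fragment like the one above and filling in a missing day.  The output field names and the "replace 00 with 01" rule are my assumptions, inferred from what shows up in uspatentcitation.tsv; this is not the actual PatentsView loader code.

```python
import xml.etree.ElementTree as ET

# The citation fragment quoted above, closed off so it parses on its own.
SAMPLE = """<us-citation>
<patcit num="00001">
<document-id>
<country>US</country>
<doc-number>1963218</doc-number>
<kind>A</kind>
<name>Wakefield</name>
<date>19340600</date>
</document-id>
</patcit>
<category>cited by examiner</category>
</us-citation>"""


def normalize_date(raw):
    """Turn YYYYMMDD into YYYY-MM-DD; a 00 month or day becomes 01 (assumed rule)."""
    year, month, day = raw[:4], raw[4:6], raw[6:8]
    if month == "00":
        month = "01"
    if day == "00":
        day = "01"
    return f"{year}-{month}-{day}"


def parse_citation(xml_text):
    """Return one citation as a dict; the key names are illustrative only."""
    node = ET.fromstring(xml_text)
    doc = node.find("patcit/document-id")
    return {
        "citation_id": doc.findtext("doc-number"),
        "country": doc.findtext("country"),
        "kind": doc.findtext("kind"),
        "name": doc.findtext("name"),
        "date": normalize_date(doc.findtext("date")),
        "category": node.findtext("category"),
    }


row = parse_citation(SAMPLE)
print(row["citation_id"], row["date"])
```

With the sample fragment, the incomplete 19340600 comes out as 1934-06-01, which matches what the bulk file shows rather than the true June 19 date.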

Russ

Scott
Last seen: 03/17/2021 - 15:43
Joined: 03/10/2021 - 11:36
THANK YOU VERY MUCH, RUSS

Russ,

Thank you so much for your super helpful answer.  The parser I was using didn't show the extra fields in the citation.  I had just checked the raw file ( zcat ipg180206.zip | fgrep -A 50 -B 50 1963218 ) and found the complete record, and was coming back to the message board to correct myself when I found your answer. 
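
For anyone who would rather do that lookup in Python than with zcat and fgrep, here is a rough sketch.  It relies on the weekly grant file being many XML documents concatenated together, each starting with its own `<?xml` declaration.  The sample lines are made up for illustration, and the function names are mine, not anything from the PatentsView code.

```python
def split_documents(lines):
    """Yield each XML document from the lines of a concatenated grant file."""
    current = []
    for line in lines:
        # A new XML declaration marks the start of the next patent's document.
        if line.startswith("<?xml") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


def documents_mentioning(lines, doc_number):
    """Return the documents containing the given doc-number anywhere.

    Note this also matches the patent whose own number it is, not only
    patents that cite it.
    """
    needle = f"<doc-number>{doc_number}</doc-number>"
    return [doc for doc in split_documents(lines) if needle in doc]


# Made-up, heavily trimmed stand-in for the contents of a weekly file.
SAMPLE_LINES = [
    '<?xml version="1.0" encoding="UTF-8"?>\n',
    "<us-patent-grant>\n",
    "<doc-number>D0809697</doc-number>\n",
    "<us-citation><patcit><document-id>"
    "<doc-number>1963218</doc-number>"
    "</document-id></patcit></us-citation>\n",
    "</us-patent-grant>\n",
    '<?xml version="1.0" encoding="UTF-8"?>\n',
    "<us-patent-grant>\n",
    "<doc-number>9999999</doc-number>\n",
    "</us-patent-grant>\n",
]

hits = documents_mentioning(SAMPLE_LINES, "1963218")
print(len(hits))

# To run against the real file, something like this should work, assuming
# the zip holds a single XML member:
#
#   import io, zipfile
#   with zipfile.ZipFile("ipg180206.zip") as zf:
#       with zf.open(zf.namelist()[0]) as fh:
#           lines = io.TextIOWrapper(fh, encoding="utf-8")
#           hits = documents_mentioning(lines, "1963218")
```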

The pointers to those repos help a bunch.  Thanks also for pointing out the screwiness with the date.  I'm curious how you know that the actual issue day is the 19th.

Thank you very much for coming to the aid of a newbie!

Scott

Russ
Last seen: 03/21/2024 - 09:05
Joined: 11/14/2017 - 22:15
USPTO SITE

Scott,

The USPTO's web site will show you the data for patents from 1976 onward if you know the patent number.  For pre-1976 patents it shows the issue date, classifications, and an image of the patent: http://patft.uspto.gov/netahtml/PTO/srchnum.htm  The image of 1963218 shows the assignment to F. W. Wakefield Brass Co.

Russ

Scott
Last seen: 03/17/2021 - 15:43
Joined: 03/10/2021 - 11:36
PATENTSVIEW-DB API

Thanks for pointing me to the image archive.  Clearly, I have a lot to learn.

I'm struggling to get a foothold in the PatentsView-DB API.  If I understand correctly, I can point this code (somehow) at current bulk data and build a MySQL database that looks like the files on the PV download page.  The wiki entries discuss running the parsers for 1976-2001 data and for 2002-2004 data, but those instructions reference a parser_wrapper.py file that I don't even see in the distribution.  For current data, I guess it's supposed to be obvious how to run the parser, but I'm just not seeing it yet.  Is there an example of this anywhere?  Am I correct in assuming that I would (somehow) point code in the updater package at ipgYYMMDD.zip files?

Thanks.  I'm sorry to keep peppering you with questions.

Russ
Last seen: 03/21/2024 - 09:05
Joined: 11/14/2017 - 22:15
REPO WEIRDNESS

Scott,

I'm not sure what's going on in their repo; you could try raising an issue.  In my fork of it the wrapper is Scripts/Raw_Data_Parsers/parser_wrapper.py.  It's been a while since I've tried it myself.  You might be on your own to retrieve the XML files, or you might be able to use the USPTO's PatentPublicData.  I only ran it on a small set of XML files I downloaded from a browser, not the whole dataset.  You do need to put the different date ranges into separate folders, as the wiki mentions.  Then you just run the wrapper once for each period.  From the source: parser.add_argument('--period',default="5",choices=['1','2', '3'],help='Enter 1 for 1976-2001 or 2 for 2002-2004 or 3 for 2005.')
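
If it helps with sorting files into folders, here is a small sketch that maps a grant year to the --period value from that help string.  I'm assuming "3 for 2005" means 2005 and later; the function name is just mine, and the wrapper may need other arguments that I'm not showing.

```python
def period_for_year(year):
    """Map a grant year to the --period choice quoted from parser_wrapper.py."""
    if 1976 <= year <= 2001:
        return "1"
    if 2002 <= year <= 2004:
        return "2"
    if year >= 2005:  # assumption: period 3 covers 2005 onward
        return "3"
    raise ValueError(f"no bulk-data period covers {year}")


# ipg180206.xml was issued in 2018, so it would go in the period 3 folder.
print("--period", period_for_year(2018))
```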

Russ

Scott
Last seen: 03/17/2021 - 15:43
Joined: 03/10/2021 - 11:36
THANK YOU VERY MUCH, RUSS

(Thank you very much, Russ)