13 posts
corp_patent
Last seen: 04/30/2018 - 19:02
Joined: 04/30/2018 - 11:00
Missing Patents in the Data Download Tables

Dear PatentsView,

I find that the dataset containing all granted applications ("application") has 6,502,933 observations. However, the dataset on granted patents ("patent") has only 5,395,577 observations, although the webpage states that it contains 6,502,933. When I merge the two datasets, I find that many patents granted in 2017 are not included in the patent file. I suspect the patent file was not updated accordingly. Could you please provide a more complete patent file in the bulk download section?
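A minimal Python sketch of this merge check (an anti-join), assuming both files are tab-separated with a header row, and that application.tsv has a `patent_id` column and patent.tsv an `id` column (the column names that appear later in this thread). The function names are hypothetical:

```python
import csv

def column_values(lines, column):
    """Collect the values of one column from tab-separated lines (header first).

    `lines` can be an open file object or any iterable of strings.
    """
    # QUOTE_NONE treats quote characters as ordinary text
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[column] for row in reader}

def missing_from_patent(application_lines, patent_lines):
    """Patent ids present in application.tsv but absent from patent.tsv."""
    granted = column_values(application_lines, "patent_id")
    in_patent = column_values(patent_lines, "id")
    return sorted(granted - in_patent)
```

To run it on the real files, open both with `open(path, newline="", encoding="utf-8")` and pass the file objects. An empty result means every granted application has a matching row in patent.tsv.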

Thanks!

Jerry Marschke
Last seen: 05/09/2018 - 20:32
Joined: 05/09/2018 - 17:44
Missing patent-inventor pairs from the bulk data

For a research project, I thought I would replace the inventor ids from a different disambiguation with the ids from PatentsView's disambiguation. I am working with roughly 9 million patent-inventor pairs for U.S. utility patents granted between 1976 and 2010. The issue is that I could find only about 7.5 million of these in PatentsView's patent-inventor bulk file, so about 1.5 million seem to be missing. I spot-checked the bulk file but didn't see any pattern that explains the missing patent-inventor pairs (two example patent numbers: 4206298 and 4209112; I can supply more if that would be helpful). Sometimes the entire set of inventors for a patent is missing; in other cases, a patent's inventors are only partially present. Is there a reason for these missing patent-inventor combinations? Is this an issue that will be fixed?
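Inventor ids from two different disambiguations cannot be compared directly, so one way to separate the "entire set missing" cases from the "partially present" ones is to compare inventor counts per patent. A minimal Python sketch, assuming the bulk file is tab-separated with a `patent_id` column (an assumption about the file layout) and that the other source supplies an expected inventor count per patent. The function names are hypothetical:

```python
import csv
from collections import Counter

def inventors_per_patent(lines):
    """Count inventor rows per patent_id in a tab-separated patent-inventor file."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["patent_id"] for row in reader)

def classify_gaps(expected, lines):
    """Split patents into fully and partially missing inventor sets.

    `expected` maps patent number -> inventor count from the other data source.
    """
    found = inventors_per_patent(lines)
    fully, partially = [], []
    for patent, n_expected in sorted(expected.items()):
        n_found = found.get(patent, 0)
        if n_found == 0:
            fully.append(patent)  # no inventor rows at all for this patent
        elif n_found < n_expected:
            partially.append(patent)  # some, but not all, inventors present
    return fully, partially
```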

Many thanks. 

Jerry Marschke

PVTeam
Role: moderator
Last seen: 04/24/2024 - 12:31
Joined: 10/17/2017 - 10:47
RE: MISSING PATENT-INVENTOR PAIRS FROM THE BULK DATA

Hello Jerry Marschke,

Thank you for bringing this to our attention. We have identified the problem and are currently working to fix it. Could you please email us with additional examples of patents where this issue is present?

-PVTeam

 

PVTeam
Role: moderator
Last seen: 04/24/2024 - 12:31
Joined: 10/17/2017 - 10:47
RE: MISSING PATENTS IN THE DATA DOWNLOAD TABLES

Hi corp_patent,

As discussed via email, to avoid exceeding the 32K (32,767-byte) record-length limit in SAS, you can try either of the following two approaches:

If you need the abstract, set LRECL to a higher value to allow import of all the records (this may cause your import to be slower).

- This can be done in the INFILE statement: where you have DSD lrecl=32767, simply increase that number until the data reads in as desired.

- Examples and documentation on this can be found here: http://support.sas.com/techsup/technote/ts673.pdf
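To choose an LRECL value, it may help to know how long the longest record actually is. The sketch below is Python rather than SAS, and the function name is hypothetical; it scans the raw file in binary mode and reports the longest physical line in bytes. A line break embedded in a text field splits a record into several physical lines, so the result describes physical lines, not logical records:

```python
def longest_record(lines):
    """Return (line_number, length) of the longest line, in bytes.

    `lines` should yield bytes, e.g. a file opened with open(path, "rb").
    Trailing CR/LF characters are not counted.
    """
    best_no, best_len = 0, 0
    for no, line in enumerate(lines, start=1):
        length = len(line.rstrip(b"\r\n"))
        if length > best_len:
            best_no, best_len = no, length
    return best_no, best_len
```

For example, `with open("patent.tsv", "rb") as f: print(longest_record(f))` prints the line number and byte length to compare against the LRECL setting.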

Probably a better solution, if you do not need the abstract, is to import it with fewer characters so that it does not fill up the buffer, for example:

informat abstract $5. ;

-PVTeam

akaru
Last seen: 04/04/2019 - 18:25
Joined: 04/04/2019 - 12:58

Hello PVTeam,

I have a similar problem. When reading the patent.tsv file, which is said to have 6,819,362 rows, I was only able to read 5,452,948 rows. Something seems to be wrong in the row for patent #9352297: R reports a missing EOL. A similar issue occurred when I loaded the claim.tsv file; I was not able to load all the rows. I am using R's read_delim to load the tables.

Thanks 

PVTeam
Role: moderator
Last seen: 04/24/2024 - 12:31
Joined: 10/17/2017 - 10:47
RE: Response to Import Issue with read_delim()

Hi, 

We looked into this issue. The best way to read the patent table (and large tables like the claims table) into R is using fread() from the data.table package:

  • fread("patent.tsv", sep='\t', quote = "")
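For readers outside R, a rough Python analogue of disabling quoting is the standard csv module with `quoting=csv.QUOTE_NONE`. This is a sketch, not PatentsView tooling, and the function name is hypothetical:

```python
import csv

def count_rows(lines):
    """Count data rows (excluding the header) in a tab-separated file."""
    # Quoting disabled, roughly like data.table's fread(..., quote = ""):
    # a field that begins with an unmatched quote character is kept as text
    # instead of starting a quoted field that runs into the following lines.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header row
    return sum(1 for _ in reader)
```

Open the file with `open("patent.tsv", newline="", encoding="utf-8")`, as the csv documentation recommends, and compare the result with the row count on the downloads page.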

When we tried to read this table with read_delim(), we did not get the same EOL error you encountered, but we found that some long text fields (such as the abstract in the patent table) contain embedded line-break characters that can be mistaken for end-of-line characters. We will look into this further.
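One way to locate such records is to look for lines whose field count differs from the header's: with quoting disabled, an embedded line break splits one record into two physical lines, each with too few fields. A minimal Python sketch (the function name is hypothetical):

```python
import csv

def broken_rows(lines):
    """List (line_number, field_count) for rows whose field count differs from the header's."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    expected = len(next(reader))
    bad = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far
            bad.append((reader.line_num, len(row)))
    return bad
```

An empty list suggests the file splits cleanly into records; otherwise the reported line numbers point at the records to inspect.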

Thanks,

PVTeam

pratiksha
Last seen: 08/08/2018 - 10:16
Joined: 08/08/2018 - 07:08
missing patents from 2014

Hello

I have observed that the number of patents in application.tsv from the latest dump decreases after 2014, whereas according to USPTO statistics the number of patents increased every year.

year  counts
2001  228021
2002  228947
2003  220536
2004  221333
2005  226242
2006  231218
2007  239547
2008  240781
2009  231648
2010  245521
2011  263246
2012  282219
2013  284992
2014  258012
2015  205911
2016  119813
2017   25614
2018      62
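Per-year counts like these can be produced with a short Python sketch, assuming application.tsv is tab-separated and has a `date` column holding the filing date as YYYY-MM-DD (both assumptions). The function name is hypothetical:

```python
import csv
from collections import Counter

def counts_by_year(lines, date_column="date"):
    """Count rows per year, taking the year from a YYYY-MM-DD date column."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[date_column][:4] for row in reader)
```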

 

The actual USPTO statistics for granted patents are quite different. What is the cause of the missing data, and how can it be accessed?

 

Best regards

Pratiksha

PVTeam
Role: moderator
Last seen: 04/24/2024 - 12:31
Joined: 10/17/2017 - 10:47
Re: MISSING PATENTS FROM 2014

Hi Pratiksha,

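When you ran your query to get the number of patents in application.tsv grouped by year, you used the application date instead of the patent grant date. Counts by application date lag behind because a patent is not granted right away: the table only contains applications that have already been granted, so recent filing years look sparse while many of their applications are still pending. For example, patent #9226438 has an application date of 9/30/2014 but was granted on 01/05/2016.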
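The filing-to-grant lag in that example can be computed with Python's datetime module (a small illustration using the dates given above; the function name is hypothetical):

```python
from datetime import date

def grant_lag_days(application_date, grant_date):
    """Days between filing and grant."""
    return (grant_date - application_date).days

# Patent #9226438: filed 2014-09-30, granted 2016-01-05
lag = grant_lag_days(date(2014, 9, 30), date(2016, 1, 5))
```

Here `lag` is 462 days, roughly 15 months, so a patent filed in late 2014 is counted under 2014 by application date but under 2016 by grant date.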
 
If you instead join the application table with the patent table and apply the same grouping (this time with the patent grant date), you will see that the number of patents does not decrease after 2014.
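A minimal Python sketch of that join, assuming application.tsv has a `patent_id` column and patent.tsv has `id` and `date` columns, with `date` holding the grant date as YYYY-MM-DD (as in the patent.tsv header quoted later in this thread). The function name is hypothetical:

```python
import csv
from collections import Counter

def grants_by_year(application_lines, patent_lines):
    """Join application.patent_id to patent.id and count patents per grant year."""
    patent_reader = csv.DictReader(patent_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # patent.date holds the grant date (YYYY-MM-DD)
    grant_date = {row["id"]: row["date"] for row in patent_reader}
    app_reader = csv.DictReader(application_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in app_reader:
        granted = grant_date.get(row["patent_id"])
        if granted is None:
            continue  # inner join: no matching row in patent.tsv
        counts[granted[:4]] += 1
    return counts
```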
 
Thank you,
 
PVTeam

dn141082
Last seen: 06/24/2021 - 14:42
Joined: 05/09/2020 - 11:08
Missing patents in patent.tsv

Hello PVTeam, 

I need to join application.tsv, patent.tsv, and inventor.tsv and then match the result to data from another source. After the joins and matching, I found that some granted patents are missing from patent.tsv but are present in the application file (with both "id" and "patent_id" entries).

Some examples: (patent_id) 9352318, 9357959, 9763369 

Unfortunately, I have about 75,000 such cases. Individual patent searches on PV, USPTO, and Google Patents show that these are granted, active patents, which led me to conclude that they are missing from the bulk download file patent.tsv. I am using Stata to work with these files. Could you please let me know what is wrong, or whether I have misunderstood something?

dn141082
Last seen: 06/24/2021 - 14:42
Joined: 05/09/2020 - 11:08
Missing patents in patent.tsv

Hi Russ, 

Thank you for quickly investigating the matter and suggesting a solution. 

I am afraid the problem still exists in the patent.tsv file I downloaded. Even without any joins, I cannot find those patents in the bare patent.tsv file. I have checked using both Stata and Notepad++. I do not know how the PV database is structured, but perhaps the query in your response and the individual patent search hit a different table than the patent.tsv file available for download. Do you think this is possible? Maybe I need to use the API for multiple patents too.

Russ
Last seen: 03/21/2024 - 09:05
Joined: 11/14/2017 - 22:15
check zip file?

Are you sure the zip file downloaded properly? Does the line count match what's on the downloads page? I used a Perl script to print the lines for the patents you mention along with their line numbers. I've changed the tabs to vertical bars (|) for display here and excluded the lengthy abstracts.

id|type|number|country|date|title|kind|num_claims|filename|withdrawn
9352318|utility|9352318|US|2016-05-31|Flip top cap with contamination protection|B2|14|ipg160531.xml|0
9357959|utility|9357959|US|2016-06-07|Method and system for dynamically updating calibration parameters for an analyte sensor|B2|18|ipg160607.xml|0
9763369|utility|9763369|US|2017-09-12|Shielded electrical cable|B2|8|ipg170912.xml|0

9352318 found on line 5916045
9357959 found on line 5921649
9763369 found on line 6324306
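The Perl script itself isn't shown; a rough Python equivalent, assuming the id is the first tab-separated column (as in the header above), also returns the total line count so it can be compared with the downloads page. The function name is hypothetical, and the line-numbering convention (header counted as line 1) may differ from the script used here:

```python
def locate_ids(lines, ids):
    """Return ({id: line_number}, total_lines) for ids found in the first column.

    Line numbers are 1-based and count the header as line 1. Only the first
    tab-separated field is compared, so an id that merely appears inside a
    title or abstract is not a match.
    """
    wanted = set(ids)
    found = {}
    total = 0
    for total, line in enumerate(lines, start=1):
        first_field = line.split("\t", 1)[0]
        if first_field in wanted:
            found[first_field] = total
    return found, total
```

Opening the file with `open("patent.tsv", encoding="utf-8", newline="\n")` makes only LF count as a line break; records split by embedded line breaks still add to the total.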

The only thing that looks a little odd are the zeros in the withdrawn column.  Patents at the beginning of the file have the word NULL.
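To see how the withdrawn column is populated across the whole file, a quick tally of its raw values is enough. A minimal Python sketch, assuming a tab-separated file with a `withdrawn` header (as shown above); the function name is hypothetical:

```python
import csv
from collections import Counter

def withdrawn_values(lines):
    """Tally the raw values of the withdrawn column (e.g. "NULL" vs "0")."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["withdrawn"] for row in reader)
```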

Russ

 

dn141082
Last seen: 06/24/2021 - 14:42
Joined: 05/09/2020 - 11:08
You were right!

Russ, you were right the first time around. I guess there is some problem with Stata opening the patent.tsv file: it drops some records on its own. I will use another program to join those three files.

Thank you so much for taking the time to look into this issue.