5 posts
arash.ahk
Last seen: 09/22/2021 - 05:22
Joined: 09/15/2021 - 06:52
Citation data missing patent information

Hi all,

The "uspatentcitation" file has over 115 million records that capture citations between patents, e.g. {'patent-id': '5354551', 'citation-id': '4875247'}.

Assuming that both the patent-id and citation-id fields contain patent numbers, I expected to be able to fetch their metadata from the "patent" file. The patent file has over 7.5 million unique values, e.g. {'patent-id': 10000008, 'time': Timestamp('2018-06-19')}.

A quick lookup shows that over 100 million ids (either patent-id or citation-id) from the "uspatentcitation" file have no information in the "patent" file. Is this normal, or am I missing a file that provides the metadata for patent-id and citation-id (i.e. date of application, title, CPC, ...)?
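A minimal sketch of the lookup described above, using only the Python standard library. The function names and the toy rows are hypothetical and are not taken from the actual files. One thing worth checking: the sample records above show ids as strings in one file and as an integer in the other, so the sketch normalizes both sides to strings before comparing, on the assumption that the files were loaded with those types.

```python
def normalize_id(value):
    """Return an id as a stripped string so '5354551' and 5354551 compare equal."""
    return str(value).strip()


def missing_ids(citation_rows, patent_ids):
    """Return the ids used in citation_rows that are absent from patent_ids."""
    known = {normalize_id(p) for p in patent_ids}
    used = set()
    for row in citation_rows:
        used.add(normalize_id(row["patent-id"]))
        used.add(normalize_id(row["citation-id"]))
    return used - known


# Toy data shaped like the records quoted above (values are illustrative only).
citations = [
    {"patent-id": "5354551", "citation-id": "4875247"},
    {"patent-id": "10000008", "citation-id": "5354551"},
]
patents = [10000008, 5354551]  # integer ids, as in the "patent" sample record

missing = missing_ids(citations, patents)
print(sorted(missing))
```

Without the normalization step, a string id such as '5354551' would never match the integer 5354551, and nearly every id would be reported as missing.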

I hope the observation and question are clear, but I am happy to clarify further.

Thanks in advance for your attention!

Arash

PVTeam
Role: moderator
Last seen: 10/06/2021 - 11:29
Joined: 10/17/2017 - 10:47

Hello Arash, 

We looked into these tables and found that the uspatentcitation table contains 13,995,409 (patent_id, citation_id) entries whose citation_id does not appear in the patent table. 

All of these are either 1) malformed citation IDs from the original XML; or 2) patents prior to 1976. 
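As a rough way to sort unresolved citation ids into the two groups mentioned above, the sketch below buckets ids by shape. Everything in it is an assumption for illustration: the function name, the bucket labels, and in particular the cutoff of 3,930,271 as the first utility patent number issued in 1976, which should be verified against USPTO's issue-year tables. It is not the rule PVTeam applied.

```python
import re
from collections import Counter

# Assumption: first utility patent number issued in 1976 (verify before relying on it).
FIRST_1976_UTILITY_NUMBER = 3_930_271


def classify_citation_id(citation_id, cutoff=FIRST_1976_UTILITY_NUMBER):
    """Bucket an id as a pre-1976 candidate, a later numeric id, or non-numeric."""
    text = str(citation_id).strip()
    if re.fullmatch(r"[0-9]+", text):
        if int(text) < cutoff:
            return "pre_1976_candidate"
        return "numeric_1976_or_later"
    # Non-numeric ids are not necessarily malformed (design, reissue, and plant
    # patents carry letter prefixes); they simply need a closer look.
    return "non_numeric"


# Illustrative ids only; none of these are claimed to be in the missing set.
sample_ids = ["3500000", "4875247", "D0261234", "48752 47"]
counts = Counter(classify_citation_id(c) for c in sample_ids)
print(dict(counts))
```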

Please let us know if you have additional questions!

Best,

PVTeam

 

arash.ahk
Last seen: 09/22/2021 - 05:22
Joined: 09/15/2021 - 06:52

Hi,

Thanks for your follow-up and investigation.

I made another observation in the "uspatentcitation" file. It contains 541,259 unique patent ids. Checking these "patent-id" values against the "number" column of the "patent" file reveals that 523,120 of them are missing. In other words, only about 3.3% of the patents in the "uspatentcitation" file seem to have full metadata.
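The percentage follows directly from the two counts quoted above; a short check of the arithmetic (variable names are illustrative):

```python
unique_ids = 541_259   # unique patent ids in "uspatentcitation", as quoted above
missing = 523_120      # ids not found in the "patent" file, as quoted above

matched = unique_ids - missing
share = matched / unique_ids

print(f"{matched:,} of {unique_ids:,} ids matched ({share:.2%})")
```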

Should I take your earlier answer as a possible explanation for this case as well? If not, could you please check and let me know why this happens, or whether I am missing something?

Thanks in advance for your attention!

PVTeam
Role: moderator
Last seen: 10/06/2021 - 11:29
Joined: 10/17/2017 - 10:47

Hello Arash, would you be able to email us a few example patent ids from your missing list (contact@patentsview.org)? The citation id issues in the XML files, together with the missing patents prior to 1976, may be the cause of the missing ids you are seeing, but having specific examples will help us to model what you are seeing in the file. 

Thank you,

PVTeam

Russ
Last seen: 10/06/2021 - 11:06
Joined: 11/14/2017 - 22:15
Not a perfect world

There are 305 granted patents that aren't in the bulk XML for some reason. See https://patentsview.org/forum/8/topic/127, though the patents endpoint query there no longer works (corrected one is here). There are citations to these patents that you wouldn't be able to resolve. For example, 7,606,803 is a missing patent that is cited by 4 other patents: 8250521, 9256516, 9286584, 9355375.

We'd need the USPTO to issue an XML file with data for the missing ones in order for these citations to be resolved.
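To see which patents cite a given missing patent, a reverse index over the citation rows is enough. The sketch below is illustrative: the function name is hypothetical, the column names patent_id and citation_id follow the ones mentioned earlier in this thread, and the rows are built by hand from the example above rather than read from the real file.

```python
from collections import defaultdict


def build_cited_by_index(citation_rows):
    """Map each cited id to the set of patent ids that cite it."""
    index = defaultdict(set)
    for row in citation_rows:
        index[str(row["citation_id"])].add(str(row["patent_id"]))
    return index


# Rows reconstructed from the example above, plus one unrelated row.
rows = [
    {"patent_id": citing, "citation_id": "7606803"}
    for citing in ("8250521", "9256516", "9286584", "9355375")
]
rows.append({"patent_id": "5354551", "citation_id": "4875247"})

index = build_cited_by_index(rows)
citing_patents = sorted(index["7606803"])
print(citing_patents)
```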

Russ