4 posts
jfbrou@stanford.edu
Last seen: 11/07/2021 - 23:52
Joined: 05/14/2021 - 09:47
Google Patents

Hi,

I am wondering if the PatentsView team is planning to incorporate pre-1976 data from Google Patents at some point? If not, what are the main challenges of doing so in terms of the disambiguation and gender attribution procedures?

Thank you!

Jean-Felix Brouillette

Russ
Last seen: 11/15/2021 - 23:34
Joined: 11/14/2017 - 22:15
is their data available


Does Google Patents share their data? The PatentsView database is built from the ~2000 bulk grant XML files the USPTO makes available at https://bulkdata.uspto.gov/. If Google Patents supplied their data as bulk XML files, I'd imagine a parser could be written to add it to PatentsView. See https://github.com/PatentsView/PatentsView-DB and https://patentsview.org/government-interest/extraction-process, which has a diagram showing the XML files being processed for the government interest fields.
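
To make the parser idea concrete, here is a minimal sketch of pulling inventor names out of a bulk file. It is not the PatentsView-DB code. It assumes a bulk file is several XML documents concatenated together, each starting with its own `<?xml` declaration, and the tag names in the sample (`us-patent-grant`, `inventor`, `first-name`, `last-name`, `doc-number`) are only loosely modelled on USPTO grant XML, so treat them as assumptions to check against the real files. The function names and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-record sample standing in for a concatenated bulk file.
# Tag names are assumptions; verify them against the actual schema.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant>
  <publication-reference><document-id>
    <doc-number>0000001</doc-number>
  </document-id></publication-reference>
  <inventors><inventor><addressbook>
    <last-name>Doe</last-name><first-name>Jane</first-name>
  </addressbook></inventor></inventors>
</us-patent-grant>
<?xml version="1.0" encoding="UTF-8"?>
<us-patent-grant>
  <publication-reference><document-id>
    <doc-number>0000002</doc-number>
  </document-id></publication-reference>
  <inventors><inventor><addressbook>
    <last-name>Roe</last-name><first-name>Richard</first-name>
  </addressbook></inventor></inventors>
</us-patent-grant>
"""


def split_documents(bulk_text):
    """Split concatenated XML text into one string per document."""
    docs, current = [], []
    for line in bulk_text.splitlines():
        # A new XML declaration marks the start of the next document.
        if line.lstrip().startswith("<?xml") and current:
            docs.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        docs.append("\n".join(current))
    return docs


def extract_inventors(xml_doc):
    """Return (patent number, [(first name, last name), ...]) for one document."""
    root = ET.fromstring(xml_doc.encode("utf-8"))
    number = root.findtext(".//publication-reference/document-id/doc-number")
    names = []
    # iter() finds <inventor> at any depth, so the exact nesting is not assumed.
    for inventor in root.iter("inventor"):
        first = inventor.findtext(".//first-name", default="")
        last = inventor.findtext(".//last-name", default="")
        names.append((first, last))
    return number, names


if __name__ == "__main__":
    for doc in split_documents(SAMPLE):
        print(extract_inventors(doc))
```

A parser for another source, such as Google Patents data, would presumably need its own tag mapping but could feed the same kind of (patent number, inventor names) records downstream.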

The USPTO has inventors' names for patents going back to 1920, but I don't believe they export this data. You can, for example, do an inventor name search for in/edison at https://patft.uspto.gov/netahtml/PTO/search-adv.htm with the "1790 to present (entire database)" option selected. It returns patents where an inventor's name is Edison, both the famous guy back to 1920 and more modern inventors with that surname. Note that titles are present for patents from 1976 onward but not for the earlier ones (though the USPTO knew an inventor's name was Edison). If you could get hold of the inventors' names from 1920-1975, it may be possible to apply some of the gender attribution procedures. (https://patentsview.org/gender-attribution says that the country of origin is used in step 4, but this data might not be available before 1976.)


I hope this helps,
Russ Allen
 

jfbrou@stanford.edu
Last seen: 11/07/2021 - 23:52
Joined: 05/14/2021 - 09:47
Hi, My guess is that Google…

Hi,

My guess is that Google does share their data. For example, these two papers used the Google Patents data in their analysis:

1. https://www.aeaweb.org/articles?id=10.1257/aeri.20190499

2. https://academic.oup.com/qje/article/132/2/665/3076284

The second paper explicitly mentions that Google was actively participating in sharing the data (going back to 1926) while the first one scraped the data from the web.

But yes, this helps. Thank you!

Jean-Felix

PVTeam
Role: moderator
Last seen: 11/15/2021 - 13:52
Joined: 10/17/2017 - 10:47
pre-1976 data

At this time, incorporating pre-1976 patent records into the PatentsView data frame is not a priority for the team. We can bring it to the attention of our funders and stakeholders to see whether it would be feasible.

Thank you,

-PVTeam