2 posts
Last seen: 09/05/2022 - 15:38
Joined: 06/17/2022 - 09:59
Weird merge pattern across datasets

Dear all,

I came across the patent_inventor dataset provided by PatentsView, but unfortunately I could not find any official documentation on how it is generated. I examined it by merging it back with the disambiguated datasets patent, inventor, and location in turn. Since those datasets are where the information comes from, I assumed that when I merge with them as the using dataset, all unmatched observations should come from the using datasets. But when I merge patent_inventor (master) with patent (using), both datasets contain unmatched observations. The same happens when I merge patent_inventor (master) with location (using). Why would this happen? Is there a mistake?


Role: moderator
Last seen: 11/22/2022 - 14:14
Joined: 10/17/2017 - 10:47

The patent_inventor and patent_assignee tables are crosswalk tables that connect the patent.tsv file to the inventor.tsv, assignee.tsv, and location.tsv files. These files are the output of our disambiguation method, which can be found here -

PatentsView uses a series of algorithms and post-processing techniques to track the patenting activity of inventors and assignees over time. The disambiguation process is a value-added service applied to the U.S. Patent and Trademark Office’s (USPTO’s) raw, publicly available data on granted patents from 1976 to the present; the data are updated quarterly. The process provides unique identifiers for patent inventors, assignees, and locations.

  1. The patent.tsv file is the master file. All records in the patent_* files should have a corresponding patent in the patent.tsv file. If they do not, be sure to check your ingest functions. If you continue to have trouble merging those files, please send your code to and we will help you troubleshoot your merge.
  2. Note that not all patents have inventor or assignee records, so you will have unmatched observations.
  3. The location.tsv table includes locations from multiple entities in both the granted patents and the pregrant publications. Unless you merge to all of these entity files, you will have unmatched observations.
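
To see which side the unmatched observations come from, something like the following may help. It is only a minimal sketch in plain Python that mimics the master-only / using-only / matched breakdown of a Stata merge. The column names (`id` in patent.tsv and location.tsv, `patent_id` and `location_id` in patent_inventor.tsv) and the sample rows are assumptions made for illustration, so please check them against the headers of the files you downloaded.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def merge_report(master_rows, using_rows, master_key, using_key):
    """Classify key values as master-only, using-only, or matched."""
    master_keys = {row[master_key] for row in master_rows}
    using_keys = {row[using_key] for row in using_rows}
    return {
        "master_only": master_keys - using_keys,
        "using_only": using_keys - master_keys,
        "matched": master_keys & using_keys,
    }


# Made-up sample rows standing in for the real files (hypothetical data).
# With the real data you would use e.g.
#   with open("patent.tsv", newline="", encoding="utf-8") as f:
#       patent = read_tsv(f)
patent = read_tsv(io.StringIO(
    "id\ttitle\n"
    "P1\tWidget\n"
    "P2\tGadget\n"
    "P3\tGizmo\n"
))
patent_inventor = read_tsv(io.StringIO(
    "patent_id\tinventor_id\tlocation_id\n"
    "P1\tI1\tL1\n"
    "P2\tI2\tL2\n"
))
location = read_tsv(io.StringIO(
    "id\tcity\n"
    "L1\tBoston\n"
    "L2\tAustin\n"
    "L9\tDenver\n"
))

# patent_inventor (master) against patent (using)
patent_report = merge_report(patent_inventor, patent, "patent_id", "id")
# patent_inventor (master) against location (using)
location_report = merge_report(patent_inventor, location, "location_id", "id")

for name, report in [("patent", patent_report), ("location", location_report)]:
    for side, keys in report.items():
        print(name, side, sorted(keys))
```

In this toy data, P3 is a patent with no inventor record (point 2) and L9 is a location used only by some other entity (point 3), so both show up as using-only. If master-only keys appear when merging against patent.tsv with the real files, that would point to an ingest problem as described in point 1; one common cause is an ID column being read as a number instead of text.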

Lastly, if you truly think that there is a mistake in the files, please send your code to and we will verify and replace the data if necessary.