Why does PatentsView rely on inventor disambiguation?
When an inventor applies for a patent, the USPTO does not require them to record a unique identifier. As a result, finding all the patents associated with a specific inventor can be difficult, particularly if the inventor's name is common or appears in multiple forms. The USPTO hosted an Inventor Disambiguation Workshop on September 24, 2015. A research team from the University of Massachusetts Amherst, led by Andrew McCallum and Nicholas Monath, authored the successful algorithm, which was integrated into the PatentsView data platform in March 2016. The algorithm used discriminative hierarchical coreference to disambiguate inventor identities. In 2020, the research team implemented an improved version of the algorithm called Scalable Hierarchical Clustering with Tree Grafting. The original approach and its subsequent improvements greatly increase the quality and utility of PatentsView data.
What fields are being disambiguated in the PatentsView database?
The PatentsView database contains four disambiguated fields: inventors, assignees, locations, and lawyers. All these fields are available through the API and bulk data download. Disambiguated inventor, assignee, and location data are also available through the Data Query Builder and visualization interface.
My project requires the use of persistent inventor identifiers. How can I find them?
2021 Update: As of the data update released in March 2021 (data through December 29, 2020), inventor ID values are persistent. The updated disambiguation algorithm supports an incremental process: each new inventor record is disambiguated and then either assigned the existing ID (for inventors already present in the PatentsView database) or assigned a new ID (for new inventors).
Previous Answer: The PatentsView inventor disambiguation algorithm generates new inventor identities and ID values with every database update. The algorithm takes into account the full scope of the database to decide whether two or more inventors with similar names are in fact the same individual based on the topic of their patent, assignee relations, and other factors. Since more recent data can provide additional information about some inventors, the disambiguated clusters can change and identities can shift between database updates. To mitigate this difficulty, the PatentsView team provides a persistent_inventor_disamb table found on the bulk downloads page. This table links the disambiguated identities of each database update with the persistent raw inventor identifiers (that do not change over time). Thus, users can identify and use the inventor IDs across database updates for their research and analysis.
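The linking described in the previous answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (rawinventor_id, disamb_inventor_id_20191231, disamb_inventor_id_20200630) and the sample rows are hypothetical and are not taken from the actual persistent_inventor_disamb file, so check the real header before adapting it. The idea is to use the raw inventor identifiers, which do not change, to see which disambiguated ID from one update corresponds to which ID in a later update.

```python
import csv
import io

# Hypothetical tab-separated excerpt in the spirit of persistent_inventor_disamb.
# Column names and values are assumptions made for illustration only.
SAMPLE = """rawinventor_id\tdisamb_inventor_id_20191231\tdisamb_inventor_id_20200630
raw-001\tinv-A\tinv-X
raw-002\tinv-A\tinv-X
raw-003\tinv-B\tinv-Y
raw-004\tinv-B\tinv-X
"""


def build_crosswalk(table_text, old_col, new_col):
    """Map each ID in old_col to the set of IDs its raw records carry in new_col."""
    crosswalk = {}
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        old_id, new_id = row[old_col], row[new_col]
        # Skip raw records that are missing an ID in either update.
        if old_id and new_id:
            crosswalk.setdefault(old_id, set()).add(new_id)
    return crosswalk


crosswalk = build_crosswalk(
    SAMPLE,
    "disamb_inventor_id_20191231",
    "disamb_inventor_id_20200630",
)

for old_id, new_ids in sorted(crosswalk.items()):
    # More than one new ID means the old cluster was split in the later update.
    note = "maps to a single ID" if len(new_ids) == 1 else "split across several IDs"
    print(old_id, "->", sorted(new_ids), f"({note})")
```

An old ID that maps to several new IDs indicates a cluster that was split between updates, which is the kind of shift the previous answer describes. To read a downloaded file instead of the inline sample, the same csv.DictReader call can be pointed at an open file handle.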
I am an inventor and noticed errors in patents assigned to me in PatentsView. What should I do?
The PatentsView team greatly appreciates your feedback. In fact, we have already heard from a few inventors who identified errors regarding their patents in PatentsView. However, the nature of the inventor disambiguation algorithm does not allow us to apply manual fixes consistently across database updates. Instead, the PatentsView team will collect reported errors and implement a post-processing step so that known errors do not recur in future database updates.
Where can I find more information about the PatentsView disambiguation algorithms?
All disambiguation algorithms are available at https://github.com/PatentsView/PatentsView-Disambiguation. Information about the 2015 inventor disambiguation workshop can be found here. Further information about the 2021 Workshop on Entity Resolution can be found here.