Methods & Sources
The PatentsView API provides access to a continually updated data source that uses probabilistic methods to determine whether inventors with the same name are the same person, and that generates disambiguated inventor identifiers. Automatic disambiguation of inventor names is critical for exploring patterns and trends in US and international patenting activity. Both the visualization tools and the query tools draw from this public API.
The PatentsView data warehouse is sourced from USPTO-provided text and XML data on published patent applications (2001-present) and granted patents (1976-present). These data are publicly available at patents.reedtech.com. The data are parsed and structured into a relational database. This database is the basis for disambiguating assignee and lawyer names and for geocoding patents through a location disambiguation step.
The PatentsView data generation process does not fully disambiguate the names of assignees. The University of Michigan’s STATA Utilities(1) are first applied to raw assignee names to correct minor typos and misspellings. The Jaro-Winkler(2) string similarity algorithm is then applied to each pair of processed assignee names. Processed names whose similarity falls within a set threshold are treated as the same assignee and linked together.
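The pairwise linking step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the PatentsView implementation: `normalize` and its suffix list are a simplified, hypothetical stand-in for the STATA Utilities cleanup (not their actual rules), the 0.95 threshold is an illustrative value rather than the one PatentsView uses, and linked pairs are merged transitively with a union-find structure, which the description above does not specify.

```python
import re
from itertools import combinations


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (0.0 to 1.0)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions).
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


# Hypothetical suffix list: a simplified stand-in for the STATA Utilities cleanup.
SUFFIXES = {"inc", "corp", "corporation", "co", "company", "ltd", "llc"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop common corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def link_assignees(names, threshold=0.95):
    """Group raw names whose normalized forms score >= threshold (illustrative value)."""
    processed = [normalize(n) for n in names]
    parent = list(range(len(names)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j in combinations(range(len(names)), 2):
        if jaro_winkler(processed[i], processed[j]) >= threshold:
            parent[find(i)] = find(j)  # union: linked pairs are merged transitively

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())
```

For example, `link_assignees(["Acme Widgets Inc.", "ACME Widgets, Inc", "Acme Widgts Corp", "Zenith Labs LLC"])` groups the three Acme variants together and leaves "Zenith Labs LLC" on its own. Because union-find makes links transitive, names A and C can end up in one group through an intermediate B even if A and C score below the threshold against each other; that chaining is a design choice of this sketch.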
When an inventor applies for a patent, the USPTO does not require that he or she record a unique identifier. As a result, finding all the patents associated with a specific inventor can be difficult, particularly if the inventor’s name is common or has multiple forms. The USPTO hosted an Inventor Disambiguation Workshop on September 24, 2015. The research team from the University of Massachusetts Amherst, led by Andrew McCallum and Nicholas Monath, authored the successful algorithm, which was integrated into the PatentsView data platform in March 2016. The algorithm takes a new approach, discriminative hierarchical coreference, to improve the quality of PatentsView data.
Because inventor disambiguation is an ongoing effort, the algorithm is likely to produce some errors that appear in PatentsView query results. We welcome feedback as we continue to improve our disambiguation methodology.
Patent Classes and Technologies
Patents are classified by four distinct schemes in the PatentsView database: Cooperative Patent Classification (CPC), World Intellectual Property Organization (WIPO) technology fields, US Patent Classification (USPC), and National Bureau of Economic Research (NBER) technology subcategories. The USPC scheme was retired in June 2015, so applications filed after that date do not have a USPC class. These applications also lack NBER technology categories, because the NBER categories are derived from a concordance with USPC. Unless otherwise noted, all patent classes visualized in the tool represent the current patent class.
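The dependency between the two legacy schemes can be sketched in a few lines of Python. The concordance entries, the `Patent` record, and `nber_subcategory` are hypothetical placeholders for illustration, not PatentsView field names or real concordance data. The sketch only shows that an NBER subcategory is looked up from the USPC class, so a patent without a USPC class has no NBER subcategory either.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical concordance entries for illustration only; NOT real
# USPC-to-NBER concordance data.
USPC_TO_NBER = {
    "class_A": "subcategory_1",
    "class_B": "subcategory_2",
}


@dataclass
class Patent:
    """Hypothetical minimal patent record."""
    patent_id: str
    uspc: Optional[str] = None  # None when no USPC class was assigned


def nber_subcategory(patent: Patent) -> Optional[str]:
    """Derive the NBER subcategory from the USPC class via the concordance.

    A patent with no USPC class (e.g. an application filed after USPC was
    retired) has nothing to look up, so it gets no NBER subcategory.
    """
    if patent.uspc is None:
        return None
    return USPC_TO_NBER.get(patent.uspc)  # None if the class is unmapped
```

For instance, `nber_subcategory(Patent("p1", "class_A"))` returns `"subcategory_1"`, while `nber_subcategory(Patent("p2"))` returns `None`.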
(2) William E. Winkler. Overview of Record Linkage and Current Research Directions. Tech. rep., Statistical Research Division, U.S. Census Bureau, 2006.