  • What’s New with PatentsView — November 2024

    PatentsView has always been a leader in providing high-quality patents data to help drive insights into invention and innovation. The platform offers tools to help researchers better understand intellectual property (IP), inventors, and innovation. Users can also explore trends and connections between various topics to gain a deeper understanding of the IP landscape.

    Our team has been working diligently behind the scenes to not only uphold our reputation for high-quality data and disambiguation, but to make PatentsView better and more functional. Here are a few ways we are making PatentsView better for you.

    Service Desk

    We recently launched a new service desk to help users request an API key, get technical support, report a bug, or suggest improvements. The service desk also helps the PatentsView team better track requests and use your suggestions for continuous quality improvements.

    PatentSearch API

    The PatentSearch API’s full-text endpoints have been updated. For clarity and efficiency, granted and pre-grant text endpoints have been separated.

    • Granted text data can be retrieved at /api/v1/g_brf_sum_text/, etc. 
    • Pre-grant text data can be retrieved at /api/v1/pg_brf_sum_text/, etc.
    • The JSON keys in these endpoints' responses have been updated correspondingly to g_brf_sum_texts, etc.
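
    As a rough illustration of the split, the sketch below builds request URLs for the granted and pre-grant brief-summary text endpoints named above. The host comes from the release-notes link; the "q" query parameter, the "X-Api-Key" header, the example field names, and the helper names are assumptions made for this example, so check the PatentSearch API Reference before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Host taken from the release-notes URL; assumed here to also serve the API.
BASE_URL = "https://search.patentsview.org"

# Endpoint prefixes named in the update notes: g = granted, pg = pre-grant.
PREFIXES = {"granted": "g", "pregrant": "pg"}


def text_endpoint_url(kind, query):
    """Build a URL for the brief-summary text endpoint of the given kind."""
    path = f"/api/v1/{PREFIXES[kind]}_brf_sum_text/"
    # Assumption: the query is passed as JSON in a "q" parameter.
    params = urlencode({"q": json.dumps(query)})
    return f"{BASE_URL}{path}?{params}"


def build_request(kind, query, api_key):
    """Prepare (but do not send) a GET request that carries the API key."""
    # Assumption: the key is sent in an "X-Api-Key" header.
    url = text_endpoint_url(kind, query)
    return Request(url, headers={"X-Api-Key": api_key})


# Hypothetical field names and values, used only to show the two URL shapes.
granted_url = text_endpoint_url("granted", {"patent_id": "10000000"})
pregrant_url = text_endpoint_url("pregrant", {"document_number": "20200000001"})
```

    Nothing is sent over the network here. Per the update notes, responses from the granted endpoint are keyed by g_brf_sum_texts.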

    Full update notes can be found at https://search.patentsview.org/docs/2024/11/06/2.2-release.

    The new PatentSearch API is more advanced and efficient than the legacy API, which will be phased out in February 2025. Learn more about PatentSearch API in our PatentSearch API Reference page and Swagger interface.

    Ready to switch? Request a PatentSearch API key through our service desk.

    Ground Truth and Data Quality Checks

    To ensure the highest level of data accuracy, the PatentsView team has implemented several Ground Truth initiatives. These efforts involve cross-referencing patent data with verified sources to validate the information and correct any discrepancies. 

    By establishing a reliable ground truth, users can access more trustworthy data, which is crucial for conducting accurate analyses and making informed decisions. This commitment to data quality is what makes PatentsView the best in its class for patent data.
     

  • Support for Legacy API to End in February 2025. Switch to PatentSearch API Now.

    PatentsView is phasing out our legacy API, making way for the more advanced and efficient PatentSearch API. In September 2024, we began the process of retiring the old API, with full discontinuation set for February 2025. 

    We encourage all users to transition to the new PatentSearch API to ensure uninterrupted access to our services and to take advantage of the enhanced features and speed it offers.

    Ready to switch? Request a PatentSearch API key through our service desk.

    About PatentSearch API

    The new PatentSearch API is intended to inspire exploration and enhanced understanding of US intellectual property (IP) and innovation systems. The database for the PatentSearch API is updated regularly and features the best available tools for inventor disambiguation and data quality control.

    Researchers and developers can use the API to uncover information about people and companies, and to visualize trends and patterns in the US innovation landscape.

    The API offers seven unique endpoints that allow users to explore various questions, such as:

    • Which companies hold patents in 3D printing? Discover their locations and the technologies they were innovating in before and after receiving 3D printing patents.
    • What technology has been most commonly patented in the US in the last five years? Identify the top US and non-US cities producing these patents.
    • Which US inventors earned the most patents in the last 30 years? Track their patenting activity, including the number and types of patents and their co-inventors.

    Learn More About PatentSearch API

    For detailed documentation on the new PatentSearch API, visit our PatentSearch API Reference page. Additionally, you can explore the Swagger interface.

    Why Are We Switching APIs?

    PatentsView has been offering an API since 2015, which has been widely used and valued by thousands of users. 

    However, based on years of feedback and the evolving nature of patent data, PatentsView released the PatentSearch API in early 2024. This new API enhances the functionality and speed of the previous version.

    Consolidating Names

    Over time, the legacy API has been known by various names, including “PatentsView API,” “Swagger-based API,” and “MySQL API.” The new PatentSearch API has also been referred to as the “Elasticsearch API,” “Beta API,” and “Search API.” 

    Moving forward, we will consolidate the naming to its official title – PatentSearch API.

  • What’s New with PatentsView – March 2024

    The PatentsView team is always working to make our data more complete, more accurate, and more useful. Recently, we completed a validation of our assignee disambiguation algorithm. We created a large, updated ground truth dataset to measure the current performance of that algorithm.

    This blog post highlights our continued commitment to assignee disambiguation validation by detailing the process we used to build a hand-labeled ground truth dataset, describing the Python packages and metrics used in assignee validation, and openly disclosing the results and their statistical significance.

    Why is evaluation important?

    Model evaluation is important because it allows us to be publicly transparent about the performance of our disambiguated data, to identify situations where our model does not perform as well as it could, and to make changes that improve overall performance. We validate our assignee disambiguation algorithm by estimating key performance metrics, like precision and recall, that summarize its accuracy.

    Assignee Labeling Process

    Estimating the performance of our disambiguation algorithm requires benchmarking data: some “ground truth” against which we can compare our predictions and assess the quality of our disambiguation. There are two main types of ground truth data used to evaluate entity resolution algorithms. 

    The first type is partially resolved entities, which consists of examples of matching entity mentions (e.g., assignees from two patents, like “Microsoft” and “Microsoft Corp.,” that we know refer to the same organization), as well as examples of non-matching entities (e.g., “Microcorp” and “Microsoft”), which, after researching company locations and services offered, we know to be two separate companies.

    The second type of ground truth data is fully resolved entities. In this case, we find all patents for which Microsoft is an assignee and use that as complete ground truth for evaluation. We demonstrate how we cluster entities, such as all instances of “Microsoft,” to create our ground truth in the remaining paragraphs of this section.

    Our evaluation process focuses on the second type of data, fully resolved entities, because this method provides more robust statistical outputs. We employed three data labelers for over 100 combined hours to resolve the entities of over 250 randomly selected assignee mentions. To maximize the accuracy of the ground truth labels we created (that is, groupings of rawassignees that are mentions of the same organization), we broke the process down into two main parts: (a) finding everything that could be a related entity and then (b) removing unlike assignees based on greater rawassignee detail.

    In step (a), finding everything that could be a related entity, we compared each assignee reference (organization name or first/last name) with hundreds of similar names in our database. The hand-labelers did this using a custom-made Streamlit application, which we designed to be both a query and data augmentation tool. Labelers pulled similar assignees by testing various potential name representations (name shortenings, abbreviations, potential misspellings, etc.) in Streamlit and saving the results. Streamlit then augmented the saved results from the labeler search by adding previous clusters (prior disambiguation results) that were associated with one or more of the rawassignees found by the labeler and were not previously included.

    In step (b), removing unlike assignees based on greater rawassignee detail, hand-labelers reviewed the saved cluster output from step (a). The saved cluster data contained additional information about each rawassignee, including the associated patent, patent type, CPC classification, and location. Using filters, sorting, or any other resource necessary, labelers carefully inspected all assignee data that could help them remove rawassignee mentions that should not be included in the final cluster. Depending on cluster size, manual review could take between two minutes and an hour.

    Evaluation Packages – Written for PV

    PatentsView estimates performance metrics in a transparent manner using open-source entity resolution evaluation software. The evaluation functions are called from our PatentsView-DB repository. We leverage two repositories: ER-Evaluation, for the pairwise precision and recall metric functions, and PatentsView-Evaluation, to upload the relevant benchmark datasets (Binette & Reiter, 2023, 1). You can find more technical details about this process in the documentation and code linked in this section.
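
    To make the pairwise metrics concrete, here is a minimal sketch in plain Python. It does not use the ER-Evaluation API and is not the estimator PatentsView runs; it only applies the textbook definitions of pairwise precision and recall to a toy example with made-up record IDs and cluster names.

```python
from itertools import combinations


def matching_pairs(clusters):
    """Return the set of unordered record pairs that share a cluster."""
    pairs = set()
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def pairwise_precision_recall(predicted, truth):
    """Compare predicted clusters with ground truth clusters pair by pair."""
    predicted_pairs = matching_pairs(predicted)
    true_pairs = matching_pairs(truth)
    correct = predicted_pairs & true_pairs
    # With no pairs to judge, treat the metric as perfect by convention.
    precision = len(correct) / len(predicted_pairs) if predicted_pairs else 1.0
    recall = len(correct) / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Toy data: the ground truth puts r1, r2, and r3 in one organization,
# but the prediction leaves r3 out of that cluster.
truth = {"microsoft": {"r1", "r2", "r3"}, "microcorp": {"r4"}}
predicted = {"c1": {"r1", "r2"}, "c2": {"r3"}, "c3": {"r4"}}

precision, recall = pairwise_precision_recall(predicted, truth)
```

    In this toy case, every predicted pair is correct, so precision is perfect, while two of the three true pairs are missed, so recall is one third. Leaving a record out of its cluster lowers recall without affecting precision.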

    Statistical Significance

    Precision and recall are standard metrics for evaluating entity resolution; a detailed discussion of them can be found in our last evaluation report (Monath et al., 2021, 15). Based on the newest PatentsView Ground Truth labeled data, the latest data update achieved a precision of 0.900 and a recall of 0.727. This indicates that we are more likely to leave a rawassignee record[1] out of a cluster (false negative) than to erroneously include an additional record in a cluster (false positive).

    Precision and recall are calculated at the entity level, evaluating the results of the most recent assignee disambiguation algorithm on the 228 assignee clusters for which we have ground truth data. See our last evaluation report for a more detailed explanation of the difference between entity-level and rawassignee-record-level evaluation (Monath et al., 2021, 16-17). A standard deviation of around 4% for both estimates can be interpreted as 4% variability around each estimate, meaning that there is an approximately 68% likelihood that the true (population) precision is between 0.8604 and 0.9400 and that the true recall is between 0.6839 and 0.7699. Our team believes that 4% variability is a narrow enough range for confidence in these evaluation metrics.

    Metric      Estimate   Standard Deviation
    Precision   0.900      0.040
    Recall      0.727      0.043
    F1          0.804
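
    The figures in the table can be checked with a few lines of arithmetic. The sketch below assumes F1 is the usual harmonic mean of precision and recall and that the one-standard-deviation ranges are simply the estimate plus or minus the standard deviation. Because the table values are rounded, the computed ranges differ slightly in the later decimals from the ranges quoted in the text.

```python
# Rounded values copied from the table.
precision, precision_sd = 0.900, 0.040
recall, recall_sd = 0.727, 0.043


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def one_sd_interval(estimate, sd):
    """Range of one standard deviation on either side of the estimate."""
    return (estimate - sd, estimate + sd)


f1 = f1_score(precision, recall)
precision_interval = one_sd_interval(precision, precision_sd)
recall_interval = one_sd_interval(recall, recall_sd)

print(f"F1: {f1:.3f}")
print(f"Precision range: {precision_interval[0]:.3f} to {precision_interval[1]:.3f}")
print(f"Recall range: {recall_interval[0]:.3f} to {recall_interval[1]:.3f}")
```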

     

    Conclusion

    In conclusion, the recent advancements in PatentsView, particularly the validation of our assignee disambiguation algorithm, reflect a steadfast commitment to data accuracy and transparency. Through meticulous evaluation processes and the creation of comprehensive ground truth datasets, we verify the quality of our disambiguated data.

    By employing multiple data labelers and leveraging evaluation packages such as ER-Evaluation and PatentsView-Evaluation, we estimate metrics like precision and recall that shed light on the algorithm's performance.

    The latest update achieved a precision of 0.900 and a recall of 0.727, indicating that the clusters we form are rarely wrong, although some records are still left out of their clusters. These efforts underscore PatentsView's dedication to providing users with high-quality disambiguated assignee data and to being transparent about our processes and our work.

    Citations

    Binette, O., & Reiter, J. P. (2023). ER-Evaluation: End-to-End Evaluation of Entity Resolution Systems. The Journal of Open Source Software. https://joss.theoj.org/papers/10.21105/joss.05619.pdf

    Monath, N., Jones, C., & Madhavan, S. (2021, July). PatentsView: Disambiguating Inventors, Assignees, and Locations. Retrieved from https://s3.amazonaws.com/data.patentsview.org/documents/PatentsView_Disambiguation_Methods_Documentation.pdf 

     

    [1] PatentsView defines a “rawassignee record” as every mention of an assignee found on all granted patents and patent applications.

  • What Can PatentsView Do for You?

    PatentsView was launched in 2017 to help people access, understand, and use patent data in their research. The award-winning data visualizations shed light on trends and can show how inventors and innovation have changed since 1976, and the bulk data downloads, API tool, and community collaborative can let you dig deeper into patents data.

    People have recently used PatentsView data to explore the gender of inventors and how that balance has changed over time. They have looked at patent data to better understand innovation in Latin America, and they have mapped skill-relatedness networks to get a better idea about how the economy is evolving.

    Using PatentsView data

    Do you have a question about patents, innovation, inventors, or technology? PatentsView gives you several options to access and analyze its data.

    1. Patent Visualizations: A collection of charts and graphs to explore patent data in an accessible format. The visualization tools allow the user to search for keywords, filter by location, make comparisons by attribute, and view a network of patents that shows the big picture.
    2. Community Collaborative: A moderated community that offers collaborative spaces, such as a discussion forum and the Data in Action Spotlight. The Data in Action Spotlight is a blog featuring information about research done with PatentsView data and tools, updates to the PatentsView site, relevant events, and more.
    3. PatentSearch API: The PatentSearch API allows software developers and researchers to work with our data within their local environment. The PatentSearch API Reference page documents the API, its query language, and a list of specialized endpoints for searching and filtering the data.
    4. Data Query Builder: A user-friendly query builder interface that allows users to create their own datasets based on specific search criteria. The query builder is distinct from the API Tool. The final dataset from any customized query is delivered through a link sent to the user's email address. NOTE: At the time of posting, the Query Builder tool is down and unable to send requested data to your email address. Until it is back online, please contact our service desk to inquire about customized data requests.
    5. Bulk Data Download: A page that provides the data used to create the website and its visualizations. It is split into various tables that contain raw, disambiguated, or processed data. The page also provides example Python and R scripts to assist with reading the data after they have been downloaded.

    Updates and Improvements

    The PatentsView team is constantly adding new patent data, improving functionality, and fixing bugs and errors. For instance, PatentsView recently released data through September 30, 2023, and is working on a new data release in March.

    Earlier this year, the team implemented a new gender attribution algorithm, and updated the location standardization process. Visit the release notes page for a full list of updates and improvements over time.

    We Want to Hear from You

    Many of these improvements were spurred by user suggestions and questions. Your exploration of the data and your reports of discrepancies and errors help our team deliver the highest quality data to the public.

    Please contact our service desk with any data questions and suggestions you may have, including Data in Action Spotlight post ideas. Let us know if you have recently published a paper or given a presentation based on PatentsView data or if you have an upcoming event that uses PatentsView in some way.

    To receive regular updates on what the PatentsView team is working on, subscribe to our bi-monthly newsletter.

  • The Case of the Missing Assignee Data: How the AIA Affected Pre-grant Assignee Information on Patent Applications

    The PatentsView team has received several inquiries about assignee data that appears to be missing starting in 2013. The number of assignees in pre-grant patent data seems to have suddenly fallen around that time, leading users to think that the data is missing or there is an error.

    We always appreciate our users bringing issues to our attention. It allows us to keep PatentsView data accurate and up to date. In this case, however, what you are seeing is not an error and there is no data missing. The sudden change is due to a shift in policy.

    The America Invents Act and Patent Data

    The 2011 America Invents Act (AIA) was designed to reduce waiting times in the patent application process. There were also provisions meant to reduce the number of lawsuits and other litigation that inventors and entrepreneurs faced and to bring the U.S. process and data in line with other countries.

    The bill also changed the rules for who could be listed as an applicant on patent applications. Prior to the act, applicants had to be a person. Most often, it was the inventor. If that person was working as part of a company or organization, then the organization would be listed as an assignee.

    Changes in Assignee Data

    The AIA allowed organizations to be listed as applicants and no longer required them to be listed as assignees. Beginning around 2013, they were doing just that. The number of assignees in patent applications dropped significantly because of this change, as shown in the graph below.

    [Figure: a line graph that shows a drop in the number of organizations listed as assignees on patent applications around 2013.]

    However, the change in the application process did not affect granted patent data. Once the U.S. Patent and Trademark Office grants a patent, it generally lists the organization as an assignee. Therefore, the granted patent data does not reflect the same change in number of assignees as the pre-grant data.

    All this means that what looks like missing data — fewer than expected assignees in pre-grant data — is actually just a change in the way the data is collected and reported.

    Questions About PatentsView Data

    Our users are the greatest asset we have at PatentsView. If you have an issue with or question about PatentsView tools or data, you can browse our community forum or visit our service desk. We want to hear from you.

     

  • What's New with PatentsView - June 2023

    June Updates 

    This month in PatentsView news, the data team will release the quarter four data for 2022 and the quarter one data for 2023. The disambiguated and processed data will include patents and published pre-grant patent applications from September 30, 2022, to March 30, 2023. In addition to bulk downloadable data for granted patents and pre-grant application publications, the legacy API, PatentsView's new PatentSearch API, and site visualizations will also be updated with data through March 30, 2023. To celebrate the completion of processing for the year 2022, we're lighting sparklers just in time for the Independence Day and Emancipation Day celebrations in the United States!

    In our previous data updates, PatentsView gender data was attributed through a partnership with faculty at the University of Bordeaux. Starting from the final quarter of 2022 up to the present, our PatentsView data scientists have attributed gender to inventors using the World Intellectual Property Organization’s (WIPO’s) Genderit Method algorithm, which has been adjusted by our team. The new attribution method has been applied to all historic records and assigned to disambiguated inventors based on the majority gender of the raw inventor records that combine to make the disambiguated inventor. For instance, if over 50% of raw records for a given inventor are marked female, then the inventor is attributed as female. In cases where the raw inventor records are split exactly 50/50 between female and male (which did occur), the gender remains unattributed.

    PatentsView is bringing the inventor gender algorithm in house starting with the next data release. We aim to simplify processes and improve the timeliness of the data releases while maintaining data quality. Based on a comparison using a sample week of data from the quarter, our new method outperforms the old method in attribution rate by 4%. In summary, the inclusion of gender attribution in the PatentsView internal data pipeline will ultimately result in faster and more accurate gender information for researchers, economists, students, inventors, and other users.
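
    The majority rule described above can be sketched in a few lines. This is an illustration of the rule as stated, not the PatentsView pipeline code; the function name and the "female" and "male" label strings are assumptions, and the sketch counts only the records it is given.

```python
from collections import Counter


def attribute_gender(raw_labels):
    """Return the majority label across an inventor's raw records, or None.

    A label wins only if it covers strictly more than half of the records,
    so an exact 50/50 split (or an empty list) stays unattributed.
    """
    counts = Counter(raw_labels)
    total = len(raw_labels)
    for gender in ("female", "male"):
        if counts[gender] * 2 > total:
            return gender
    return None


majority_case = attribute_gender(["female", "female", "male"])
tied_case = attribute_gender(["female", "male"])
```

    Here majority_case is attributed as female because two of the three raw records are marked female, while tied_case remains unattributed.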

    Looking Ahead

    In pursuit of a faster and more efficient data processing pipeline that does not degrade the current quality of PatentsView data, the data team also invested in weekly parsing of the raw XML data files from the United States Patent and Trademark Office (USPTO). Incremental conversion of the XML data into TSV format allows the data team to catch errors early, before they lead to data quality issues or impede the disambiguation and attribution processes further along the pipeline.
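
    The idea of converting XML to TSV in small batches and flagging problems early can be sketched as follows. This is not the PatentsView parser: the element names (patents, patent, id, title) are hypothetical stand-ins for the far richer USPTO schema, and the only check performed is for missing field values.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_rows(xml_text, fields):
    """Parse one batch of XML and split records into clean rows and errors."""
    root = ET.fromstring(xml_text.strip())
    rows, errors = [], []
    for index, record in enumerate(root.findall("patent")):
        row = {name: (record.findtext(name) or "").strip() for name in fields}
        missing = [name for name, value in row.items() if not value]
        if missing:
            # Flag the record now instead of passing a bad row downstream.
            errors.append((index, missing))
        else:
            rows.append(row)
    return rows, errors


def rows_to_tsv(rows, fields):
    """Serialize the clean rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample batch: the second record has an empty title.
SAMPLE_XML = """
<patents>
  <patent><id>1</id><title>Widget</title></patent>
  <patent><id>2</id><title></title></patent>
</patents>
"""

FIELDS = ["id", "title"]
rows, errors = xml_to_rows(SAMPLE_XML, FIELDS)
tsv_text = rows_to_tsv(rows, FIELDS)
```

    In this sample, the first record is written to the TSV output and the second is reported in errors with the name of its empty field, so it can be reviewed before later pipeline stages run.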

    Here's to diving into 2022 annual data and beginning our exploration with 2023!

  • What's New with PatentsView - March 2023

    March Updates

    This month, PatentsView released the third quarter of 2021 data complete with the new algorithm and data structure updates initiated last fall. The release notes web page holds detailed information on this release and historical releases.

    Also released this month are annualized gender data files with new documentation and an updated data dictionary from the Office of the Chief Economist (OCE) at the United States Patent and Trademark Office (USPTO). These datasets are designed both for quick exploratory data analysis and for programmatic use by more longitudinally focused data users. The annual files contain information from the assignee, inventor, location, application, and patent tables all in one place for a more comprehensive picture of patenting teams. In addition to pulling in variables from these separate PatentsView data tables, the datasets contain novel variables, including the total number of inventors on a given patent, the total number of inventors listed on a given patent who were assigned a gender, the number of men inventors on each patent, the number of women inventors on each patent, and a flag indicating whether inventor information is available for that patent.
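
    As an illustration of how per-patent team variables like these could be derived from inventor-level rows, consider the sketch below. The input layout, the "M" and "F" codes, and the output field names are assumptions made for this example and do not reflect the actual column names in the annual files.

```python
from collections import defaultdict


def patent_team_summary(inventor_rows):
    """Aggregate (patent_id, gender) rows into per-patent team counts.

    gender is "M", "F", or None when no gender was attributed.
    """
    summary = defaultdict(
        lambda: {"team_size": 0, "inventors_with_gender": 0, "men": 0, "women": 0}
    )
    for patent_id, gender in inventor_rows:
        counts = summary[patent_id]
        counts["team_size"] += 1
        if gender in ("M", "F"):
            counts["inventors_with_gender"] += 1
            counts["men" if gender == "M" else "women"] += 1
    return dict(summary)


# Made-up rows: patent P1 has three inventors, one without an attributed gender.
example_rows = [("P1", "M"), ("P1", "F"), ("P1", None), ("P2", "F")]
team_summary = patent_team_summary(example_rows)
```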

    To read more about these data files and the inspiration for their generation, visit the Gender & Innovation page and navigate to the DATA section located under the interactive visualization of gender data from 2000 to 2020.

    Looking Ahead

    The next PatentsView data update is gearing up this March and will result in a double release of 2021 quarter four and 2022 quarter one data in early summer. The team is working with OCE this spring to improve and optimize the assignee disambiguation and gender attribution algorithms. The anticipated result of this dive into algorithm repair and improvement is higher quality data. As always, please reach out to our team with data questions and suggestions. Your exploration of the data and your reports of discrepancies and errors help our team deliver the highest quality data to the public.

    To receive regular updates on what the PatentsView team is working on, along with news about patent data and patenting literature, subscribe to our bi-monthly newsletter. Happy Spring!

     

  • What's New with PatentsView - December 2022

    What’s new with PatentsView: Our Algorithm is getting better!

    Over the last few months, PatentsView has been improving its disambiguation algorithms. These improvements give researchers, students, inventors, intellectual property enthusiasts, and anyone else with an interest in patent information more accurate data to work with.

    What has changed?

    Our algorithms have been updated to better represent patent trends by location and assignee. The updated algorithms increase accuracy in clustering (the grouping of raw information into similar organizations) and incorporate OpenStreetMap as an additional source. This results in better, more accurate data and analysis.

    These changes apply to all PatentsView data, including bulk downloads, legacy and PatentSearch APIs, query builder tool, and list searches.  

    What are disambiguation algorithms?

    PatentsView’s data visualizations and analysis rely on a series of algorithms and post-processing techniques to sort inventors and assignees by name and place. We need this process, known as disambiguation, because patent data is often incomplete or inconclusive.

    For instance, the U.S. Patent and Trademark Office does not collect data on an inventor’s gender. So, PatentsView uses an algorithm to make an educated guess about gender based on an inventor’s name and location.

    In other cases, one inventor may apply for multiple patents using different variations on their name, like John Smith, J. Smith, and Johnny P. Smith. Our algorithms help determine if these are all the same inventor or three different inventors.

    Why is this important?

    Innovations and inventions benefit all of society, and that benefit is increased when every inventor can fully participate in the process. Accurate analysis of patent data helps identify gaps, and thus provides a first step toward closing those gaps.

    PatentsView’s goal is to provide the most accurate, up-to-date, and complete analysis of intellectual property data to foster better knowledge of the IP system and drive new insights into invention and innovation. Updates like this put us one step closer to that goal.

    You can learn more about our methods and sources at https://patentsview.org.

  • Learn More About Inventor Demographic Attribution: Symposium Recordings Now Online

    At the end of August, PatentsView and the United States Patent and Trademark Office (USPTO) convened a group of 10 researchers, developers, and analysts to discuss the best and newest practices in identifying inventor demographics, including the use of gender and race. The event was structured as an all-day symposium with seven individual presentations and a three-person panel conversation on the social and economic implications of advancing our understanding of inventor demographics.

    Recordings from each session of the symposium are now available online.

    View the recordings now.

    Why Study Inventor Demographics?

    Quite simply, greater diversity and participation among inventors fuels more innovation and improves economic welfare. However, the current data suggest many people, particularly from diverse communities, are not represented among inventors. This means that challenges exist at various points along the pathway from inspiration, to inventor, to innovator and entrepreneur. Learning who is and is not participating as an inventor, and why, is a critical step toward ensuring that all people can contribute equally to the innovation landscape and improve economic outcomes at both a personal and societal level.

    Unfortunately, the research into identifying and addressing such gaps is still in its infancy. One of the largest barriers is the lack of information about the demographics of inventors who apply for and ultimately receive patents.

    Disambiguation Methods and Practices

    Researchers have been using several methods and practices to shed light on inventor demographics when that information is not self-reported — the main processes are disambiguation and attribution.

    Each of these different processes has its own strengths and weaknesses. Speakers at the Advancing Research on Inventor Demographics Symposium, held on August 26, 2022, discussed the societal consequences of inequities in innovation. They also provided an in-depth discussion of the methods used to identify an inventor’s gender, race, ethnicity, and country of origin when that information is not reported in publicly available datasets.

    Leading Experts Discuss Cutting-Edge Research

    The symposium brought together economists, computer scientists, and others to discuss research methods, applied examples, and new ideas. The experts included:

    • Julio Raffo, a researcher in the Economics and Statistics Division at the World Intellectual Property Organization, who discussed gender attribution and the World Gender Name Dictionary 2.0.
    • Ernest Miguelez, a research fellow at the French National Centre for Scientific Research (CNRS), who discussed the PatentsView approach to inventor gender.
    • Michelle Saksena, a senior research economist at USPTO, who discussed assessing approaches for identifying the gender of inventors on patents.
    • Fangzhou Xie, a Ph.D. student at the Department of Economics at Rutgers University, who discussed using an R package he developed, called rethnicity, to predict ethnicity from names.
    • Francesco Lissoni, a professor of economics at the Bordeaux School of Economics within the University of Bordeaux, who discussed ways to determine inventors’ foreign-origin status using name analysis.
    • Jay Budzik, the CTO of Zest AI, who discussed using the Zest Race Predictor to uncover hidden disparities.
    • Marc Elliott, Distinguished Chair in Statistics and senior principal researcher at the RAND Corporation, who discussed methods to estimate race and ethnicity, and associated disparities, when records do not include self-reported data.
    • Trevon Logan, a Distinguished Professor of Economics at Ohio State University; Adam Gailey, a principal in the Financial Economics Practice of Charles River Associates; and Jason Dietrich, section chief for compliance and analytics policy at the Consumer Financial Protection Bureau, who participated in a panel discussion about gaps in innovation by race/ethnicity and gender.
  • What's New with PatentsView - September 2022

    AI & Innovation and Resource Pages Now Available

    As we start a new academic year, PatentsView is working to help researchers better understand the relationships across various patents and innovative technologies. To that end, we’ve launched two new pages: a topic page on Artificial Intelligence & Innovation, and a new Resources page.

    The Artificial Intelligence & Innovation Patent Dataset

    While artificial intelligence (AI) has advanced by leaps and bounds, researchers are still working to understand the many ways AI inventions and innovations have impacted technology and society. To help researchers delve into how this emerging technology is affecting our lives, the United States Patent and Trademark Office (USPTO) released the AI Patent Dataset (AIPD).

    The dataset includes an analysis of 13.2 million patent documents published through 2020, identifying which patents contain AI. The AIPD integrates seamlessly into PatentsView, allowing researchers to explore relationships between patents related to AI and the companies and inventors who hold them.

    What’s on the new AI & Innovation page?

    The new page contains an interactive data visualization that allows users to explore how patents are related to government interest, a deep dive into the machine learning model used to create the AIPD, and the latest AI-related news and reports.

    Visit the new AI & Innovation Page now to find out more.

    What’s on the new Resources page?

    The new PatentsView Resources page gives patent researchers, inventors, and intellectual property aficionados an easy way to find code snippets and packages for working with the PatentsView API, sources to help researchers use BigQuery to explore historical patent and PatentsView data, Zenodo links between patents and scientific articles, and more.

    You can also find information about the I3 Collaborative and IPRoduct repositories. I3 and IPRoduct are working groups for users to contribute to data frames and named data projects as well as export collaboratively made datasets. IPRoduct focuses on connecting patents to products in support of intellectual property rights, and I3 is a project to connect citations and patents among patenting groups worldwide.

    Get connected on the new Resources page.
