  • American Institutes for Research examines innovation in renewable energy patent study

    Social science research starts with a commitment to using our time and resources in addressing problems most affecting society and the human experience. One urgent and globally important area of social science research is renewable energy, specifically, understanding the rate of innovation and adoption in the sector.

    At the American Institutes for Research (AIR), a team of data scientists transforms, disambiguates, normalizes, and quality-assures data on all patents granted in the United States. This work opens up opportunities to use the data in research and analysis. We chose to develop classification models that predict which patents are related to renewable energy and to present our findings at the Conference for Women in Data Science and Statistics on October 8, 2022, in St. Louis, MO. We used the Cooperative Patent Classification (CPC) labeling system to find renewable energy patents and looked at the most common words used in the abstracts and titles of these patents, shown in the word cloud below.

    Figure 1. Word cloud of the most popular word stems found in patent titles and abstracts.
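
    The labeling and word-counting steps behind a word cloud like this can be sketched in a few lines. This is only an illustration, not the code used in the study: the three records are hypothetical, and `crude_stem` is a deliberately rough stand-in for a real stemmer.

```python
from collections import Counter
import re

# Hypothetical records (text, CPC codes); not real patents from the study.
patents = [
    ("Solar panel mounting system for solar roofs", ["Y02E10/50", "H02S20/23"]),
    ("Wind turbine blade with solar coating", ["Y02E10/72"]),
    ("Method for brewing coffee", ["A47J31/00"]),
]

def is_renewable(cpc_codes):
    """Treat a patent as RE-related if any of its CPC codes falls under Y02."""
    return any(code.startswith("Y02") for code in cpc_codes)

def crude_stem(word):
    """Assumption: strip a few common suffixes instead of using a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_counts(records):
    """Count word stems across the text of RE-labeled patents only."""
    counts = Counter()
    for text, codes in records:
        if is_renewable(codes):
            tokens = re.findall(r"[a-z]+", text.lower())
            counts.update(crude_stem(token) for token in tokens)
    return counts

counts = stem_counts(patents)
top_stems = counts.most_common(3)  # the largest words in a word cloud
```

    A word cloud simply draws the stems with the highest counts in the largest type.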


    We built random forest, logistic regression, and naïve Bayes machine-learning classification models on granted patents, treating those in CPC subclasses under Y02 as Renewable Energy (RE) patents, to predict whether a given patent was RE-related. We searched for the model construction method and parameter choices that optimized the F1-score for Class 1 (predicted as RE-related). Our best-performing model combined a random forest classifier with a CountVectorizer (a tool that breaks sentences down into countable parts) applied to patent abstracts, and it achieved an F1-score of almost 0.85, as shown in Figure 2.

    Figure 2. Results of random forest classifier and CountVectorizer methods on RE identification in patent abstracts
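
    To make the "countable parts" idea concrete, here is a minimal pure-Python sketch of count vectorization. It is not the study's pipeline: the two abstracts are invented, and the function names `tokenize`, `fit_vocabulary`, and `transform` are illustrative. The sketch stops at the count matrix, which is the kind of input a classifier such as a random forest would then be trained on.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and keep runs of two or more letters."""
    return re.findall(r"[a-z]{2,}", text.lower())

def fit_vocabulary(documents):
    """Map each distinct token to a column index, in alphabetical order."""
    vocab = sorted({token for doc in documents for token in tokenize(doc)})
    return {token: index for index, token in enumerate(vocab)}

def transform(documents, vocabulary):
    """Turn each document into a row of token counts, one column per token."""
    rows = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        rows.append([counts.get(token, 0) for token in vocabulary])
    return rows

# Invented abstracts, for illustration only.
abstracts = ["A wind turbine and a wind sensor", "A battery charger"]
vocabulary = fit_vocabulary(abstracts)
matrix = transform(abstracts, vocabulary)
```

    Each row of `matrix` describes one abstract, and each column records how often one vocabulary word appears in it.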


    Figure 3 is the confusion matrix for the model described in Figure 2. The matrix shows where correct and incorrect predictions occur. For example, 21,622 patents were predicted to be RE but had not been given a Y02 CPC classification.

    Figure 3. Confusion matrix for the random forest classification algorithm
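
    The F1-score for Class 1 is derived directly from the cells of a confusion matrix like this one. The sketch below shows the arithmetic with toy counts. The numbers are made up for illustration and are not the values from Figure 3.

```python
def class1_metrics(tp, fp, fn):
    """Precision, recall, and F1 for the positive class from confusion counts.

    tp: predicted RE and labeled Y02
    fp: predicted RE but not labeled Y02
    fn: labeled Y02 but predicted not RE
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy counts only; not taken from the study.
precision, recall, f1 = class1_metrics(tp=6, fp=2, fn=4)
```

    Because F1 is the harmonic mean of precision and recall, a model scores well only when it keeps both false positives and false negatives low.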


    Patent data, made accessible by the PatentsView project developed at AIR under the supervision of the Office of the Chief Economist at the USPTO, is paramount to holding the federal government accountable for investing in and encouraging innovation in the areas of science and technology that matter to scientists and the public. While the challenges to increased domestic and international adoption of solar, wind, and other innovations are interwoven and interdisciplinary, the rate of innovation in the renewable energy sector is an important component to analyze and understand as we push to transition away from fossil-fueled power.


  • What's New with PatentsView - December 2022

    What’s new with PatentsView: Our Algorithm is getting better!

    Over the last few months, PatentsView has been improving its disambiguation algorithms. These improvements give researchers, students, inventors, intellectual property enthusiasts, and anyone else with an interest in patent information more accurate data to work with.

    What has changed?

    Our algorithms have been updated to better represent patent trends by location and assignee. The updated algorithms increase accuracy in clustering (the grouping of raw information into similar organizations) and incorporate OpenStreetMap as an additional source. This results in more accurate data and analysis.
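
    As a rough illustration of what clustering raw assignee names means, the sketch below groups spelling variants under a normalized key. It is a toy under stated assumptions, not the PatentsView algorithm: the company names are invented, and the `LEGAL_SUFFIXES` list and normalization rules are hypothetical choices.

```python
from collections import defaultdict
import re

# Assumption: a small, hand-picked list of legal suffixes to ignore.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "company", "ltd", "llc"}

def normalize_assignee(raw_name):
    """Lowercase, drop punctuation, and remove legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", raw_name.lower())
    words = [word for word in cleaned.split() if word not in LEGAL_SUFFIXES]
    return " ".join(words)

def cluster_assignees(raw_names):
    """Group raw names that share the same normalized key."""
    clusters = defaultdict(list)
    for name in raw_names:
        clusters[normalize_assignee(name)].append(name)
    return dict(clusters)

# Invented organization names, for illustration only.
raw = ["Acme Widgets, Inc.", "ACME WIDGETS INC", "Acme Widgets Corporation", "Globex LLC"]
clusters = cluster_assignees(raw)
```

    Exact-key grouping like this misses typos and abbreviations, which is one reason a production system would draw on additional evidence such as location.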

    These changes apply to all PatentsView data, including bulk downloads, the legacy and Elasticsearch APIs, the query builder tool, and list searches.

    What are disambiguation algorithms?

    PatentsView’s data visualizations and analysis rely on a series of algorithms and post-processing techniques to sort inventors and assignees by name and place. We need this process, known as disambiguation, because patent data is often incomplete or inconclusive.

    For instance, the U.S. Patent and Trademark Office does not collect data on an inventor’s gender. So, PatentsView uses an algorithm to make an educated guess about gender based on an inventor’s name and location.

    In other cases, one inventor may apply for multiple patents using different variations on their name, like John Smith, J. Smith, and Johnny P. Smith. Our algorithms help determine if these are all the same inventor or three different inventors.
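
    The name-variation problem can be sketched as a simple compatibility check. This is a toy rule, not the PatentsView method: the function names and the matching logic (same last name, plus a matching initial or first-name prefix) are assumptions made for illustration.

```python
def parse_name(full_name):
    """Return (first token, last token) in lowercase, ignoring periods."""
    parts = full_name.replace(".", "").split()
    return parts[0].lower(), parts[-1].lower()

def could_be_same_inventor(name_a, name_b):
    """Toy rule: last names match and first names are compatible."""
    first_a, last_a = parse_name(name_a)
    first_b, last_b = parse_name(name_b)
    if last_a != last_b:
        return False
    # A bare initial is compatible with any first name starting with that letter.
    if len(first_a) == 1 or len(first_b) == 1:
        return first_a[0] == first_b[0]
    # Otherwise one first name must be a prefix of the other (John / Johnny).
    return first_a.startswith(first_b) or first_b.startswith(first_a)

variants = ["John Smith", "J. Smith", "Johnny P. Smith"]
all_compatible = all(
    could_be_same_inventor(a, b) for a in variants for b in variants
)
```

    Note that this rule would also pair "Jane Smith" with "J. Smith", so names alone cannot settle the question. That is why disambiguation also relies on other information, such as place.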

    Why is this important?

    Innovations and inventions benefit all of society, and that benefit is increased when every inventor can fully participate in the process. Accurate analysis of patent data helps identify gaps, and thus provides a first step toward closing those gaps.

    PatentsView’s goal is to provide the most accurate, up-to-date, and complete analysis of intellectual property data to foster better knowledge of the IP system and drive new insights into invention and innovation. Updates like this put us one step closer to that goal.

    You can learn more about our methods and sources at

  • Learn More About Inventor Demographic Attribution: Symposium Recordings Now Online

    At the end of August, PatentsView and the United States Patent and Trademark Office (USPTO) convened a group of 10 researchers, developers, and analysts to discuss the best and newest practices in identifying inventor demographics, including the use of gender and race. The event was structured as an all-day symposium with seven individual presentations and a three-person panel conversation on the social and economic implications of advancing our understanding of inventor demographics.

    Recordings from each session of the symposium are now available online.

    View the recordings now.

    Why Study Inventor Demographics?

    Quite simply, greater diversity and participation among inventors fuels more innovation and improves economic welfare. However, the current data suggest many people, particularly from diverse communities, are not represented among inventors. This means that challenges exist at various points along the pathway from inspiration, to inventor, to innovator and entrepreneur. Learning about who is and is not participating as inventors, and why, is a critical step toward ensuring that all people can contribute equally to the innovation landscape and improve economic outcomes at both a personal and societal level.

    Unfortunately, the research into identifying and addressing such gaps is still in its infancy. One of the largest barriers is the lack of information about the demographics of inventors who apply for and ultimately receive patents.

    Disambiguation Methods and Practices

    Researchers have been using several methods and practices to shed light on inventor demographics when that information is not self-reported — the main processes are disambiguation and attribution.

    Each of these different processes has its own strengths and weaknesses. Speakers at the Advancing Research on Inventor Demographics Symposium, held on August 26, 2022, discussed the societal consequences of inequities in innovation. They also provided an in-depth discussion of the methods used to identify an inventor’s gender, race, ethnicity, and country of origin when that information is not reported in publicly available datasets.

    Leading Experts Discuss Cutting-Edge Research

    The symposium brought together economists, computer scientists, and others to discuss research methods, applied examples, and new ideas. The experts included:

    • Julio Raffo, a researcher in the Economics and Statistics Division at the World Intellectual Property Organization, who discussed gender attribution and the World Gender Name Dictionary 2.0.
    • Ernest Miguelez, a research fellow at the French National Centre for Scientific Research (CNRS), who discussed the PatentsView approach to inventor gender.
    • Michelle Saksena, a senior research economist at USPTO, who discussed assessing approaches for identifying the gender of inventors on patents.
    • Fangzhou Xie, a Ph.D. student at the Department of Economics at Rutgers University, who discussed using an R package he developed, called rethnicity, to predict ethnicity from names.
    • Francesco Lissoni, a professor of economics at the Bordeaux School of Economics within the University of Bordeaux, who discussed ways to determine inventors’ foreign-origin status using name analysis.
    • Jay Budzik, the CTO of Zest AI, who discussed using the Zest Race Predictor to uncover hidden disparities.
    • Marc Elliott, Distinguished Chair in Statistics and senior principal researcher at the RAND Corporation, who discussed methods to estimate race and ethnicity, and associated disparities, when records do not include self-reported data.
    • Trevon Logan, a Distinguished Professor of Economics at Ohio State University; Adam Gailey, a principal in the Financial Economics Practice of Charles River Associates; and Jason Dietrich, section chief for compliance and analytics policy at the Consumer Financial Protection Bureau, who participated in a panel discussion about gaps in innovation by race/ethnicity and gender.
  • What's New with PatentsView - September 2022

    AI & Innovation and Resource Pages Now Available

    As we start a new academic year, PatentsView is working to help researchers better understand the relationships across various patents and innovative technologies. To that end, we’ve launched two new pages: a topic page on Artificial Intelligence & Innovation, and a new Resources page.

    The Artificial Intelligence & Innovation Patent Dataset

    While artificial intelligence (AI) has advanced by leaps and bounds, researchers are still working to understand the many ways AI inventions and innovations have impacted technology and society. To help researchers delve into how this emerging technology is affecting our lives, the United States Patent and Trademark Office (USPTO) released the AI Patent Dataset (AIPD).

    The dataset includes an analysis of 13.2 million patent documents published through 2020, identifying which patents contain AI. The AIPD integrates seamlessly into PatentsView, allowing researchers to explore relationships between patents related to AI and the companies and inventors who hold them.

    What’s on the new AI & Innovation page?

    The new page contains an interactive data visualization that allows users to explore how patents are related to government interest, a deep dive into the machine learning model used to create the AIPD, and the latest AI-related news and reports.

    Visit the new AI & Innovation Page now to find out more.

    What’s on the new Resources page?

    The new PatentsView Resources page provides patent researchers, inventors, and intellectual property aficionados an easy way to find code snippets and packages to better use the PatentsView API, sources to help researchers use BigQuery to explore historical patent and PatentsView data, Zenodo links between patents and scientific articles, and more.

    You can also find information about the I3 Collaborative and IPRoduct repositories. I3 and IPRoduct are working groups for users to contribute to data frames and named data projects as well as export collaboratively made datasets. IPRoduct focuses on connecting patents to products in support of intellectual property rights, and I3 is a project to connect citations and patents among patenting groups worldwide.

    Get connected on the new Resources page.

  • You’re invited to a Symposium about inventors who patent! August 26, 2022

    Policymakers recognize that expanding the participation of women and other underrepresented groups in patenting is critical for growing and sustaining American innovation and prosperity. The challenge is that demographic characteristics of inventors such as sex at birth, gender, ethnicity, and race are not collected as part of the process of applying for or receiving a patent. Join us to learn more about how this challenge is addressed through alternative approaches, the accuracy of various approaches, and the important uses of these statistics.

    At PatentsView, with our partners at the University of Bordeaux, we have identified the gender of inventors named on U.S. patents from 1976 to 2021. By using information from the World Gender-Name Dictionary supplied by the World Intellectual Property Organization (WIPO), along with other documentation, the team was able to infer the gender of over 92.6% of the 2 million inventors residing in the United States and 92.3% of the nearly 5 million inventors named on U.S. patents residing around the world. You can download bulk data, as well as yearly data files, with gender information according to the latest algorithms for gender attribution on our website. The desire to map who is creating the latest technology is rapidly increasing. To be sure, this exploration is a global endeavor.
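
    Dictionary-based gender attribution and the resulting coverage rate can be sketched as follows. This is a simplified illustration, not the PatentsView or WIPO implementation: the `NAME_GENDER` table and its four entries are hypothetical and do not reproduce the World Gender-Name Dictionary.

```python
# Hypothetical lookup keyed by (first name, country code); illustrative entries only.
NAME_GENDER = {
    ("maria", "US"): "F",
    ("john", "US"): "M",
    ("andrea", "IT"): "M",
    ("andrea", "US"): "F",
}

def attribute_gender(first_name, country):
    """Return "F" or "M" when the (name, country) pair is known, else None."""
    return NAME_GENDER.get((first_name.lower(), country.upper()))

def attribution_rate(inventors):
    """Share of inventors for whom a gender could be attributed."""
    if not inventors:
        return 0.0
    attributed = sum(
        1 for name, country in inventors
        if attribute_gender(name, country) is not None
    )
    return attributed / len(inventors)

# Invented inventor list: the last name is absent from the toy dictionary.
inventors = [("Maria", "US"), ("Andrea", "IT"), ("Andrea", "US"), ("Xq", "US")]
rate = attribution_rate(inventors)
```

    Keying the lookup on country as well as name lets the same name map to different genders in different places. Inventors whose names are missing from the dictionary stay unattributed, which is why coverage figures fall short of 100%.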

    You are invited to a day-long symposium on Friday, August 26th, to learn about the latest developments for identifying and analyzing the demographics of inventors who patent. The symposium will feature experts from the United States Patent and Trademark Office, Zest AI, RAND Corporation, Rutgers University, University of Bordeaux, and WIPO. To tie everything together, a capstone panel with practitioners and leaders from the financial industry and academic institutions will share their perspectives on the research, next steps, and the overall policy relevance of knowing who drives innovation through patenting.

    To read more about each presenter and their work, and to register for this symposium, please visit the PatentsView event page. See you there!
