  • What’s New with PatentsView – March 2024

    The PatentsView team is always working to make our data more complete, more accurate, and more useful. Recently, we completed a validation process for our assignee disambiguation algorithm. We created a large, updated ground truth dataset to calculate a current view of the algorithm's performance.

    This blog highlights our continued commitment to assignee disambiguation validation by detailing the process we went through to build a hand-labeled ground truth dataset and the Python packages and metrics used in assignee validation, and by openly disclosing the results and their statistical significance.

    Why is evaluation important?

    Model evaluation is important because it allows us to be publicly transparent about the performance of our disambiguated data, identify situations where our model does not perform as well as it could, and make changes to improve overall performance. We validate our assignee disambiguation algorithm by estimating key performance metrics, like precision and recall, that summarize its accuracy.

    Assignee Labeling Process

    Estimating the performance of our disambiguation algorithm requires benchmarking data: some “ground truth” against which we can compare our predictions and assess the quality of our disambiguation. There are two main types of ground truth data used to evaluate entity resolution algorithms. 

    The first type is partially resolved entities, which consists of examples of matching entity mentions (e.g., assignees from two patents, like “Microsoft” and “Microsoft Corp.,” that we know refer to the same organization) as well as examples of non-matching entities (e.g., “Microcorp” and “Microsoft,” which, upon researching company locations and services offered, we know are two separate companies).

    The second type of ground truth data is fully resolved entities. In this case, we find all patents for which Microsoft is an assignee and use that complete set as ground truth for evaluation. The remaining paragraphs of this section describe how we cluster entities, such as all mentions of “Microsoft,” to create our ground truth.

    Our evaluation process focuses on the second type of data, fully resolved entities, because this method provides more robust statistical outputs. We employed three data labelers for over 100 combined hours to resolve the entities of over 250 randomly selected assignee mentions. To maximize the accuracy of the ground truth labels we created (that is, groupings of rawassignee records that are mentions of the same organization), we broke the process down into two main parts: (a) finding everything that could be a related entity, and then (b) removing unlike assignees based on greater rawassignee detail.

    In step (a), finding everything that could be a related entity, we compared each assignee reference (organization name or first/last name) with hundreds of similar names in our database. This was done by the hand-labelers using a custom-made Streamlit application, which we designed to be both a query and data augmentation tool. Labelers pulled similar assignees by testing various potential name representations – name shortenings, abbreviations, potential misspellings, etc. – in Streamlit and saving the results. Streamlit then augmented the saved results by adding previous clusters (prior disambiguation results) that were associated with one or more of the rawassignees found by the labeler and had not previously been included. A rough sketch of this kind of query tool appears below.
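
    As a rough illustration only (not the actual PatentsView labeling application), the sketch below shows how a minimal Streamlit query tool of this kind might look. The file and column names (rawassignee.tsv, organization) are hypothetical.

```python
# Hypothetical sketch of a name-query labeling tool; not PatentsView's code.
import pandas as pd
import streamlit as st

@st.cache_data
def load_mentions():
    # Assumed local extract of rawassignee mentions (hypothetical file name)
    return pd.read_csv("rawassignee.tsv", sep="\t", dtype=str)

mentions = load_mentions()

st.title("Assignee labeling sketch")
pattern = st.text_input("Name pattern (e.g., 'Microsoft', 'MSFT')")

if pattern:
    # Case-insensitive substring match over organization names
    hits = mentions[mentions["organization"].str.contains(pattern, case=False, na=False)]
    st.write(f"{len(hits)} candidate mentions")
    st.dataframe(hits)

    # Save candidates to a working cluster file for later review (step b)
    if st.button("Save candidates"):
        hits.to_csv("cluster_candidates.csv", mode="a", index=False)
        st.success("Saved.")
```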

    For step (b), removing unlike assignees based on greater rawassignee detail, hand-labelers reviewed the saved cluster output from step (a). The saved cluster data contained additional information about each rawassignee, including the associated patent, patent type, CPC classification, and location. Using filters, sorting, or any other resource necessary, labelers carefully inspected all types of assignee data that could prove useful in removing any rawassignee mentions that should not be included in the final cluster. Depending on the size of the cluster, manual review could take anywhere from two minutes to an hour.

    Evaluation Packages – Written for PatentsView

    PatentsView estimates performance metrics in a transparent manner using open-source entity resolution evaluation software, called from our PatentsView-DB repository. We leverage two packages: ER-Evaluation for the pairwise precision and recall metric functions, and PatentsView-Evaluation to load the relevant benchmark datasets (Binette & Reiter, 2023). You can find more technical details about this process in the documentation and code linked in this section; a minimal usage sketch of the pairwise metrics follows.
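
    The sketch below shows, on toy data, how ER-Evaluation's pairwise metrics are typically called per our reading of the package's documentation. Exact import paths and signatures may vary by version, and this is not PatentsView's production pipeline.

```python
# Toy example of pairwise precision/recall with the ER-Evaluation package.
# Membership vectors map each mention ID to a cluster label.
import pandas as pd
import er_evaluation as ee  # pip install er-evaluation

# Predicted disambiguation: mention ID -> predicted cluster
prediction = pd.Series(["c1", "c1", "c1", "c2"], index=["m1", "m2", "m3", "m4"])

# Hand-labeled ground truth: mention ID -> true cluster
reference = pd.Series(["c1", "c1", "c2", "c2"], index=["m1", "m2", "m3", "m4"])

# Share of predicted co-clustered pairs that are truly co-clustered (1/3 here)
print("pairwise precision:", ee.pairwise_precision(prediction, reference))
# Share of true co-clustered pairs recovered by the prediction (1/2 here)
print("pairwise recall:", ee.pairwise_recall(prediction, reference))
```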

    Statistical Significance

    Precision and recall are standard metrics for evaluating entity resolution, and a detailed discussion of those metrics can be found in our last evaluation report (Monath et al., 2021, p. 15). Based on the newest PatentsView ground truth labeled data, the latest data update achieved a precision of 0.900 and a recall of 0.727. This indicates that we are more likely to leave a rawassignee record[1] out of a cluster (a false negative) than to erroneously include an extra record in a cluster (a false positive).

    Precision and recall are calculated at the entity level, evaluating the results of the most recent assignee disambiguation algorithm on the 228 assignee clusters for which we have ground truth data. See our last evaluation report for a more detailed explanation of entity-level versus rawassignee-record-level evaluation (Monath et al., 2021, pp. 16-17). A standard deviation of around 4% for both of our estimates can be interpreted as 4% variability around each estimate: assuming the estimates are approximately normally distributed, there is roughly a 68% likelihood that the true (population) precision is between 0.8604 and 0.9400, and that the true recall is between 0.6839 and 0.7699. The team believes that 4% variability is a narrow enough range for confidence in these evaluation metrics.

    Metric      Estimate    Standard Deviation
    Precision   0.900       0.040
    Recall      0.727       0.043
    F1          0.804
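
    For reference, the F1 value in the table is the harmonic mean of the precision and recall estimates:

    \[ F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.900 \times 0.727}{0.900 + 0.727} \approx 0.804 \]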


    Conclusion

    In conclusion, the recent advancements in PatentsView, particularly the validation of our assignee disambiguation algorithm, reflect a steadfast commitment to data accuracy and transparency. Through meticulous evaluation processes and the creation of a comprehensive ground truth dataset, we ensure the quality of our disambiguated data.

    By employing multiple data labelers and leveraging open-source evaluation packages, such as ER-Evaluation and PatentsView-Evaluation, we estimate metrics like precision and recall that shed light on the algorithm's performance.

    The latest update achieved a precision of 0.900 and a recall of 0.727, indicating a high level of accuracy in entity resolution. These efforts underscore PatentsView's dedication to providing users with high-quality disambiguated assignee data and our commitment to transparency in our processes and our work.

    Citations

    Binette, O., & Reiter, J. P. (2023). ER-Evaluation: End-to-End Evaluation of Entity Resolution Systems. Journal of Open Source Software. https://joss.theoj.org/papers/10.21105/joss.05619.pdf

    Monath, N., Jones, C., & Madhavan, S. (2021, July). PatentsView: Disambiguating Inventors, Assignees, and Locations. Retrieved from https://s3.amazonaws.com/data.patentsview.org/documents/PatentsView_Disambiguation_Methods_Documentation.pdf 

     

    [1] PatentsView defines a “rawassignee record” as any mention of an assignee found on a granted patent or patent application.

  • A Systematic Patent Review of Connected Vehicle Technology Trends

    PatentsView provides researchers, inventors, and others with easy-to-use patent data and award-winning visualizations. This data helps people to discover trends, identify gaps, and recommend policy changes to improve patent and intellectual property systems. As the platform’s popularity grows, people are finding new ways to manipulate and analyze PatentsView data to achieve their goals.

    For instance, author Raj Bridgelall used PatentsView data to conduct a systematic patent review (SPR) to analyze how innovation is advancing in the field of transportation, in a recent Future Transportation article titled “A Systematic Patent Review of Connected Vehicle Technology Trends.” Specifically, the review found that patents related to vehicle deployments focused on improving safety and secure wireless communications.

    What is SPR?

    SPR is a methodological framework that Bridgelall adapted from the systematic literature review (SLR) method. In the paper, he says that “SPR offers detailed insights into both the thematic and temporal trajectories of innovation in any technology field.”

    The SPR borrows from the SLR framework in its method of collecting data, selecting relevant information for analysis, and analyzing and interpreting the data with key themes and a focused objective in mind. However, where SLR typically centers on qualitative methods to analyze titles and abstracts, SPR also incorporates a quantitative approach that relies on how frequently specific terms are used, as in the toy example below.
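
    To make the quantitative side concrete, here is a toy sketch of term-frequency analysis over patent titles. The titles are made up; a real study would load titles or abstracts from PatentsView bulk data or the API.

```python
# Toy term-frequency count over (made-up) patent titles.
from collections import Counter
import re

titles = [
    "Secure wireless communication for connected vehicles",
    "Vehicle-to-vehicle safety messaging system",
    "Wireless channel allocation for vehicle networks",
]

STOPWORDS = {"for", "to", "a", "the", "of", "and"}

# Tokenize, lowercase, drop stopwords, and count occurrences
counts = Counter(
    word
    for title in titles
    for word in re.findall(r"[a-z]+", title.lower())
    if word not in STOPWORDS
)

print(counts.most_common(5))
```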

    Bridgelall used PatentsView data, among other sources, to identify 220 U.S. patents from 2018-2022 related to automotive technology. His review separated them into categories, such as computing resources, cyber security, and driving safety. He found that patents are increasingly focused on driving safety and wireless communications, which he said, “aligns with broader goals of enhancing safety and situational awareness in transportation.”

    The Benefits of SPR

    In the paper, Bridgelall writes that most studies related to innovation in automotive technology are focused on technological aspects of the work and practical applications. His review provides a broader analysis that he says will help researchers identify gaps in the existing research and pinpoint areas for potential future innovation.

    This research can also help policymakers understand where changes in policy and standardization might have an impact on the field. For instance, Bridgelall highlighted a 2020 move by the U.S. Federal Communications Commission that repurposed a large portion of a safety band dedicated to vehicle use. Bridgelall said that doing so caused uncertainty and stalled investments in connected vehicle technology, which has “the potential to reduce accidents, optimize traffic flow, and enhance the driving experience by communicating with each other (V2V) and with everything else (V2X).”

    He hopes that his introduction of the SPR methodology will lay the groundwork for future research by himself and others to expand upon his analysis and identify international and long-term trends.

    You can download the full paper on mdpi.com using the link above.

  • What Can PatentsView Do for You?

    PatentsView was launched in 2017 to help people access, understand, and use patent data in their research. The award-winning data visualizations shed light on trends and can show how inventors and innovation have changed since 1976, and the bulk data downloads, API tool, and community collaborative let you dig deeper into patent data.

    People have recently used PatentsView data to explore the gender of inventors and how that balance has changed over time. They have looked at patent data to better understand innovation in Latin America, and they have mapped skill-relatedness networks to get a better idea about how the economy is evolving.

    Using PatentsView data

    Do you have a question about patents, innovation, inventors, or technology? PatentsView gives you several options to access and analyze its data.

    1. Patent Visualizations: A collection of charts and graphs to explore patent data in an accessible format. The visualization tools allow the user to search for keywords, filter by location, make comparisons by attribute, and view a network of patents that shows the big picture.

    2. Community Collaborative: A moderated community that offers collaborative spaces, such as a discussion forum and the Data in Action Spotlight. The Data in Action Spotlight is a blog featuring information about research done with PatentsView data and tools, updates to the PatentsView site, relevant events, and more.

    3. API Tool: The API tool allows software developers and researchers to work with our data in their local environment. Documentation for the API tool, its query language, and a list of specialized endpoints to search and filter the data is available on the ‘Why Explore Patent Data?’ page. For a detailed example in Python, view our PatentsView Search API Tutorial on GitHub, which we recently updated with more robust descriptions and documentation. A minimal query sketch appears after this list.

    4. Data Query Builder: A user-friendly query builder interface that allows users to create their own datasets based on specific search criteria. The query builder is distinct from the API Tool; the final dataset from any customized query is delivered via a link sent to the user's email address. NOTE: At the time of posting, the Query Builder tool is down and unable to send requested data to your email address. Until it is back online, please contact our service desk to inquire about customized data requests.

    5. Bulk Data Download: A page that provides the data used to create the website and its visualizations. It is split into various tables that contain raw, disambiguated, or processed data. The page also provides example Python and R scripts meant to assist with reading the data after they have been downloaded.
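
    As mentioned in item 3 above, here is a minimal sketch of querying the PatentsView Search API from Python. The endpoint, header, and parameter shapes follow the public API documentation as we understand it, and YOUR_API_KEY is a placeholder for a key you request from PatentsView; see the GitHub tutorial for the authoritative version.

```python
# Minimal PatentsView Search API query sketch (see the official tutorial
# for the authoritative version).
import json
import requests

URL = "https://search.patentsview.org/api/v1/patent/"
HEADERS = {"X-Api-Key": "YOUR_API_KEY"}  # placeholder: request a key from PatentsView

params = {
    # Patents granted on or after January 1, 2023
    "q": json.dumps({"_gte": {"patent_date": "2023-01-01"}}),
    # Fields to return
    "f": json.dumps(["patent_id", "patent_title", "patent_date"]),
    # Limit the page size
    "o": json.dumps({"size": 10}),
}

response = requests.get(URL, headers=HEADERS, params=params)
response.raise_for_status()
for patent in response.json().get("patents", []):
    print(patent["patent_id"], patent["patent_title"])
```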

    Updates and Improvements

    The PatentsView team is constantly adding new patent data, improving functionality, and fixing bugs and errors. For instance, PatentsView recently released data through September 30, 2023, and is working on a new data release in March.

    Earlier this year, the team implemented a new gender attribution algorithm, and updated the location standardization process. Visit the release notes page for a full list of updates and improvements over time.

    We Want to Hear from You

    Many of these improvements were spurred by user suggestions and questions. Your exploration of the data and reporting of discrepancies and errors helps our team return the highest-quality data to the public.

    Please contact our service desk with any data questions and suggestions you may have, including Data in Action Spotlight post ideas. Let us know if you have recently published a paper or given a presentation based on PatentsView data or if you have an upcoming event that uses PatentsView in some way.

    To receive regular updates on what the PatentsView team is working on, subscribe to our bi-monthly newsletter.

  • The Case of the Missing Assignee Data: How the AIA Affected Pre-grant Assignee Information on Patent Applications

    The PatentsView team has received several inquiries about assignee data that appears to be missing starting in 2013. The number of assignees in pre-grant patent data seems to have suddenly fallen around that time, leading users to think that the data is missing or there is an error.

    We always appreciate our users bringing issues to our attention. It allows us to keep PatentsView data accurate and up to date. In this case, however, what you are seeing is not an error and there is no data missing. The sudden change is due to a shift in policy.

    The America Invents Act and Patent Data

    The 2011 America Invents Act (AIA) was designed to reduce waiting times in the patent application process. There were also provisions meant to reduce the number of lawsuits and other litigation that inventors and entrepreneurs faced and to bring the U.S. process and data in line with other countries.

    The bill also changed the rules for who could be listed as an applicant on patent applications. Prior to the act, the applicant had to be a person, most often the inventor. If that person was working as part of a company or organization, then the organization would be listed as an assignee.

    Changes in Assignee Data

    The AIA allowed organizations to be listed as applicants and no longer required them to be listed as assignees. Beginning around 2013, organizations were doing just that. The number of assignees in patent applications dropped significantly because of this change, as shown in the graph below.

    [Figure: A line graph showing the drop in the number of assignees listed on patent applications around 2013.]

    However, the change in the application process did not affect granted patent data. Once the U.S. Patent and Trademark Office grants a patent, it generally lists the organization as an assignee. Therefore, the granted patent data does not reflect the same change in number of assignees as the pre-grant data.

    All this means that what looks like missing data — fewer than expected assignees in pre-grant data — is actually just a change in the way the data is collected and reported.

    Questions About PatentsView Data

    Our users are the greatest asset we have at PatentsView. If you have an issue with or question about PatentsView tools or data, you can browse our community forum or visit our service desk. We want to hear from you.

     

  • New Report from USPTO Highlights COVID-Related Patents

    COVID-19 disrupted all our lives, but it also opened the door to innovation. The speed at which researchers, companies, and universities developed new tests, treatments, and vaccines was unprecedented. A new report from the U.S. Patent and Trademark Office (USPTO) Office of the Chief Economist found that much of the innovation around diagnosing COVID-19 was led by universities and small companies.

    Diagnosing COVID-19: A perspective from U.S. patenting activity looks at patents filed that relate to diagnosing COVID-19 as it emerged and spread. It contributes to a growing body of research that examines how innovation responds to crisis. The report was released on October 23.

    The research team used PatentsView to identify patents that include a government interest statement. This helped them identify which patents were funded by government agencies; a sketch of this kind of lookup appears below.
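
    For readers who want to attempt a similar lookup, here is a hedged pandas sketch over PatentsView bulk data. The file and column names (g_gov_interest.tsv, patent_id, gi_statement) are assumptions based on our reading of the bulk downloads page and may differ by data release.

```python
# Hedged sketch: flag patents whose (assumed) government interest statement
# mentions a given agency, using a PatentsView bulk data table.
import pandas as pd

# Assumed bulk table of government interest statements
gov = pd.read_csv("g_gov_interest.tsv", sep="\t", dtype=str)

# Patents whose statement mentions the National Institutes of Health
nih = gov[gov["gi_statement"].str.contains(
    "National Institutes of Health", case=False, na=False
)]

print(f"{len(nih)} patents mention NIH support")
print(nih["patent_id"].head())
```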

    Key Findings from the Report

    The report looked at patents filed or issued through April 2023 that helped identify and detect the SARS-CoV-2 virus. Some of the key findings in the report included:

    • The number of COVID-19 diagnostic patent filings that were published by the USPTO surged and then receded in the months following the emergence of the coronavirus—such publications peaked in the fourth quarter of 2021, which generally reflects applications filed in April, May, and June of 2020.

    • COVID-19 diagnostic filings make up about 30% of all COVID-19 public patent filings at the USPTO, roughly one out of every three.

    • Small companies and universities led the way in COVID-19 diagnostic public patent filings at the USPTO with the top filer being a diagnostic startup company.

    • U.S. government financial support helped spur COVID-19 diagnostic inventions, as indicated by government interest statements contained in the filings. About 10.7% of all COVID-19 public patent filings show government support, with the National Institutes of Health leading other agencies.

    • U.S.-based applicants are leading those from other countries in U.S. COVID-19 diagnostic public patent filings, making up most of the volume, including most of the top 21 applicants.

    • COVID-19 diagnostic public patent filings are concentrated in a few technologies such as analyzing materials and measuring enzymes, nucleic acids, and microorganisms.

    • Many applications for inventions directed at COVID-19 diagnostics also disclose methods of treatment (about 8.6%). For instance, inventions for antibodies may diagnose and treat COVID-19.

    • Among 5,585 global COVID-19 diagnostic patent families found in the study, 47% have at least one filing at the China National Intellectual Property Administration (CNIPA), the most of any jurisdiction.

    Read the Full Report

    The full report is available on the USPTO website.
