PatentsView Workshop on Engaging User Communities - 2016
PatentsView is a patent data visualization and analysis platform intended to increase the value, utility, and transparency of US patent data. The initiative is supported by the Office of the Chief Economist in the US Patent & Trademark Office (USPTO).
The Office of the Chief Economist hosted the 2016 PatentsView Workshop on Engaging User Communities on October 28, 2016. The workshop was open to the public and located on the USPTO’s main campus in Alexandria, VA.
The workshop was focused on New Tools for Open Data. The goals of the workshop were to:
- Launch the new PatentsView Query Builder, and
- Gather feedback from patent data and analytics user communities in order to set priorities for future PatentsView open data products.
More than 100 participants attended the workshop, 50 of them in person. Participants joined from numerous federal agencies, including the National Institutes of Health, the Department of Energy, the National Science Foundation, and the US Census Bureau, as well as from private firms, nongovernmental organizations, and law firms. USPTO attendees included members of the Office of the Chief Economist, members of the patent examiner community, representatives from the Office of the Chief Information Officer, and Patent and Trademark Resource Center (PTRC) staff.
Alan Marco, Chief Economist at USPTO, kicked off the meeting and described the overarching goals for the PatentsView initiative: (i) democratize data; (ii) reduce redundancy; (iii) facilitate record linkage; and (iv) improve data quality.
Julia Lane, Professor at New York University, delivered a keynote address on technical and engagement approaches for government data sharing and use. She described the current momentum in the field and ongoing efforts towards building major research infrastructures to support research for evidence-based policy and social science.
Launching the PatentsView Query Builder
The new PatentsView Query Builder (https://datatool.patentsview.org/query/) allows users to easily request and download subsets of PatentsView data. The PatentsView database incorporates the new inventor disambiguation algorithm developed at the 2015 PatentsView workshop. This algorithm was developed by Nicholas Monath and colleagues at UMass Amherst and is posted on a public code repository (https://github.com/iesl/inventor-disambiguation).
Amanda Myers (USPTO) and Evgeny Klochikhin (American Institutes for Research) introduced the Query Builder. They described how the user-friendly interface allows users to build detailed queries; search and export data in detail; download results in multiple formats; and learn how to programmatically access PatentsView data using the application programming interface (API). They went on to compare the new Query Builder and the existing API along five dimensions: results output; search flexibility; endpoints; ease of use; and programmatic access. Finally, Klochikhin shared some of the common questions and suggestions that the team has received from users over the last year. The presentation is posted on the workshop website.
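As a rough illustration of what programmatic access can look like, the Python sketch below assembles a query URL without sending it. The endpoint address, the q/f/o parameter names, the `_gte` operator, and the field names are all assumptions for illustration and are not taken from this summary; they should be checked against the current PatentsView API documentation before use.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; confirm in the API documentation.
BASE_URL = "https://www.patentsview.org/api/patents/query"


def build_query_url(criteria, fields, per_page=25, base_url=BASE_URL):
    """Return a GET URL whose q, f, and o parameters are JSON-encoded.

    criteria: dict describing the search (assumed query syntax)
    fields:   list of field names to return (assumed field names)
    per_page: number of rows requested per page (assumed option name)
    """
    params = {
        "q": json.dumps(criteria),
        "f": json.dumps(fields),
        "o": json.dumps({"per_page": per_page}),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical request: patents granted on or after 1 January 2016.
criteria = {"_gte": {"patent_date": "2016-01-01"}}
fields = ["patent_number", "patent_title", "patent_date"]
url = build_query_url(criteria, fields)
print(url)
```

The sketch only builds the URL, so it can be inspected before any request is made; sending it would require an HTTP client and a live endpoint.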
The PatentsView team developed two webinar tutorials that premiered at the workshop. The webinars show new users how to request data from the Query Builder to answer a specific question, and how to use the output files to create engaging visualizations or valuable analyses. The first webinar describes how researchers can request data on all patents issued to three successful US science and technology firms; the data are then transformed for co-inventor network analysis to reveal patterns of innovation at the three firms. The second webinar demonstrates how to request data on all patenting activity within a state in a given year; the data are then combined with other public data on regional economic activity and analyzed and visualized in a four-quadrant dashboard using Tableau Public. The webinars are hosted on the workshop website and can also be found in the Query Builder help section.
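The transformation used in the first webinar is not detailed in this summary. As a minimal sketch of one common approach to co-inventor network analysis, the Python example below turns patent-inventor rows into a weighted edge list, where the weight is the number of patents two inventors share. The input layout (one row per inventor per patent) and the sample identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coinventor_edges(rows):
    """Count shared patents for every pair of inventors.

    rows: iterable of (patent_id, inventor_id) pairs,
          one row per inventor per patent (assumed layout).
    """
    # Group inventors by patent; a set ignores duplicate rows.
    inventors_by_patent = {}
    for patent_id, inventor_id in rows:
        inventors_by_patent.setdefault(patent_id, set()).add(inventor_id)

    # Each pair of inventors on the same patent adds one to the edge weight.
    edges = Counter()
    for inventors in inventors_by_patent.values():
        for a, b in combinations(sorted(inventors), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sample data, not drawn from PatentsView.
rows = [
    ("P1", "alice"), ("P1", "bob"), ("P1", "carol"),
    ("P2", "alice"), ("P2", "bob"),
    ("P3", "dave"),
]
edges = coinventor_edges(rows)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Sorting each pair keeps the edges undirected, so (alice, bob) and (bob, alice) are counted together. Inventors with no co-inventors, such as the sole inventor on P3, produce no edges and would need to be added separately as isolated nodes.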
Feedback from the PatentsView User Community
Three of the 2015 Inventor Disambiguation Workshop participants presented updates on their research since last year’s event. Nicholas Monath from UMass Amherst presented ongoing work on entity resolution (name disambiguation) for large, streaming datasets. Yang Guan-Can from ISTIC in China presented initial work on a model that uses PatentsView and other public data to predict the likelihood of cancer patents leading to Food and Drug Administration approval. Luciano Kay at the University of California, Santa Barbara presented a prototype tool, InnovationPulse, which provides innovation analytics as a service. All speaker presentations are posted on the workshop website.
The fifty on-site participants broke out into five working sessions. In each session, a PatentsView team member moderated the discussion and participants shared feedback on three overarching questions:
- How does/could patent data help you in your daily work?
- What are the attractive features of the current tool and data source?
- What would you include in the next phase?
Moderators reported back to the larger group, including those joining by webcast. The team also collected and responded to feedback from webcast participants.
Summary of User Community feedback
Question 1: How does/could patent data help you in your daily work?
- Researchers at the US Census Bureau are matching patent data to internal Census data in order to describe economic activities of inventing and non-inventing firms
- Federal research agencies are using USPTO data to identify patents that were influenced by agency grant support and to support program evaluation
- Private firms and consultants are using patent data for firm-level and landscape-level technology analytics for internal and client needs
- Agencies and private firms are using patent data analytics to respond to inquiries from communications offices, for example, regarding local patenting activity
- Researchers and librarians at USPTO are exploring the relationship between government policies, legal policies, and patenting activities
- Multiple research groups are using patent data analytics to track emerging technologies and to create new patent-level metrics for value and quality
- Multiple research groups are running quality assessment analyses on USPTO data products, including but not limited to PatentsView
Question 2: What are the attractive features of the current tool and data source?
- The data visualization and overall user interface are intuitive and helpful, especially the mapping tool
- The inventor disambiguation results are crucial for analytics
- The breadth of available data fields in PatentsView is excellent
- The community appreciates the availability of clean and documented data for research and analysis
Question 3: What would you include in the next phase?
New Data Fields & Data Sources:
- Patent family (continuity and international family), priority date, provisional applications, full patent text, PCT information
- Maintenance fees and certificates of correction
- Full text of patents available through the tool
- Pre-2001 data for all tables in the MySQL patents database (example: usreldoc)
- Patents End-to-End, Global Dossier, and other USPTO resources
- Data from NIH or other agencies when there is a relevant government interest statement
- Standard or value metrics for individual patents - so researchers don’t need to recreate them
- Better data on firm ownership, for example by integrating legal entity identifiers (see the Department of the Treasury effort, which is open and free but still in the conceptualization phase)
- Mergers/acquisitions data for assignees
- Crosswalk to transform latitude/longitude to other geographic measures (county or zip code), plus instructions for using the data in visualization tools like Tableau
- Grant select researchers secure access to transactional data from examiners, such as PAIR data, including all identifying information for people and businesses in the data system (not just the patent face)
- Apply the disambiguation work to clean the author names in the prior art sections of the data
New Functionality & Platforms:
- Provide a similar web service for trademark data (and possibly copyright data)
- Crowdsource new variables and datasets from the user community (democratizing the data platform)
- Allow users to pull down different parts of the patent for a large number of patents (in the query tool)
- Create a community PatentsView page and possibly a GitHub repository so users can (i) share the verification analyses they’ve performed before using data for analytics, and (ii) share issues identified in the data
- Allow users to map query results
- Create instructional videos for how the data can be acquired and then analyzed so that PTRC librarians can encourage PatentsView use
- Set up an open algorithm platform for the standard patent quality measures that the research community is continually developing; consider hosting a competition within the platform, followed by a QA process, before anything is integrated into the database
- Make more data accessible through the API because of limitations in the bulk data downloads; the 100,000-row return limit is sometimes an issue
- Add the ability to search for null fields
- Include in the visualization tool a way to trace the history of innovation by technology field
Updates to Documentation:
- Include clear documentation of data sources in use
- Publish a PatentsView roadmap - for developers especially
- Track data updates in a clear way - so you don’t have to pull down the full file every time
- Provide a single point of information on USPTO open data efforts - how does it all fit together?
- Provide PatentsView code instructions for Linux-based operating systems
- Enhance/verify inventor and assignee disambiguation outputs with the ongoing Census efforts in this space
- Share this work across USPTO, so that the open data efforts may all feed into one another