
Release Notes for PatentsView

Data update: 02/13/2024

Data Updates

  • PatentsView products now include data through 12/31/2023.
  • The data dictionaries for granted patents and pre-granted published applications are up to date for this update.

Search API Updates

Data Quality Updates

  • No large data quality fixes this quarter

We value your feedback! Experimentation to improve the assignee algorithm occurred during the last update cycle. Please report any issues or questions to the new PatentsView Help Desk. 


Data update: 09/30/2023

Data Updates

  • PatentsView products now include data through 09/30/2023.
  • The data dictionaries for granted patents and pre-granted published applications are up to date for this update.

Search API Updates

  • Data is up to date through 09/30/2023
  • Reminder: A jupyter notebook demonstrating how to use the Elasticsearch API in Python is available on the PatentsView Code Snippets GitHub: https://github.com/PatentsView/PatentsView-Code-Snippets/tree/master/07_Search_API_demo

Data Quality Updates

  • Persistent Assignee Disambiguated IDs for the September 2020 and December 2020 releases that were previously missing for patents issued between July and September 2020 have been restored in the table g_persistent_assignee.

We value your feedback! Experimentation to improve the assignee algorithm occurred during the last update cycle. Please report any issues or questions to the new PatentsView Help Desk. 


Data update: 06/29/2023

Data Updates

  • PatentsView products now include data through 06/29/2023.
  • The data dictionaries for granted patents and pre-granted published applications are up to date for this update.

Search API Updates

Data Quality Updates

  • A gap in pgpubs applicant data between 2020-10-15 and 2021-03-25 has been patched.
  • Expanded coverage of all patent classification tables – classification codes which do not appear in the corresponding index of classifications due to typos on the original document or incomplete format standardization will no longer be excluded from data exports.
  • Significant data cleaning for pg_uspc_at_issue:
    • Sequence values, previously erroneously counted from 2, have been re-normalized to count from 1
    • Formatting of classification codes, especially from applications published in 2005-2007, has undergone standardization to more consistently align with later patents and the USPTO's index of classifications.
  • First listed cpc classifications for granted patents, previously missing from the g_cpc_at_issue table, have been restored.
  • First listed uspc classifications for granted patents, previously missing from the g_uspc_at_issue table for grant years 2002 onward, have been restored for years 2005 onward. 2002-2004 restoration is still in progress.
  • An export error causing data in g_us_patent_citation.tsv to be incomplete has been corrected.
  • We noticed some duplication in a handful of tables that support our data tools on the website (not the bulk download files) and remedied those tables
  • The table g_patents was previously inconsistent in whether the abstract of a patent with no abstract text was represented with an empty string (“”), the string “NULL” or a non-string NULL marker. These have been standardized to use a non-string NULL representation.
  • Incorrect display of inventor names including Unicode characters has been corrected.
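
Readers holding older copies of g_patents may still see all three representations of a missing abstract. The following is a minimal sketch of how they could be collapsed into a single non-string null. The column name patent_abstract and the sample rows are assumptions for illustration, not taken from the data dictionary.

```python
from typing import Optional

# Markers for "no abstract" that older exports may contain (per the note above).
_MISSING_MARKERS = {"", "NULL"}


def normalize_abstract(value: Optional[str]) -> Optional[str]:
    """Return None for any missing-abstract marker, otherwise the text unchanged."""
    if value is None:
        return None
    if value.strip() in _MISSING_MARKERS:
        return None
    return value


# Hypothetical rows; "patent_abstract" is an assumed column name.
rows = [
    {"patent_id": "A", "patent_abstract": ""},
    {"patent_id": "B", "patent_abstract": "NULL"},
    {"patent_id": "C", "patent_abstract": None},
    {"patent_id": "D", "patent_abstract": "A widget."},
]
cleaned = [
    {**row, "patent_abstract": normalize_abstract(row["patent_abstract"])}
    for row in rows
]
```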

We value your feedback! Experimentation to improve the assignee algorithm occurred during the last update cycle. Please report any issues or questions to the new PatentsView Help Desk. 


Data update: 03/30/2023

Data Updates

  • PatentsView products now include data through 03/30/2023.
  • Implemented a new gender attribution algorithm (Genderit). All inventors (granted and published applications) are run through this new gender attribution process.
  • There is an update to the patent and PGPub crosswalk table. The table now has at least one record for every patent and PGPub number and is no longer limited to only granted patents with corresponding PGPub documents. In cases where there are multiple associated documents, the current document flag indicates the latest document.
  • The data dictionaries for granted patents and pre-granted published applications are up to date for this update.
  • Removed cpc_symbol_position from g_cpc_current and pg_cpc_current tables. We found that all of the information contained in the symbol position field was also contained in the sequence field for each classification table. Because the sequence field was the more informative of the two columns, we decided to remove the symbol position column to reduce both redundancy and file size for the users of our file downloads.
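
As an illustration of the crosswalk change described above, the sketch below picks the document flagged as current for each patent. The column names (patent_id, document_number, current_document_flag), the "1"/"0" flag encoding, and the sample rows are all assumptions; the data dictionary is the authority on the real field names.

```python
from typing import Dict, List, Optional

# Hypothetical crosswalk rows. A patent without a PGPub (or a PGPub without
# a patent) still gets one record, with the missing side left as None.
crosswalk: List[Dict[str, Optional[str]]] = [
    {"patent_id": "P1", "document_number": "D1", "current_document_flag": "0"},
    {"patent_id": "P1", "document_number": "D2", "current_document_flag": "1"},
    {"patent_id": "P2", "document_number": None, "current_document_flag": "1"},
    {"patent_id": None, "document_number": "D3", "current_document_flag": "1"},
]


def latest_documents(rows: List[Dict[str, Optional[str]]]) -> Dict[str, Optional[str]]:
    """Map each patent_id to the document number flagged as current (may be None)."""
    latest: Dict[str, Optional[str]] = {}
    for row in rows:
        patent_id = row["patent_id"]
        if patent_id is None:
            continue  # PGPub with no granted patent yet
        if row["current_document_flag"] == "1":
            latest[patent_id] = row["document_number"]
    return latest
```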

Search API Updates

Data Quality 

  • Fixed location disambiguation records for the subset of incorrect state field instances which affected patent (0.002% of records) and pregrant publication (0.5%) raw location records.
  • The g_application table’s application_id and g_us_application_citation table’s citation_document_number fields have had their values re-formatted to align with matching columns in other tables. The application_id field entries are now 8 uninterrupted digits instead of being broken with a slash. The citation_document_number field entries are now 11 digits instead of 4 digits and 7 digits broken by a slash.
  • Corrected records from the pre-grant published application and granted patent gov_interest_org tables to remove infrequent instances of malformed or deprecated organization identifiers.
  • Genderit algorithm implementation involved reattribution of gender to all the data. For more information, see the June edition of What’s New with PatentsView.
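
Identifiers saved from earlier downloads can be aligned with the re-formatted application_id and citation_document_number values by dropping the slash. This is a minimal sketch; the function names and sample values are invented for illustration.

```python
import re


def normalize_application_id(value: str) -> str:
    """Drop the slash from an old-style application_id, leaving 8 digits."""
    digits = value.strip().replace("/", "")
    if not re.fullmatch(r"\d{8}", digits):
        raise ValueError(f"unexpected application_id: {value!r}")
    return digits


def normalize_citation_document_number(value: str) -> str:
    """Join the 4-digit and 7-digit parts of an old-style number into 11 digits."""
    digits = value.strip().replace("/", "")
    if not re.fullmatch(r"\d{11}", digits):
        raise ValueError(f"unexpected citation_document_number: {value!r}")
    return digits
```

Values that are already in the new format pass through unchanged, so the functions can be applied to mixed old and new data.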

We value your feedback! Experimentation to improve the assignee algorithm occurred during the last update cycle. Please visit our service desk to let the team know of data quality issues as you come across them.


Data update: 09/29/2022

Data Updates

  • PatentsView products now include data through 09/29/2022.
  • Tables that demonstrate how assignee and inventor unique IDs change from update to update are now available for pre-grant applications, called pg_persistent_assignee and pg_persistent_inventor
  • Titles of CPC classes are now presented in a downloadable table titled pg_cpc_title on the pregrant downloads page. This table is identical to g_cpc_title on the granted patent downloads page. This data is provided on both downloads pages for the convenience of users.
  • The location standardization process was updated to include exact text matching for foreign countries where we have a unique city, country combination for that location (e.g. Tokyo, Japan will be exactly matched to Tokyo, Japan through text matching). The location standardization document will be expanded shortly after the update for more specific methodology.
  • Gender was not attributed to inventors new to the granted patent database between June 30, 2022, and September 29, 2022 (quarter 3 of 2022). Inventors who were granted patents in this timeframe but had previously been attributed gender in the PatentsView database still have gender attributed for this quarter’s update.
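
The exact text matching for unique city, country combinations can be pictured roughly as follows. This is only a sketch of the idea, not the PatentsView implementation: the reference rows, the field names, and the whitespace stripping are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical reference table of known locations.
reference = [
    {"location_id": "loc-1", "city": "Tokyo", "country": "Japan"},
    {"location_id": "loc-2", "city": "Springfield", "country": "Exampleland"},
    {"location_id": "loc-3", "city": "Springfield", "country": "Exampleland"},
]


def build_unique_lookup(rows: List[Dict[str, str]]) -> Dict[Tuple[str, str], str]:
    """Keep only (city, country) pairs that identify exactly one location."""
    keys = [(r["city"].strip(), r["country"].strip()) for r in rows]
    counts = Counter(keys)
    return {
        key: row["location_id"]
        for key, row in zip(keys, rows)
        if counts[key] == 1
    }


def match_location(city: str, country: str,
                   lookup: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return the location_id on an exact text match, else None."""
    return lookup.get((city.strip(), country.strip()))


lookup = build_unique_lookup(reference)
```

Ambiguous pairs (here the two Springfield rows) are left out of the lookup on purpose, so they fall through to whatever other standardization step applies.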

Search API Updates

Reminder: A jupyter notebook demonstrating how to use the Elasticsearch API in Python is available on the PatentsView Code Snippets GitHub: https://github.com/PatentsView/PatentsView-Code-Snippets/tree/master/07_Search_API_demo

  • Data is loaded to endpoints for granted patents including attorney, patent citations, application citations, other reference, foreign citations, and related application text.

Data Quality

  • ~5M pregrant publication locations were updated that were previously null
  • New issues have been detected with the assignee attribution method and we may be experimenting to improve the performance of the algorithm
  • Missing leading zeros in State FIPS codes are now restored. County FIPS codes erroneously stored as decimal values have been revised to integer numbers.
  • The granted patent crosswalk now includes all patent IDs and all document IDs, regardless of whether they have a corresponding document ID or patent ID, respectively.
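
Copies of the location data downloaded before this fix can be repaired along the same lines. The sketch below assumes the codes were read as text; the function names are hypothetical, and this note does not say whether county codes should also be zero-padded, so the sketch leaves them unpadded.

```python
def fix_state_fips(value: str) -> str:
    """Restore the leading zero of a state FIPS code, e.g. '6' becomes '06'."""
    return value.strip().zfill(2)


def fix_county_fips(value: str) -> str:
    """Rewrite a county FIPS code stored as a decimal ('37.0') as an integer ('37')."""
    return str(int(float(value)))
```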

Data update: 06/30/2022

Data Changes

  1. This update contains updated data from 01-01-2022 through 06-30-2022.
  2. The location disambiguation algorithm has been replaced with a look-up table and fuzzy matching mechanism using Elasticsearch search capabilities. Note that all location IDs have changed with the implementation of this new methodology but will remain static moving forward. Documentation for the location standardization methodology is available. 
  3. We have updated the assignee disambiguation to address data quality issues reported in the previous data release. Documentation on the revised disambiguation algorithm is available at patentsview.org/disambiguation.
  4. The bulk download offerings have been restructured and revised into fewer tables for simpler linking across tables and datasets.
    • Restructured tables include application (for pre-granted publications), publication, assignee, patent_assignee, publication_assignee, lawyer, patent_lawyer, cpc_group, cpc_subgroup, cpc_subsection, patent_govintorg, government_org, inventor, patent_inventor, publication_inventor, uspc, mainclass_current, subclass_current, wipo, nber, nber_category, nber_subcategory.
    • Table names were modified to include ‘g’ or ‘pg’ prefixes to indicate whether the file contains granted patent or pre-grant publication data, respectively. Table field names were revised to harmonize with USPTO naming conventions and standardized across tables.
    • NBER classification tables have been archived and will no longer be updated. These tables can still be retrieved by request to our service desk.
    • The crosswalk table for linking pre-grant publications data to granted patent data remains available and a part of the new data table offerings.
    • A detailed table-by-table and field-by-field list of revisions is available to view in PDF format.
    • A logic diagram for linking tables by field is pictured here.
  5. Pre-grant publications for years 2001-2004 are now available.
  6. WIPO technology classifications are now available on pre-grant publications.
  7. Government Interest agency and contract award number extraction is now available on pre-grant publications.
  8. Missing applicants for [X – Y] have been re-parsed and added to the pre-grant publications.
  9. Gender has been attributed to inventors through the end of 2021. This is signified in the g_inventor_disambiguated table by the male_flag field.
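
The look-up table plus fuzzy matching approach to location standardization (item 2 above) can be sketched in miniature. Note the substitution: the production pipeline uses Elasticsearch search capabilities, whereas this sketch uses Python's standard difflib module as a stand-in. The lookup contents, the 0.85 cutoff, and the function name are assumptions for illustration.

```python
import difflib
from typing import Dict, Optional

# Hypothetical look-up table from canonical location text to location ID.
LOOKUP: Dict[str, str] = {
    "tokyo, japan": "loc-1",
    "osaka, japan": "loc-2",
    "munich, germany": "loc-3",
}


def standardize_location(raw: str,
                         lookup: Dict[str, str] = LOOKUP,
                         cutoff: float = 0.85) -> Optional[str]:
    """Try an exact look-up first, then fall back to the closest fuzzy match."""
    key = raw.strip().lower()
    if key in lookup:
        return lookup[key]
    # difflib stands in for Elasticsearch fuzzy matching here.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None
```

A high cutoff keeps the fallback conservative: a misspelling such as "Tokio, Japan" still resolves, while unrelated place names return None rather than a wrong ID.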

API Updates

  1. A complete list of API endpoints and fields is available and listed at the bottom of the Swagger interface of the API.
  2. The legacy (MySQL) API is scheduled for decommissioning at the end of this year, but is currently up to date with data through June 30, 2022.

Quality Issues

  • PatentsView and USPTO have identified a data quality issue with the pre-grant publication applicant data. In the next release of patent data, we will revise the applicant data based on mapping updates determined by USPTO.
  • An issue was identified in the inventor disambiguated data where some inventors were not clustered together if they appeared with their first name in one patent and with their first initial in another in the 01-01-2021 to 06-30-2021 timeframe. These inventors were re-clustered in this data update, and therefore some of the inventor_ids may have changed in this timeframe.

Data update: 12/30/2021

Data Changes

  1. The PatentsView database is current through December 30, 2021. All data are accessible through the web tool, API, query builder tool, and bulk downloads. The next update is planned to go live in September 2022 and will include data through June 30, 2022.
  2. Our inventor disambiguation algorithm has been updated to provide finer distinction between similarly named individuals, and a full disambiguation of our inventors has been conducted. As a result, inventor IDs for all inventors have been updated. The persistent_inventor_disambiguation table provides a mapping between the new inventor IDs and previous IDs.
  3. Fixed an issue where some patent citations had the month or day of the cited patent’s publication listed as 0. Citations without a day or month of publication provided will now be set as the first day of the provided month or year, as available.
  4. Fixed an issue where extra leading and trailing spaces appeared in the data. Extra spaces have been trimmed out.
  5. Fixed an issue in the current API where full-text searches in the patent endpoint were timing out.
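
The citation date rule in item 3 can be expressed as a small function. This is a sketch under assumptions: it takes the raw value to be a YYYY-MM-DD string in which a missing month or day appears as 00, and the function name is hypothetical.

```python
from datetime import date


def fill_partial_date(raw: str) -> date:
    """Replace a zero month or day with the first day of the known month or year."""
    year, month, day = (int(part) for part in raw.split("-"))
    if month == 0:
        # Only the year is known: use January 1 of that year.
        month, day = 1, 1
    elif day == 0:
        # Year and month are known: use the first day of that month.
        day = 1
    return date(year, month, day)
```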

Upcoming Changes

  1. PatentsView will delay the next data update until September 2022 with data through June 30, 2022.
  2. Testing of API version 0.1 continues. New endpoints have been added. View all currently available endpoints here. Request an API key here.
    • The API redesign changes the backend infrastructure from MySQL to Elasticsearch and streamlines the data elements into 1 primary endpoint (patent) with several other individual supporting endpoints (assignees, inventors, locations, etc.). For users, this may require conducting multiple queries with different endpoints to join data elements.
    • Note: we will not discontinue the existing API endpoints until after the final release of the Elasticsearch endpoints.
  3. We plan to release the following changes in September 2022
    • Additional years of pregrant publication data (2001-2004)
    • Field name changes and structural revisions to the bulk download files to standardize and harmonize the data files between the granted patent data and the pregrant publications data. Changes will be communicated soon.
    • Final release of all Elasticsearch APIs – to include pregrant publications, gender fields, and long text fields.
      • Synchronize Elasticsearch API data with data update. I.e., when data updates are released, the update will include new data available in the Elasticsearch API.
    • Revised location disambiguation algorithm
    • Revised assignee disambiguation algorithm

Additional System Notices

  1. Our Gender & Innovation page has been updated with new interactive visualizations of the number of annual inventors by gender as well as additional related data, reports, and blog posts.
  2. We have added a new annualized data page making combined patent, inventor, assignee, and IPC data available in small and easy to manipulate files for each year from 1976 to 2020.
  3. A new post in the PatentsView Data in Action blog highlights our new annualized data and some of its applications, including analyses of international collaboration and national rates of inventions by women.

Data update: 09/30/2021

Data Changes

  1. Reparse of lawyer table to include missing lawyer information for lawyers listed in secondary and subsequent positions for affected patents from years 1976 to 2001.
  2. Government Interest statements are added to PG-Pub data (pre-granted published patent applications).
    • Associating Government Interest organizations with statements and contract award number extraction are in progress and are planned to be released with the next data update
  3. Fixed bulk download location table so the FIPS columns (state_fips and county_fips) are of character type with leading 0s.
  4. Fixed an issue in the API and Datatool where newer patents were missing values in “patent_average_processing_time” field.
  5. Reparse of 2020 long-text tables for detail_desc_text and brf_sum_text to include missing data.

Upcoming Changes

  1. API version 0.1 testing is in full swing for the new endpoints utilizing ElasticSearch.
    • The API redesign changes the backend infrastructure from MySQL to ElasticSearch and increases the number of endpoints so that every entity type has an endpoint. For users, this will require conducting multiple queries with different endpoints to join data tables.
    • An API key is required to obtain access to the beta API. Request an API Key.
    • Note: we will not discontinue the existing endpoint for a time after the final release of the ElasticSearch endpoint. 
  2. CPC Classification-related field name changes as detailed in the table below for December 2021 data update. The cpc_group table will be renamed to cpc_subclass; cpc_subgroup table will be renamed to cpc_classification; cpc_subsection table will be renamed to cpc_class. Definitions for tables and fields will remain the same – only names of fields have changed to foster better alignment with USPTO naming conventions.

Reclassification text for CPC tables for December 2021 data update

Additional System Notices

We strive to share users’ research and experiences with PatentsView data on our website for others to learn from. If you would like to participate in API version 0.1 testing, write a data-in-action blog post, or share your research using PatentsView data with the team, please visit our service desk.  


Data update: 06/29/2021 

Data Changes 

  1. Reparse of brf_sum_text, detail_desc_text, and draw_desc_text tables to remove markup carried over from raw XML files.
  2. Re-export of claims tables to remove extraneous fields and match data dictionary. 
  3. Fixed an issue in post-processing of data which caused rawlocation_id in rawassignee and rawinventor to be unmapped. 
  4. Improved locations table:
    • Reduced duplicates within the table by using latitude and longitude combinations.
    • Improved FIPS code matching and standardized format (State FIPS: 1-2 length; County FIPS: 4-5 length).
  5. Updated “document_number” field to be of “string” type in pregrant publications bulk download files. 
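
Reducing duplicates by latitude and longitude combination (item 4 above) amounts to keeping one row per coordinate pair. The sketch below shows the general idea only; the field names, the rounding precision, and the keep-first policy are assumptions, not the documented method.

```python
from typing import Dict, List, Tuple


def dedupe_locations(rows: List[dict], precision: int = 4) -> List[dict]:
    """Keep the first row seen for each rounded (latitude, longitude) pair."""
    seen: Dict[Tuple[float, float], dict] = {}
    for row in rows:
        key = (round(row["latitude"], precision), round(row["longitude"], precision))
        seen.setdefault(key, row)  # later duplicates are ignored
    return list(seen.values())


# Hypothetical rows: "a" and "b" differ in spelling but share coordinates.
locations = [
    {"location_id": "a", "city": "Austin", "latitude": 30.2672, "longitude": -97.7431},
    {"location_id": "b", "city": "AUSTIN", "latitude": 30.2672, "longitude": -97.7431},
    {"location_id": "c", "city": "Dallas", "latitude": 32.7767, "longitude": -96.7970},
]
unique_locations = dedupe_locations(locations)
```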

Additional System Notices 

  1. API beta testing is in full swing for two new endpoints that utilize ElasticSearch.  
  2. The API redesign will change the backend infrastructure from MySQL to ElasticSearch and increase the number of endpoints so that every entity type has an endpoint. This will ultimately decrease the fields that will be available from each endpoint. Moving forward the joining of the data tables will require the user to conduct multiple queries with different endpoints.  
  3. We plan to release the remainder of the endpoints for beta testing and eventually public use by the end of this year.  
  4. Note: we will not discontinue the existing endpoint for a time after the final release of the ElasticSearch endpoint.  

Upcoming Changes 

If you would like to participate in API beta testing, write a data in action blog post, or share your research using USPTO PatentsView data with the PatentsView team, please visit our service desk.

 


Data update: 03/30/2021

Data Changes

  1. Utilized updated disambiguation algorithm for this data update cycle (https://github.com/PatentsView/PatentsView-Disambiguation).
  2. Disambiguated PGPubs (pre-grant published patent applications) data is now available from June 2020 to March 2021.
  3. Generated new claim, brf_sum_text, detail_desc_text, and draw_desc_text tables for the year 2021. Data up to date through March 30, 2021.
  4. Trimmed first and last name of inventors and some inventor IDs in rawinventor table to remove irrelevant characters and spaces.
  5. Removed white spaces in first column of rawassignee table.
  6. Removed entries in PGPubs rawassignee table for null organization and name fields.
  7. Removed “sequence” field from rel_app_text table. Sequence is not a field given by XML files from USPTO so our data all had sequence = 0 before we removed the field altogether.
  8. Location tables were generated for this data update cycle using latitude and longitude data to reduce duplicates.

 

Additional System Notices

  1. Continued upgrades to bulk downloads’ pages and data download dictionaries to increase their functionality.
    • Updated the structure of the data dictionary pages to allow for better navigation by tool and dataset.
  2. Methods description pages for all PatentsView algorithms and processes are available under the Methods & Sources menu.


Upcoming Changes

The PatentsView team is planning changes to the API. The pilot platform is planned to be released following this data update at the end of July. An implementation plan will be available with this pilot platform for beta testers. The new API is planned to be available to the public at the end of August.

The major changes under this update are:

  • US Application and US Patent Citation fields will move to their own individual endpoints.
  • The fields available under the classification endpoints will be limited to classification lookups.
  • Please see the What’s New with PatentsView July 2021 article for a detailed list of changes.

If you would like to participate in beta testing, please visit our service desk.

 


Data update: 12/29/2020

Data Changes

  1. Updated disambiguation algorithm (https://github.com/PatentsView/PatentsView-Disambiguation).
  2. Disambiguated PGPubs (pre-grant published patent applications) data.
  3. PGPubs data tables added to bulk downloads!
  4. Reparse of brf_sum_text, detail_desc_text, and draw_desc_text tables for years 2001-2004.
  5. Reparse of claims data for year 2001.
  6. Revised the disambiguated inventor bulk downloads table to include gender attribution fields: male_flag and attribution_status. Updated data dictionary.
  7. Inventor_gender table has been removed and its fields have been incorporated into the inventor table. Rawgender table is the raw gender attribution data.
  8. FIPS codes with floating decimals in the location table have been resolved.


Additional System Notices

  1. Redesigned bulk downloads pages and data download dictionaries to increase their functionality.
  2. Methods description pages for all PatentsView algorithms and processes now available.
  3. More functionality and resources will be added to the Gender and Innovation page, stay tuned!


Upcoming Changes

The PatentsView team is planning changes to the API, to be released with the next data update (data through March 30, 2021, planned for release in the first week of June 2021). The major changes under this update are:

  • US Application and US Patent Citation fields will move to their own individual endpoints.
  • The fields available under the classification endpoints will be limited to classification lookups.
  • Please see the API Update Table for a detailed list of changes.

If you would like to participate in beta testing, please visit our service desk.

 


Data update: 09/29/2020

Data Changes

  1. Reparse of rawlawyer, rawinventor, rawexaminer, and patent tables, years 1976-2001.
  2. Reparse of us_parties table to include missing applicant items, years 2013-2020 (pregrant applications database).
  3. Reparse of the 2005-2020 data for all text tables (brf_sum_text, claims, detail_desc_text, and draw_desc_text).
  4. Reparse of the 2002-2004 data for the claims table.
  5. Removal of location_inventor and location_assignee tables.
    • Location id can now be found in the patent_inventor and patent_assignee tables.
  6. Resolution of field 11 missing data errors in the WIPO table.
  7. Added H patents back to patent, patent_assignee, patent_inventor, patent_lawyer, rawassignee, rawinventor, and rawlawyer tables.

Additional System Changes

  1. Updated data download links to use SSL.
  2. Increased server size of API to reduce internal error messages.

Data update: 06/30/2020

Data Changes

  1. H patents are temporarily removed from the database awaiting reparsing. Currently, H patent numbers are incorrectly loaded with the corresponding check digits (from the raw data) included as part of the patent number. The data has been reparsed to remove the erroneous mapping and will be added to the database in the next update.
  2. Claims data has been reparsed to include newlines in the text, and improved dependent field extraction has been implemented. Data from 1976-2000 (newlines, dependent field improvement) and 2005-2020 (newlines) has been posted on the bulk download page. Data from 2001-2004 is being processed and will be posted as it becomes available.

Pregrant Publications Data

  1. A beta version of USPTO’s pre-grant publication data is now available at www.patentsview.org/download/pregrantpublications.html. Users should note that this is a pre-release product and may be missing data elements. We encourage users to report any issues that they find in the data.

API Changes

  1. The API has moved to Amazon’s Beanstalk platform, and consequently the URL has changed. The new URL is https://api.patentsview.org/. Previous URLs will redirect to the new URL, but POST requests will not work. The redirection is a temporary failsafe, and users should update their requests to use the new URL.

Querytool Changes

  1. To reduce communication delays during a Querytool failure, we have implemented an email alert system. We hope this system will help us resolve Querytool errors more quickly.

Data update: 03/31/2020 

Bulk Download Changes

  • Line Breaks retained in text data:
    • Claims: all text from 2001 and later will have the line breaks in the text
    • Brief Summary Text:
      • Data from 2020 and later will have the line breaks retained in the text
      • Line breaks for older data will get included when the first opportunity to reparse older data arises.
    • Detailed Description Text:
      • Data from 2020 and later will have the line breaks retained in the text.
      • Line breaks for older data will get included when the first opportunity to reparse older data arises.
    • Draw Description Text: Line breaks are not included at this time.
  • Location ID added to patent_assignee and patent_inventor
    • Previously, to identify the location of a patent by way of the assignee, patent_assignee needed to be joined with location_assignee and then with the location table. A similar join was needed for the patent inventor. To reduce the complexity, patent_assignee and patent_inventor tables will carry an additional field: location_id. This field will map to the id field from the location table. This makes the data in location_assignee and location_inventor redundant. Future releases will not carry these two tables.
  • Read In Scripts:
    • Example Python & R scripts that demonstrate reading each bulk download file will be available here: Read In Scripts. This is a work in progress and will be updated over time.
  • Planned changes after 2020.03.22v1 release (Documentation and details will be added with the release)
    • Claims:
      • Remove duplicates in some of the claims yearly files where the first set of records (about 300K) are duplicated.
      • Remove NULL text data in some of the claims files.
      • Recode NUM field and add documentation.
      • Recode Exemplary field (replacing TRUE/FALSE with 0/1)
      • Re-order header to be consistent with data dictionary
    • Brief Summary Text:
      • Break files into yearly files
    • Draw Description Text:
      • Break files into yearly files
      • Include line breaks in the text
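
The simplified location join described under "Location ID added to patent_assignee and patent_inventor" can be sketched as a single look-up. The in-memory rows and the city and country fields are invented for illustration; only location_id and the location table's id field come from the note above.

```python
from typing import Dict, List, Optional

# Hypothetical location table, keyed by its id field.
location_by_id: Dict[str, Dict[str, str]] = {
    "loc-1": {"city": "Tokyo", "country": "JP"},
}

# Hypothetical patent_assignee rows carrying the new location_id field.
patent_assignee: List[Dict[str, Optional[str]]] = [
    {"patent_id": "P1", "assignee_id": "A1", "location_id": "loc-1"},
    {"patent_id": "P2", "assignee_id": "A2", "location_id": None},
]


def attach_locations(link_rows: List[Dict[str, Optional[str]]],
                     locations: Dict[str, Dict[str, str]]) -> List[dict]:
    """Join link rows directly to locations, with no intermediate table."""
    joined = []
    for row in link_rows:
        loc = locations.get(row["location_id"], {})
        joined.append({**row, "city": loc.get("city"), "country": loc.get("country")})
    return joined


with_locations = attach_locations(patent_assignee, location_by_id)
```

The same function applies to patent_inventor rows, since both tables carry location_id.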