
Release Notes for PatentsView

Data update: 12/30/2021

Data Changes

  1. The PatentsView database is current through December 30, 2021. All data are accessible through the web tool, API, query builder tool, and bulk downloads. The next update is planned to go live in September 2022 and will include data through June 30, 2022.
  2. Our inventor disambiguation algorithm has been updated to provide finer distinction between similarly named individuals, and a full disambiguation of our inventors has been conducted. As a result, inventor IDs for all inventors have been updated. The persistent_inventor_disambiguation table provides a mapping between the new inventor IDs and previous IDs.
  3. Fixed an issue where some patent citations had the month or day of the cited patent’s publication listed as 0. Citations without a day or month of publication provided will now be set as the first day of the provided month or year, as available.
  4. Fixed an issue where extra leading and trailing spaces appeared in the data. Extra spaces have been trimmed out.
  5. Fixed an issue in the current API where full-text searches in the patent endpoint were timing out.
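
The citation date fix in item 3 can be illustrated with a minimal sketch. It assumes dates arrive as `YYYY-MM-DD` strings in which a missing month or day is written as `00`; the function name and the input format are illustrative assumptions, not the actual PatentsView pipeline code.

```python
from datetime import date


def normalize_citation_date(raw: str) -> date:
    """Map a zero month or day to the first day of the provided year or month."""
    year, month, day = (int(part) for part in raw.split("-"))
    if month == 0:
        # Only the year was provided: use January 1 of that year.
        return date(year, 1, 1)
    # A day of 0 means only year and month were provided: use day 1.
    return date(year, month, day or 1)


print(normalize_citation_date("1999-00-00"))  # 1999-01-01
print(normalize_citation_date("1999-05-00"))  # 1999-05-01
print(normalize_citation_date("1999-05-17"))  # 1999-05-17
```

Complete dates pass through unchanged, so the function can be applied to every citation row without a separate check.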

Upcoming Changes

  1. PatentsView will delay the next data update until September 2022 with data through June 30, 2022.
  2. Testing of API version 0.1 continues. New endpoints have been added. View all currently available endpoints here. Request an API key here.
    • The API redesign changes the backend infrastructure from MySQL to Elasticsearch and streamlines the data elements into one primary endpoint (patent) with several individual supporting endpoints (assignees, inventors, locations, etc.). For users, this may require conducting multiple queries with different endpoints to join data elements.
    • Note: we will not discontinue the existing API endpoints until after the final release of the Elasticsearch endpoints.
  3. We plan to release the following changes in September 2022:
    • Additional years of pregrant publication data (2001-2004)
    • Field name changes and structural revisions to the bulk download files to standardize and harmonize the data files between the granted patent data and the pregrant publications data. Changes will be communicated soon.
    • Final release of all Elasticsearch APIs – to include pregrant publications, gender fields, and long text fields.
      • Synchronization of Elasticsearch API data with data updates: when a data update is released, the new data will also be available in the Elasticsearch API.
    • Revised location disambiguation algorithm
    • Revised assignee disambiguation algorithm
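
Joining data elements across endpoints, as described in item 2 above, amounts to a client-side merge on a shared key. The sketch below uses hard-coded sample records in place of real API responses; the field names (`patent_id`, `patent_title`, `inventor_ids`, `inventor_id`, `inventor_name`) and the response shapes are hypothetical and are not taken from the API documentation.

```python
# Stand-ins for the results of two separate endpoint queries (hypothetical shapes).
patent_response = [
    {"patent_id": "1000001", "patent_title": "Widget", "inventor_ids": ["a1", "b2"]},
    {"patent_id": "1000002", "patent_title": "Gadget", "inventor_ids": ["b2"]},
]
inventor_response = [
    {"inventor_id": "a1", "inventor_name": "Ada Example"},
    {"inventor_id": "b2", "inventor_name": "Bo Sample"},
]


def join_patents_with_inventors(patents, inventors):
    """Attach inventor records to each patent, keyed on inventor_id."""
    by_id = {inv["inventor_id"]: inv for inv in inventors}
    joined = []
    for patent in patents:
        joined.append({
            "patent_id": patent["patent_id"],
            "patent_title": patent["patent_title"],
            # Skip IDs that the second query did not return.
            "inventors": [by_id[i] for i in patent["inventor_ids"] if i in by_id],
        })
    return joined


result = join_patents_with_inventors(patent_response, inventor_response)
for row in result:
    print(row["patent_id"], [inv["inventor_name"] for inv in row["inventors"]])
```

Building the lookup dictionary once keeps the merge linear in the number of records, which matters when the supporting query returns many rows.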

Additional System Notices

  1. Our Gender & Innovation page has been updated with new interactive visualizations of the number of annual inventors by gender as well as additional related data, reports, and blog posts.
  2. We have added a new annualized data page making combined patent, inventor, assignee, and IPC data available in small and easy to manipulate files for each year from 1976 to 2020.
  3. A new post in the PatentsView Data in Action blog highlights our new annualized data and some of its applications, including analyses of international collaboration and national rates of inventions by women.

Data update: 09/30/2021

Data Changes

  1. Reparse of lawyer table to include missing lawyer information for lawyers listed in secondary and subsequent positions for affected patents from years 1976 to 2001.
  2. Government Interest statements are added to PG-Pub data (pre-granted published patent applications).
    • Association of Government Interest organizations with statements and extraction of contract award numbers are in progress and are planned for release with the next data update.
  3. Fixed bulk download location table so the FIPS columns (state_fips and county_fips) are of character type with leading 0s.
  4. Fixed an issue in the API and Datatool where newer patents were missing values in the “patent_average_processing_time” field.
  5. Reparse of 2020 long-text tables for detail_desc_text and brf_sum_text to include missing data.
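
Item 3 matters because FIPS codes lose their leading zeros when parsed as numbers. The sketch below shows the difference on a small inline sample; the sample rows, the `location_id` column, the tab delimiter, and the two-character state width used for padding are assumptions for illustration, so the data dictionary remains the reference for the actual layout.

```python
import csv
import io

# Inline stand-in for a location file (hypothetical rows; delimiter assumed).
sample = "location_id\tstate_fips\tcounty_fips\nloc1\t06\t037\nloc2\t48\t201\n"

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))

# The csv module yields strings, so the leading zero in "06" survives.
as_text = rows[0]["state_fips"]

# Converting to int drops the leading zero; zfill restores a fixed width.
as_number = int(as_text)
restored = str(as_number).zfill(2)

print(as_text, as_number, restored)
```

Tools that infer column types may still convert these columns to integers on load, in which case explicitly requesting a string type for `state_fips` and `county_fips` avoids the problem.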

Upcoming Changes

  1. API version 0.1 testing is in full swing for the new endpoints utilizing ElasticSearch.
    • The API redesign changes the backend infrastructure from MySQL to ElasticSearch and increases the number of endpoints so that every entity type has an endpoint. For users, this will require conducting multiple queries with different endpoints to join data tables.
    • An API key is required to obtain access to the beta API. Request an API Key.
    • Note: we will not discontinue the existing endpoint for a time after the final release of the ElasticSearch endpoint. 
  2. CPC classification-related field name changes, as detailed in the table below, are planned for the December 2021 data update. The cpc_group table will be renamed to cpc_subclass; the cpc_subgroup table will be renamed to cpc_classification; the cpc_subsection table will be renamed to cpc_class. Definitions for tables and fields will remain the same; only the names will change, to foster better alignment with USPTO naming conventions.
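
For scripts that reference the old CPC table names, the renames in item 2 can be captured in a small lookup. This sketch covers only the three table renames listed above; the helper name is illustrative, and field-level renames are not covered.

```python
# Old table name -> new table name, as listed for the December 2021 update.
CPC_TABLE_RENAMES = {
    "cpc_group": "cpc_subclass",
    "cpc_subgroup": "cpc_classification",
    "cpc_subsection": "cpc_class",
}


def current_table_name(name: str) -> str:
    """Return the post-rename table name, or the input if it was not renamed."""
    return CPC_TABLE_RENAMES.get(name, name)


print(current_table_name("cpc_group"))  # cpc_subclass
print(current_table_name("patent"))     # patent
```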

Reclassification text for CPC tables for December 2021 data update

Additional System Notices

We strive to share users’ research and experiences with PatentsView data on our website for others to learn from. If you would like to participate in API version 0.1 testing, write a data-in-action blog post, or share your research using PatentsView data with the team, please contact us at contact@patentsview.org.  


Data update: 06/29/2021 

Data Changes 

  1. Reparse of brf_sum_text, detail_desc_text, and draw_desc_text tables to remove markup carried over from raw XML files.
  2. Re-export of claims tables to remove extraneous fields and match data dictionary. 
  3. Fixed an issue in post-processing of data which caused rawlocation_id in rawassignee and rawinventor to be unmapped. 
  4. Improved locations table:
    • Reduced duplicates within the table by using latitude and longitude combinations.
    • Improved FIPS code matching and standardized format (State FIPS: length 1-2; County FIPS: length 4-5).

  5. Updated “document_number” field to be of “string” type in pregrant publications bulk download files. 

Additional System Notices 

  1. API beta testing is in full swing for two new endpoints that utilize ElasticSearch.  
  2. The API redesign will change the backend infrastructure from MySQL to ElasticSearch and increase the number of endpoints so that every entity type has an endpoint. This will ultimately decrease the number of fields available from each endpoint. Moving forward, joining data tables will require users to conduct multiple queries with different endpoints.
  3. We plan to release the remainder of the endpoints for beta testing and eventually public use by the end of this year.  
  4. Note: we will not discontinue the existing endpoint for a time after the final release of the ElasticSearch endpoint.  

Upcoming Changes 

If you would like to participate in API beta testing, write a data-in-action blog post, or share your research using USPTO PatentsView data with the PatentsView team, please contact us at contact@patentsview.org.

 


Data update: 03/30/2021

Data Changes

  1. Utilized updated disambiguation algorithm for this data update cycle (https://github.com/PatentsView/PatentsView-Disambiguation).
  2. Disambiguated PGPubs (pre-grant published patent applications) data is now available from June 2020 to March 2021.
  3. Generated new claim, brf_sum_text, detail_desc_text, and draw_desc_text tables for the year 2021. Data up to date through March 30, 2021.
  4. Trimmed first and last name of inventors and some inventor IDs in rawinventor table to remove irrelevant characters and spaces.
  5. Removed white spaces in first column of rawassignee table.
  6. Removed entries in PGPubs rawassignee table for null organization and name fields.
  7. Removed “sequence” field from rel_app_text table. Sequence is not a field given by XML files from USPTO so our data all had sequence = 0 before we removed the field altogether.
  8. Location tables were generated for this data update cycle using latitude and longitude data to reduce duplicates.

 

Additional System Notices

  1. Continued upgrades to bulk downloads’ pages and data download dictionaries to increase their functionality.
    • Updated the structure of the data dictionary pages to allow for better navigation by tool and dataset.
  2. Methods description pages for all PatentsView algorithms and processes are available under the Methods & Sources menu.


Upcoming Changes

The PatentsView team is planning changes to the API. The pilot platform is planned to be released following this data update at the end of July. An implementation plan will be available with this pilot platform for beta testers. The new API is planned to be available to the public at the end of August.

The major changes under this update are:

  • US Application and US Patent Citation fields will move to their own individual endpoints.
  • The fields available under the classification endpoints will be limited to classification lookups.
  • Please see the What’s New with PatentsView July 2021 article for a detailed list of changes.

If you would like to participate in beta testing, please contact the team at contact@patentsview.org.

 


Data update: 12/29/2020

Data Changes

  1. Updated disambiguation algorithm (https://github.com/PatentsView/PatentsView-Disambiguation).
  2. Disambiguated PGPubs (pre-grant published patent applications) data.
  3. PGPubs data tables added to bulk downloads!
  4. Reparse of brf_sum_text, detail_desc_text, and draw_desc_text tables for years 2001-2004.
  5. Reparse of claims data for year 2001.
  6. Revised the disambiguated inventor bulk downloads table to include gender attribution fields: male_flag and attribution_status. Updated the data dictionary accordingly.
  7. Inventor_gender table has been removed and its fields have been incorporated into the inventor table. Rawgender table is the raw gender attribution data.
  8. FIPS codes with floating decimals in the location table have been resolved.


Additional System Notices

  1. Redesigned bulk downloads pages and data download dictionaries to increase their functionality.
  2. Methods description pages for all PatentsView algorithms and processes now available.
  3. More functionality and resources will be added to the Gender and Innovation page. Stay tuned!


Upcoming Changes

The PatentsView team is planning changes to the API, to be released with the next data update (data through March 30, 2021, planned for release in the first week of June 2021). The major changes under this update are:

  • US Application and US Patent Citation fields will move to their own individual endpoints.
  • The fields available under the classification endpoints will be limited to classification lookups.
  • Please see the API Update Table for a detailed list of changes.

If you would like to participate in beta testing, please contact the team at contact@patentsview.org.

 


Data update: 09/29/2020

Data Changes

  1. Reparse of rawlawyer, rawinventor, rawexaminer, and patent tables, years 1976-2001.
  2. Reparse of us_parties table to include missing applicant items, years 2013-2020 (pregrant applications database).
  3. Reparse of the 2005-2020 data for all text tables (brf_sum_text, claims, detail_desc_text, and draw_desc_text).
  4. Reparse of the 2002-2004 data for the claims table.
  5. Removal of location_inventor and location_assignee tables.
    • Location id can now be found in the patent_inventor and patent_assignee tables.
  6. Resolution of field 11 missing data errors in the WIPO table.
  7. Added H patents back to patent, patent_assignee, patent_inventor, patent_lawyer, rawassignee, rawinventor, and rawlawyer tables.
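
With the location_inventor and location_assignee tables removed (item 5), a location lookup becomes a single join. The sketch below uses an in-memory SQLite database with made-up rows; it relies on location_id in patent_assignee mapping to the id field of the location table, and every other column name and value here is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE location (id TEXT PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE patent_assignee (patent_id TEXT, assignee_id TEXT, location_id TEXT);
    INSERT INTO location VALUES ('loc1', 'Austin', 'US'), ('loc2', 'Lyon', 'FR');
    INSERT INTO patent_assignee VALUES
        ('1000001', 'org1', 'loc1'),
        ('1000002', 'org2', 'loc2');
    """
)

# One join replaces the former patent_assignee -> location_assignee -> location chain.
rows = conn.execute(
    """
    SELECT pa.patent_id, l.city, l.country
    FROM patent_assignee AS pa
    JOIN location AS l ON l.id = pa.location_id
    ORDER BY pa.patent_id
    """
).fetchall()

print(rows)
conn.close()
```

The same pattern should apply to patent_inventor, since both tables carry the location id.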

Additional System Changes

  1. Updated data download links to use SSL.
  2. Increased server size of API to reduce internal error messages.

Data update: 06/30/2020

Data Changes

  1. H patents are temporarily removed from the database awaiting reparsing. Currently, H patent numbers are incorrectly loaded with the corresponding check digits (from the raw data) as part of the patent number. The data has been reparsed to remove the erroneous mapping and will be added to the database in the next update.
  2. Claims data has been reparsed to include newlines in the text, and improved dependent field extraction has been implemented. Data from 1976-2000 (newlines, dependent field improvement) and 2005-2020 (newlines) has been posted on the bulk download page. Data from 2001-2004 is being processed and will be posted as it becomes available.

Pregrant Publications Data

  1. A beta version of USPTO’s pre-grant publication data is now available at: www.patentsview.org/download/pregrantpublications.html. Users should note that this is a pre-release product and may be missing data elements. We encourage users to report any issues that they find in the data.

API Changes

  1. The API has moved to Amazon’s Beanstalk platform and consequently the URL has changed. The new URL is https://api.patentsview.org/. Previous URLs will redirect to the new URL, but POST requests will not work through the redirect. The redirection is a temporary failsafe, and users should update their requests to use the new URL.

Querytool Changes

  1. To reduce communication delays during a Querytool failure, we have implemented an email alert system. We hope this system will help us resolve Querytool errors more quickly.

Data update: 03/31/2020 

Bulk Download Changes

  • Line Breaks retained in text data:
    • Claims: all text from 2001 and later will have the line breaks in the text
    • Brief Summary Text:
      • Data from 2020 and later will have the line breaks retained in the text
      • Line breaks for older data will get included when the first opportunity to reparse older data arises.
    • Detailed Description Text:
      • Data from 2020 and later will have the line breaks retained in the text.
      • Line breaks for older data will get included when the first opportunity to reparse older data arises.
    • Draw Description Text: Line breaks are not included at this time.
  • Location ID added to patent_assignee and patent_inventor
    • Previously, identifying the location of a patent by way of the assignee required joining patent_assignee with location_assignee and then with the location table. A similar join was needed for the patent inventor. To reduce this complexity, the patent_assignee and patent_inventor tables will carry an additional field: location_id. This field maps to the id field of the location table, which makes the data in location_assignee and location_inventor redundant. Future releases will not carry these two tables.
  • Read In Scripts:
    • Example Python and R scripts that demonstrate reading each bulk download file will be available here: Read In Scripts. This is a work in progress and will be updated over time.
  • Planned changes after 2020.03.22v1 release (Documentation and details will be added with the release)
    • Claims:
      • Remove duplicates in some of the claims yearly files where the first set of records (about 300K) are duplicated.
      • Remove NULL text data in some of the claims files.
      • Recode NUM field and add documentation.
      • Recode Exemplary field (replacing TRUE/FALSE with 0/1)
      • Re-order header to be consistent with data dictionary
    • Brief Summary Text:
      • Break files into yearly files
    • Draw Description Text:
      • Break files into yearly files
      • Include line breaks in the text
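
The planned claims cleanup (duplicate removal, dropping NULL text, and recoding the Exemplary field) can be sketched on in-memory rows. The column names (`patent_id`, `num`, `text`, `exemplary`) and the literal "NULL" marker are assumptions; the released files may represent these differently.

```python
def clean_claims(rows):
    """Drop exact duplicates and empty text, and recode exemplary to 0/1."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["patent_id"], row["num"], row["text"], row["exemplary"])
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        if row["text"] in (None, "", "NULL"):
            continue  # no claim text to keep
        # Replace TRUE/FALSE with 1/0.
        cleaned.append({**row, "exemplary": 1 if row["exemplary"] == "TRUE" else 0})
    return cleaned


sample = [
    {"patent_id": "1000001", "num": "1", "text": "A widget.", "exemplary": "TRUE"},
    {"patent_id": "1000001", "num": "1", "text": "A widget.", "exemplary": "TRUE"},
    {"patent_id": "1000001", "num": "2", "text": "NULL", "exemplary": "FALSE"},
    {"patent_id": "1000002", "num": "1", "text": "A gadget.", "exemplary": "FALSE"},
]

cleaned = clean_claims(sample)
print(cleaned)
```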