What's New with PatentsView - June 2023
This month in PatentsView news, the data team will release the quarter four data for 2022 and the quarter one data for 2023. The disambiguated and processed data will include patents and published pre-grant patent applications from September 30, 2022, to March 30, 2023. In addition to the bulk downloadable data for granted patents and pre-grant application publications, the legacy MySQL API, the Elasticsearch beta API, and the site visualizations will be updated with data through March 30, 2023. To celebrate the completion of processing for the year 2022, we're lighting sparklers just in time for the Independence Day and Emancipation Day celebrations in the United States!
In our previous data updates, PatentsView gender data was attributed through a partnership with faculty at the University of Bordeaux. Starting with the final quarter of 2022, our PatentsView data scientists have attributed gender to inventors using the World Intellectual Property Organization’s (WIPO’s) Genderit Method algorithm, which has been adjusted by our team. The new attribution method has been applied to all historic records. Gender is assigned to each disambiguated inventor based on the majority gender of the raw inventor records that combine to make up that inventor. For instance, if over 50% of the raw records for a given inventor are marked female, then the inventor is attributed as female. In cases where the raw inventor records are split exactly 50/50 between female and male (which did occur), the gender remains unattributed.
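The majority rule described above can be sketched in a few lines of Python. This is only an illustration, not the PatentsView pipeline code: the function name `attribute_gender` and the `"F"`/`"M"`/`None` labels are hypothetical, and the sketch assumes that raw records without a label count toward the total, which the text does not specify.

```python
from collections import Counter
from typing import Iterable, Optional


def attribute_gender(raw_labels: Iterable[Optional[str]]) -> Optional[str]:
    """Return "F", "M", or None for one disambiguated inventor.

    raw_labels holds one label per raw inventor record: "F", "M",
    or None when the raw record itself has no label (an assumption).
    """
    labels = list(raw_labels)
    total = len(labels)
    counts = Counter(label for label in labels if label in ("F", "M"))

    # "Over 50%" means a strict majority of all raw records.
    if counts["F"] * 2 > total:
        return "F"
    if counts["M"] * 2 > total:
        return "M"
    # Exact 50/50 splits, or no strict majority, stay unattributed.
    return None


print(attribute_gender(["F", "F", "M"]))  # strict majority of "F"
print(attribute_gender(["F", "M"]))       # even split, unattributed
```

Under this sketch, an inventor with two female records and one male record is attributed as female, while an even split returns `None`.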
PatentsView is bringing the inventor gender algorithm in house starting with the next data release. We aim to simplify processes and improve the timeliness of the data releases while maintaining data quality. In a comparison using a sample week of quarterly data, our new method outperformed the old method by 4% in terms of attribution rate. In summary, including gender attribution in the PatentsView internal data pipeline will ultimately deliver gender information faster and more accurately to researchers, economists, students, inventors, and other users.
In pursuit of a faster and more efficient data processing pipeline that does not compromise the current quality of PatentsView data, the data team has also invested in weekly parsing of the raw XML data files from the United States Patent and Trademark Office (USPTO). Converting the XML data into TSV format incrementally allows the data team to catch errors early, before they lead to data quality issues or impede the disambiguation and attribution processes further along the pipeline.
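As a rough illustration of why incremental conversion helps, the sketch below parses one small batch of XML, sets aside records with missing fields, and writes the rest as TSV. It is not the PatentsView parser: the element names `patent`, `doc_number`, and `title` are hypothetical placeholders and do not reflect the actual USPTO XML schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical required fields; the real USPTO files use a different schema.
REQUIRED = ("doc_number", "title")


def xml_to_rows(xml_text):
    """Parse one batch and separate clean rows from problem records."""
    root = ET.fromstring(xml_text)
    rows, errors = [], []
    for index, record in enumerate(root.iter("patent")):
        row = {name: (record.findtext(name) or "").strip() for name in REQUIRED}
        missing = [name for name in REQUIRED if not row[name]]
        if missing:
            # Flag the record now instead of letting it reach later stages.
            errors.append((index, missing))
        else:
            rows.append(row)
    return rows, errors


def rows_to_tsv(rows):
    """Serialize clean rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=REQUIRED, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = (
    "<patents>"
    "<patent><doc_number>1</doc_number><title>A</title></patent>"
    "<patent><doc_number>2</doc_number></patent>"
    "</patents>"
)
rows, errors = xml_to_rows(sample)
print(rows_to_tsv(rows))
print(errors)
```

Running a check like this on each weekly batch means a malformed record is reported within the week it arrives, rather than surfacing during a quarterly run.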
Here's to diving into the 2022 annual data and beginning our exploration of 2023!