2 posts
Last seen: 02/08/2024 - 06:15
Joined: 01/01/2024 - 15:16
Potential issues with pg_published_application and g_detail_desc_text

Dear PatentsView team, 

Thanks for providing this great platform. I've encountered two potential issues with the data files. Perhaps I am making a mistake somewhere in my analysis, but I thought it best to flag them here.

1. The pg_published_application file

The data dictionary states that the patent_type field in the pg_published_application file should contain values like 'utility' or 'plant'. However, as far as I can tell, the field instead contains the following values (for 2002 published applications):

new 198849
original-publication-amended 77
voluntary 55
corrected 23
republication-amended 13
original-publication-redacted 2
new-utility 1

On the other hand, a similar field for patent grants, the patent_type field in the g_patent table, contains the values that I would expect based on the data dictionary description (for 2002 patent grants): 

utility 167321
plant 1132
reissue 461
NaN 49
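
Tallies like the two above can be reproduced with something along the following lines. This is only a minimal sketch: it assumes the downloaded file has been unzipped and is tab-delimited with a header row, and the function name count_values is made up for illustration.

```python
import csv
from collections import Counter


def count_values(path, column, delimiter="\t"):
    """Tally how often each distinct value appears in one column of a delimited file."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        for row in reader:
            # Empty or missing cells are grouped under a single "NaN" key.
            counts[row.get(column) or "NaN"] += 1
    return counts
```

For example, count_values(path, "patent_type").most_common() lists the values from most to least frequent, where path points at a local copy of the pg_published_application or g_patent file.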

Are the values in the patent_type field in the pg_published_application file correct, and if so, should the data dictionary definition be updated, to avoid confusion? 

2. The g_detail_desc_text file

Also, I've noticed that the detailed patent descriptions for patent grants in the g_detail_desc_text files are stored in a field named detail_description_text in early years (pre-2003, as far as I can tell), and in a field named description_text in later years. This naming appears to be inconsistent with the data dictionary, and it's a minor inconvenience when working programmatically with the data.
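
As a workaround in the meantime, the two header spellings can be mapped onto one name while reading. The sketch below assumes unzipped, tab-delimited files with a header row; ALIASES and read_rows_normalized are hypothetical names chosen for this example, not part of any PatentsView tooling.

```python
import csv

# Description fields can be very long, so raise the csv module's
# default per-field size limit (131072 characters).
csv.field_size_limit(10**9)

# Map the header seen in the early-year files onto the later-year name.
ALIASES = {"detail_description_text": "description_text"}


def read_rows_normalized(path, aliases=ALIASES, delimiter="\t"):
    """Yield rows as dicts, renaming any aliased column headers."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        first = next(reader, None)
        if first is None:
            return  # empty file: nothing to yield
        header = [aliases.get(name, name) for name in first]
        for values in reader:
            yield dict(zip(header, values))
```

With this in place, downstream code can always refer to row["description_text"], whichever header a given year's file uses.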

Is this change in the name of the description field across years in the g_detail_desc_text files correct, and if so, should the data dictionary be updated to avoid confusion? 

Thanks again for providing this great platform!

Best wishes, 





Role: moderator
Last seen: 01/29/2024 - 13:14
Joined: 10/17/2017 - 10:47
g_detail_desc_text headers and pgpubs patent_type field

Hello Rory,

Thank you for your feedback! 

To address your second observation regarding g_detail_desc_text: we downloaded fresh copies of the detail description text files from our text file downloads page, covering the years surrounding 2003 and a handful of others across the full range of years covered by PatentsView. Every file we tested used the column header "description_text". We recommend re-downloading the pre-2003 files. If you still find an inconsistency in the column labeling, please let us know the specific files involved, and we'd be happy to dig deeper into the problem.
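
For anyone who would like to run a similar header check on their own copies, here is a rough sketch. It is not the script we used; it assumes the files are unzipped and tab-delimited, and the function names are hypothetical.

```python
import csv


def header_of(path, delimiter="\t"):
    """Return the column names from the first row of a delimited file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter=delimiter), [])


def files_missing_column(paths, expected="description_text"):
    """Return {path: header} for each file whose header lacks the expected column."""
    mismatches = {}
    for path in paths:
        header = header_of(path)
        if expected not in header:
            mismatches[path] = header
    return mismatches
```

Only the first row of each file is read, so the check stays fast even on the large description files. An empty result means every file passed; otherwise the returned headers show exactly which files to report.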

Regarding your first observation about the patent_type field in pg_published_application, we were able to confirm that the patent_type column contains the expected values of "utility" and "plant" for all applications published from 2005 onwards. The other set of values is confined to the data from 2001-2004. We are currently investigating the source of this discrepancy and whether a file correction will be needed for our next data release. We will post an update here once we have a conclusion.

Thank you for using PatentsView and helping us provide the best quality of data!