Dear PatentsView team,
Thanks for providing this great platform. I've encountered two potential issues with the data files. I may be making a mistake somewhere in my analysis, but I thought it best to flag them here.
1. The pg_published_application file
The data dictionary states that the patent_type field in the pg_published_application file should contain values like 'utility' or 'plant'. However, as far as I can tell, the field instead contains the following values (for 2002 published applications):
patent_type | count |
new | 198849 |
original-publication-amended | 77 |
voluntary | 55 |
corrected | 23 |
republication-amended | 13 |
original-publication-redacted | 2 |
new-utility | 1 |
On the other hand, a similar field for patent grants, the patent_type field in the g_patent table, contains the values that I would expect based on the data dictionary description (for 2002 patent grants):
patent_type | count |
utility | 167321 |
plant | 1132 |
reissue | 461 |
NaN | 49 |
Are the values in the patent_type field in the pg_published_application file correct? If so, should the data dictionary definition be updated to avoid confusion?
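In case it helps with reproducing tallies like the ones above, here is a minimal sketch of one way to count the values of a field. It assumes the file is tab-delimited with a header row. The helper name tally_field, the record_id column, and the sample rows are made up for illustration and are not taken from the actual files.

```python
import csv
from collections import Counter


def tally_field(lines, field, delimiter="\t"):
    """Count how often each value of `field` occurs in a delimited table.

    `lines` is any iterable of text lines (an open file or a list of strings)
    whose first line is the header row.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    # Map empty strings to None so that missing values show up in the tally.
    return Counter((row[field] or None) for row in reader)


# Made-up sample rows (assumption: tab-delimited, header row first).
sample = [
    "record_id\tpatent_type",
    "A\tnew",
    "B\tnew",
    "C\tcorrected",
    "D\t",
]

counts = tally_field(sample, "patent_type")
for value, n in counts.most_common():
    print(value, n)
```

For a real file, the same function can be given an open file handle, for example one opened with open(path, newline="", encoding="utf-8"), where the path and encoding are whatever applies to the downloaded copy.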
2. The g_detail_desc_text file
Also, I've noticed that the detailed descriptions for patent grants in the g_detail_desc_text file are stored in a field named detail_description_text in early years (pre-2003, as far as I can tell) and in a field named description_text in later years. This naming appears to be inconsistent with the data dictionary, and it's a minor inconvenience when working programmatically with the data.
Is this change in the name of the description field in the g_detail_desc_text file across years correct? If so, should the data dictionary be updated to avoid confusion?
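For anyone else working with these files programmatically, a small workaround along the following lines may help in the meantime. It is only a sketch: it assumes tab-delimited files with a header row, and the helper names, the patent_id column, and the sample rows are hypothetical. It stores the description under a single key, description_text, whichever of the two names a given year's file uses.

```python
import csv

# The two column names observed for the description text across years.
DESCRIPTION_FIELDS = ("description_text", "detail_description_text")


def description_field(header):
    """Return whichever known description column name appears in `header`."""
    for name in DESCRIPTION_FIELDS:
        if name in header:
            return name
    raise KeyError(f"none of {DESCRIPTION_FIELDS} found in {list(header)}")


def read_descriptions(lines, delimiter="\t"):
    """Yield rows with the description stored under 'description_text'."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    source = description_field(reader.fieldnames or [])
    for row in reader:
        # Move the value to the common key; a no-op rename for later years.
        row["description_text"] = row.pop(source)
        yield row


# Made-up sample rows standing in for an early-year and a later-year file.
early = ["patent_id\tdetail_description_text", "1\tOld-style text"]
late = ["patent_id\tdescription_text", "2\tNew-style text"]

rows = list(read_descriptions(early)) + list(read_descriptions(late))
for row in rows:
    print(row["patent_id"], row["description_text"])
```

One caveat: Python's csv module limits field length by default, so very long description texts may require raising the limit with csv.field_size_limit before reading.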
Thanks again for providing this great platform!
Best wishes,
Rory