I am seeking clarification on an inconsistency I have encountered within the bulk download datasets.
According to the PatentsView Data Logic Diagram, the ‘pg’ data comprises ‘pre-granted data only.’ This implies that once a patent application is granted, it should no longer be part of the Pre-Grant Datasets, such as ‘pg_published_application.’
However, my analysis shows that some applications listed in ‘pg_published_application’ have in fact been granted, as confirmed by cross-referencing them with the Grant Dataset ‘g_patent.’ These records nevertheless remain in the ‘pg_published_application’ dataset.
A plausible explanation could be that granted patents are intentionally retained in the Pre-Grant Dataset. This assumption aligns with the Pre-Grant Data Download Dictionary, which states: ‘The pre-grant publications data includes all publications released by the USPTO for download from 2001 through the most recent data update.’ Despite this, I’ve noted the absence of numerous granted patents from ‘pg_published_application,’ suggesting that some patents are indeed removed from the Pre-Grant Data after being granted.
For your convenience, I have attached a file detailing the patent_ids in question, indicating whether each ID appears in ‘pg_published_application’ or ‘g_patent.’
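For anyone who wants to reproduce a comparison of this kind, here is a minimal sketch of the membership check. It is only an illustration: the IDs below are made up, the function name membership_table is hypothetical, and it assumes both datasets can be reduced to a list of comparable identifiers. In practice the two tables may need to be linked through an application identifier rather than compared directly.

```python
def membership_table(pg_ids, g_ids):
    """Return one row per ID, flagging which dataset(s) contain it."""
    pg, g = set(pg_ids), set(g_ids)
    return [
        {
            "patent_id": pid,
            "in_pg_published_application": pid in pg,
            "in_g_patent": pid in g,
        }
        for pid in sorted(pg | g)  # union of both ID sets, in stable order
    ]


# Made-up identifiers for illustration only (not real PatentsView records).
pg_ids = ["A1", "A2", "A3"]   # assumed to come from pg_published_application
g_ids = ["A2", "A3", "A4"]    # assumed to come from g_patent

table = membership_table(pg_ids, g_ids)
for row in table:
    print(row)
```

IDs flagged in both columns correspond to the first case described above (granted but still in the pre-grant data), and IDs flagged only in ‘g_patent’ correspond to the second.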
I am at a loss to reconcile this inconsistency and would greatly appreciate any clarification or additional information that PatentsView could provide on this matter. Ensuring a consistent and accurate understanding of the datasets is crucial for the research community, and an update or clarification in the dictionary would be immensely beneficial.
Thank you very much.