Duplicated application IDs

Dear PatentsView Team,

I hope this message finds you well. I have three questions about the pre-grant data.

1) I found duplicated application IDs in the pre-grant published application data (pg_published_application). Shouldn't application IDs uniquely identify each application?

2) There are also duplicated application IDs in pg_granted_patent_crosswalk, which means that in some cases one application ID corresponds to multiple patent IDs. Why is this the case, and how should I deal with it? I only want to keep utility patents.

3) In these two datasets, the relationship between pgpub_id and application_id is neither one-to-one nor one-to-many. Isn't it supposed to be? Why does the data look like this?
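The three observations above can be checked directly. Below is a minimal sketch in Python, assuming each table has been loaded as a list of dicts (for example with csv.DictReader over the downloaded file) and that the columns are named application_id, pgpub_id and patent_id as in this thread. The helper names duplicated_keys and cardinality are hypothetical. The two sample rows reuse application 15103777 and its publications 20160320502 and 20190072683 from the reply in this thread; no other fields are shown.

```python
from collections import defaultdict


def duplicated_keys(rows, key):
    """Return {key_value: row_count} for every value of `key` seen more than once."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return {value: n for value, n in counts.items() if n > 1}


def cardinality(rows, left, right):
    """Classify how values of column `left` relate to values of column `right`."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for row in rows:
        left_to_right[row[left]].add(row[right])
        right_to_left[row[right]].add(row[left])
    # A left value linked to several right values means "many" on the right side.
    many_right = any(len(v) > 1 for v in left_to_right.values())
    many_left = any(len(v) > 1 for v in right_to_left.values())
    if many_left and many_right:
        return "many-to-many"
    if many_right:
        return "one-to-many"
    if many_left:
        return "many-to-one"
    return "one-to-one"


# Sample rows: the application/publication pair mentioned in the reply.
pg_published_application = [
    {"pgpub_id": "20160320502", "application_id": "15103777"},
    {"pgpub_id": "20190072683", "application_id": "15103777"},
]

print(duplicated_keys(pg_published_application, "application_id"))  # {'15103777': 2}
print(cardinality(pg_published_application, "application_id", "pgpub_id"))  # one-to-many
```

The same two helpers apply to pg_granted_patent_crosswalk, for example cardinality(crosswalk_rows, "application_id", "patent_id") for question 2 or cardinality(crosswalk_rows, "pgpub_id", "application_id") for question 3.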

Attached are two snapshots for your reference. Thank you very much!



Response to duplicate ID questions

Lumen, thanks for your questions. Here are our responses:

  1. There can be several entries for each application_id in pg_published_application when there are revisions or amendments to the original application, as you can see in the patent_type field. For example, when an application has a record for a new patent followed by a record for a utility patent, this indicates that a provisional patent was applied for first.

  2. Duplicated application IDs originate on USPTO’s side; we simply ingest the data they publish in XML format. For example, application_id 15103777 has two different records, 20160320502 and 20190072683, both of which can be found in the patent/application search on USPTO’s website. Although the title and authors are the same in both publications, the abstract differs.

  3. I understand the nuances of these data sources can be frustrating. Our recommendation is to filter pg_published_application to patent_type = 'utility' and, where duplicates exist, keep the record with the later published_date. You can aggregate the crosswalk table in a similar way by keeping only the latest pgpub_id for each application. Whether these recommendations fit depends on the question you are trying to answer.
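The recommendation in point 3 can be sketched in plain Python. This is a minimal illustration, not PatentsView's own code: it assumes rows loaded as lists of dicts, a patent_type value of exactly "utility", and published_date stored as an ISO date string such as "2019-01-01". For the crosswalk it takes one possible approach, keeping only rows whose pgpub_id survived the deduplication. The function names and all sample IDs and dates are hypothetical placeholders.

```python
from datetime import date


def latest_utility_records(rows):
    """Keep utility rows only, then the row with the latest published_date per application_id."""
    latest = {}
    for row in rows:
        if row["patent_type"] != "utility":
            continue  # drop non-utility records
        app_id = row["application_id"]
        current = latest.get(app_id)
        # Assumes ISO-formatted dates; fromisoformat raises ValueError otherwise.
        if current is None or (
            date.fromisoformat(row["published_date"])
            > date.fromisoformat(current["published_date"])
        ):
            latest[app_id] = row
    return list(latest.values())


def filter_crosswalk(crosswalk_rows, kept_publications):
    """Keep crosswalk rows whose pgpub_id is one of the retained (latest) publications."""
    kept_ids = {row["pgpub_id"] for row in kept_publications}
    return [row for row in crosswalk_rows if row["pgpub_id"] in kept_ids]


# Hypothetical toy rows (placeholder IDs and dates, not real PatentsView data).
publications = [
    {"pgpub_id": "pub-1", "application_id": "app-1",
     "patent_type": "utility", "published_date": "2016-01-01"},
    {"pgpub_id": "pub-2", "application_id": "app-1",
     "patent_type": "utility", "published_date": "2019-01-01"},
    {"pgpub_id": "pub-3", "application_id": "app-2",
     "patent_type": "plant", "published_date": "2018-01-01"},  # any non-utility value
]
crosswalk = [
    {"pgpub_id": "pub-1", "application_id": "app-1", "patent_id": "pat-1"},
    {"pgpub_id": "pub-2", "application_id": "app-1", "patent_id": "pat-1"},
]

kept = latest_utility_records(publications)
print([row["pgpub_id"] for row in kept])  # ['pub-2']
print(filter_crosswalk(crosswalk, kept))
```

Filtering the crosswalk by the retained pgpub_id values keeps both tables consistent with each other; if your analysis instead needs every patent linked to an application, skip that step and deduplicate on patent_id.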

We hope this information helps!