Crosswalk between application.tsv and patent.tsv

5 posts

Wed, 03/17/2021 - 15:45

behroozkh

Hi,

I'm new to this forum. I searched for this topic but couldn't find anything related, so apologies in advance if this question has already been asked.

I am trying to select all patents that have at least one assignee or inventor from a Nordic country (Sweden, Denmark, Norway, Finland, or Iceland).

I have used both the raw data (rawlocation.tsv, rawinventor.tsv, rawassignee.tsv, patent.tsv, and application.tsv) and the disambiguated data (patent_inventor.tsv, patent_assignee.tsv, location.tsv, application.tsv, and patent.tsv) to filter the data down to the desired output. The number of patent_ids with Nordic assignees or inventors is 167,381 using the disambiguated data and 163,119 using the raw data.
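For reference, a minimal sketch of this selection step on the disambiguated tables, using only the Python standard library. The column names ('id' and 'country' in location.tsv; 'patent_id' and 'location_id' in patent_inventor.tsv and patent_assignee.tsv) and the two-letter country codes are assumptions about the file layout, and the in-memory strings are made-up stand-ins for the real files.

```python
import csv
import io

# Assumed two-letter codes for Sweden, Denmark, Norway, Finland and Iceland.
NORDIC = {"SE", "DK", "NO", "FI", "IS"}


def read_tsv(handle):
    """Return a reader that yields each TSV row as a dict keyed by the header."""
    return csv.DictReader(handle, delimiter="\t")


def nordic_patent_ids(location_tsv, inventor_tsv, assignee_tsv):
    """Return patent IDs with at least one Nordic inventor or assignee.

    Assumed columns: 'id' and 'country' in location.tsv; 'patent_id' and
    'location_id' in patent_inventor.tsv and patent_assignee.tsv.
    All IDs are kept as stripped strings so later comparisons are consistent.
    """
    nordic_locations = {
        (row.get("id") or "").strip()
        for row in read_tsv(location_tsv)
        if (row.get("country") or "").strip().upper() in NORDIC
    }
    patent_ids = set()
    for table in (inventor_tsv, assignee_tsv):
        for row in read_tsv(table):
            if (row.get("location_id") or "").strip() in nordic_locations:
                patent_ids.add((row.get("patent_id") or "").strip())
    return patent_ids


# Tiny in-memory stand-ins for location.tsv, patent_inventor.tsv and patent_assignee.tsv.
locations = io.StringIO("id\tcountry\nloc1\tSE\nloc2\tUS\nloc3\tFI\n")
inventors = io.StringIO("patent_id\tinventor_id\tlocation_id\n1000001\tinv1\tloc1\n1000002\tinv2\tloc2\n")
assignees = io.StringIO("patent_id\tassignee_id\tlocation_id\n1000003\tasg1\tloc3\n1000002\tasg2\tloc2\n")

print(sorted(nordic_patent_ids(locations, inventors, assignees)))  # → ['1000001', '1000003']
```

On the real files, pass handles such as `open("location.tsv", newline="", encoding="utf-8")` instead of the `io.StringIO` stand-ins. Taking the union of inventor and assignee matches means a patent with both a Nordic inventor and a Nordic assignee is counted once.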

My problem is that when I filter patent.tsv using the patent_ids from the previous step, I get only 16,949 patents with the disambiguated patent_ids and 16,681 with the raw ones. However, using the same patent_ids, I can extract almost all of the corresponding applications from application.tsv.
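A minimal sketch of the filtering step, assuming the selected IDs are held as a set of strings and that both files are filtered with exactly the same normalization. The key columns ('id' in patent.tsv, 'patent_id' in application.tsv) are taken from the post; the function name `filter_rows` and the toy rows are illustrative only. With consistent string keys, the counts of matched patents and matched applications should be directly comparable.

```python
import csv
import io


def filter_rows(handle, key, wanted_ids):
    """Yield TSV rows whose `key` column, as a stripped string, is in wanted_ids."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if (row.get(key) or "").strip() in wanted_ids:
            yield row


# IDs from the selection step, kept as strings.
wanted = {"1000001", "1000003"}

# Assumed layouts: patent.tsv identifies a patent by 'id', while
# application.tsv points back to it through 'patent_id'.
patent_tsv = io.StringIO("id\tnumber\n1000001\t1000001\n1000002\t1000002\n1000003\t1000003\n")
application_tsv = io.StringIO("id\tpatent_id\napp1\t1000001\napp2\t1000002\napp3\t1000003\n")

patents = list(filter_rows(patent_tsv, "id", wanted))
applications = list(filter_rows(application_tsv, "patent_id", wanted))
print(len(patents), len(applications))  # → 2 2
```

Comparing as strings is deliberate: a column parsed as numbers never matches one parsed as text, even when the digits are identical.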

I then found that the column 'id' in patent.tsv (7,528,963 rows) overlaps with only 726,704 values of the column 'patent_id' in application.tsv (7,526,704 rows). That is only about a tenth, which explains why I recover only about a tenth of the desired patent_ids.
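To narrow down where the mismatch comes from, one option is to recompute the overlap with both columns read as stripped strings and look at a few unmatched IDs from each side. A minimal sketch, assuming only the column names given in the post ('id' in patent.tsv, 'patent_id' in application.tsv); the helper names and demo data are made up. If the overlap rises to nearly the full row count once types and whitespace are normalized, the gap comes from how the IDs were compared rather than from the files themselves.

```python
import csv
import io


def column_values(handle, key):
    """Collect one TSV column as a set of stripped strings."""
    return {(row.get(key) or "").strip() for row in csv.DictReader(handle, delimiter="\t")}


def overlap_report(left, right, sample=5):
    """Summarize how two ID sets overlap, with a few unmatched examples per side."""
    return {
        "left": len(left),
        "right": len(right),
        "common": len(left & right),
        "only_left": sorted(left - right)[:sample],
        "only_right": sorted(right - left)[:sample],
    }


# Python never treats 1000001 and "1000001" as equal, so if one column was
# parsed as numbers and the other as text, the intersection silently shrinks.
print({1000001} & {"1000001"})  # → set()

patent_tsv = io.StringIO("id\tnumber\n1000001\t1000001\n1000002\t1000002\n")
# The trailing space after 1000002 is removed by column_values before comparing.
application_tsv = io.StringIO("id\tpatent_id\napp1\t1000001\napp2\t1000002 \napp4\t1000004\n")

report = overlap_report(column_values(patent_tsv, "id"), column_values(application_tsv, "patent_id"))
print(report)
# → {'left': 2, 'right': 3, 'common': 2, 'only_left': [], 'only_right': ['1000004']}
```

If the files were loaded with a library that infers column types, it is worth checking whether either ID column ended up numeric, or a mix of numbers and strings, before concluding that the tables do not link.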

Is there a problem with the columns 'id' and 'number' in patent.tsv? Or is there a crosswalk between application.tsv and patent.tsv?

Thanks in advance for any help.

Regards,

Behrooz