3 posts
jiangzhiyiqin
Last seen: 02/14/2022 - 17:08
Joined: 02/02/2022 - 23:16
Strange strings show up in "Disambiguated inventor id" in multiple datasets

Hi PatentsView experts, 

I assume a correct "Disambiguated inventor id" should look like "4341225-2". This holds for "disamb_inventor_id_xxxxxxxx" in the dataset "persistent_inventor_disambig". 

 

However, strange strings show up in "Disambiguated inventor id" in many other datasets. Examples: 

fl:a_ln:bo-1
fl:a_ln:bo-2
fl:a_ln:boag-1
fl:a_ln:boag-2
fl:a_ln:boak-1
 

This is the case for the following datasets (there may be more that I have not checked yet): 

(1) "patent_inventor": Crosswalk between patent and inventor tables

(2) "inventor": Disambiguated inventor data

(3) "rawgender": inventor gender data

 

I hope someone can answer my question soon. 

 

Thank you so much in advance. 

 

Best Regards,

Bo Li

PVTeam
Role: moderator
Last seen: 07/01/2022 - 17:42
Joined: 10/17/2017 - 10:47
Table regenerated

Hi there,

Thanks for your comment! You are correct that the logic for generating the disambig_inventor_id has changed over time. The ID, as shown in persistent_inventor_disambig.disambig_inventor_id_20201229, is now a combination of first initial, last name, and a number distinguishing inventors who share the same first initial and last name. If you need to map our old version of the ID (persistent_inventor_disambig_xxxxxxxx) to the new ID (disamb_inventor_id_20201229), you can use the persistent_inventor_disambig table to do the mapping. Let us know if you have other questions.
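For readers who want to split the name-based ID into its parts, here is a minimal sketch. It assumes the layout is `fl:<first initial>_ln:<last name>-<number>`, which is inferred only from the examples quoted in this thread (such as "fl:a_ln:bo-1") and is not a documented specification. The names `NameBasedId` and `parse_name_based_id` are hypothetical helpers, not part of any PatentsView tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the thread's examples: fl:<initial>_ln:<last name>-<number>
NAME_BASED_ID = re.compile(r"^fl:(?P<first>[^_]+)_ln:(?P<last>.+)-(?P<seq>\d+)$")


class NameBasedId(NamedTuple):
    first_initial: str
    last_name: str
    sequence: int


def parse_name_based_id(inventor_id: str) -> Optional[NameBasedId]:
    """Return the parts of a name-based ID, or None if the string has another format."""
    match = NAME_BASED_ID.match(inventor_id.strip())
    if match is None:
        return None
    return NameBasedId(match["first"], match["last"], int(match["seq"]))


# A name-based ID parses into its three parts; a numeric-style ID does not match.
print(parse_name_based_id("fl:a_ln:boag-2"))
print(parse_name_based_id("4341225-2"))
```

The function returns `None` for anything that does not match, so numeric-style IDs such as "4341225-2" pass through without raising an error.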

Thanks,

PV Team

jiangzhiyiqin
Last seen: 02/14/2022 - 17:08
Joined: 02/02/2022 - 23:16
thousands of strange strings show up in "Disambiguated inventor

Hi PV Team,

 

Thanks for your reply and explanation.

 

However, I am talking about the wrong and misleading “disambiguated inventor id” values in other datasets. I have discussed this with several Ph.D. students and visiting scholars, and they all share my view. 

 

First, let me show you the correct “disambiguated inventor id”. The dataset "persistent_inventor_disambig" has the correct "Disambiguated inventor id" in “disambig_inventor_id_20201229”: all observations have a form similar to "4341225-2". All 12 versions of "disamb_inventor_id_xxxxxxxx" use the same format as "4341225-2".

 

Second, let me show you the wrong and misleading “disambiguated inventor id” values in other datasets. They have the following format:

fl:a_ln:bo-1

fl:a_ln:bo-2

fl:a_ln:boag-1

fl:a_ln:boag-2

fl:a_ln:boak-1

 

These wrong and misleading “disambiguated inventor id” values appear in thousands of observations of the following variables from various datasets:

 

(1) "patent_inventor".inventor_id

The dataset is “Crosswalk between patent and inventor tables”

(2) "inventor".id

The dataset is “Disambiguated inventor data”

(3) "rawgender".inventor_id

The dataset is “inventor gender data”

 

Please look carefully at the variables above: the first observations show multiple ids in the right format. However, when you scroll down, you will see thousands of wrong and misleading observations in the format “fl:a_ln:boag-2”. This is additional evidence that strings in the format “fl:a_ln:boag-2” are wrong: within the same variable in the same dataset, you can see both the right format, such as "4341225-2", and the wrong format, such as "fl:a_ln:boag-2".
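The mix of formats within a single variable can be quantified rather than spotted by scrolling. The following is a minimal sketch that tallies ID formats in a list of values. The two patterns are assumptions based only on the examples in this thread ("4341225-2" and "fl:a_ln:boag-2"), and `classify_id` and `tally_id_formats` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Iterable

# Both patterns are inferred from the examples in this thread, not from documentation.
NUMERIC_ID = re.compile(r"^\d+-\d+$")                 # e.g. 4341225-2
NAME_BASED_ID = re.compile(r"^fl:[^_]+_ln:.+-\d+$")   # e.g. fl:a_ln:boag-2


def classify_id(value: str) -> str:
    """Label one ID string as 'numeric', 'name_based', or 'other'."""
    value = value.strip()
    if NUMERIC_ID.match(value):
        return "numeric"
    if NAME_BASED_ID.match(value):
        return "name_based"
    return "other"


def tally_id_formats(values: Iterable[str]) -> Counter:
    """Count how many values fall into each format category."""
    return Counter(classify_id(v) for v in values)


# Sample values taken from this thread, plus one empty string.
sample = ["4341225-2", "fl:a_ln:bo-1", "fl:a_ln:boag-2", ""]
counts = tally_id_formats(sample)
print(counts)
```

To run this on a real download, pass the column values instead of `sample`, for example `(row["inventor_id"] for row in csv.DictReader(f, delimiter="\t"))`, assuming the file is tab-separated and the column is named `inventor_id` as in "patent_inventor".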

 

If you have any questions, please do not hesitate to contact me by phone or email (my contact information is included in my email reply).

 

Best Regards,

Bo Li