Question on assignee disambiguation

Hi, Team:

Thanks for maintaining this database. I downloaded the most recent assignee disambiguation file, "assignee.tsv", and searched it in Excel for disambiguated assignee names such as "International Business Machine". Below is my output:


IBM Assignees

There are more than 40 unique assignees related to "International Business Machine". Most of these names are very similar. However, the disambiguation data treats these similar assignee names as separate assignees. For example, 

ID: e5ef7094-94b9-4fd9-837e-5af55e5defb2    Assignee: International Business Machined Corporation

ID: fd403e3b-2f88-4274-8a92-2706b6dd0877    Assignee: International Business Machines Corporporation

ID: d568ef3b-dbc4-4559-b824-a3112de5faa1    Assignee: International Business Machiness Corporation


The three instances above should all be the same company, "International Business Machines", with small spelling errors. However, it seems that the algorithm treats them as different companies and has assigned them different IDs. Have I misunderstood some part of the dataset, or does the algorithm sometimes fail to catch small spelling errors like these?


Thank you very much again for your help,





Assignee Disambiguation

Hello QW,

The PV team is constantly working to improve our disambiguation algorithm. During our quarterly update cycles, we aim to decrease the number of redundant IDs, like those above, that evade disambiguation. We expect our next annual update to be released by early 2022.
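
In the meantime, users who need to merge such spelling variants themselves could post-process the assignee names on their side. The following is only a minimal sketch, not the PV team's disambiguation algorithm: it uses Python's standard `difflib` module, the function names and the 0.9 similarity threshold are assumptions chosen for illustration, and the fourth record is a made-up control entry that is not taken from assignee.tsv.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def is_similar(a, b, threshold=0.9):
    """Return True if two names have a difflib similarity ratio >= threshold.

    The 0.9 default is an illustrative assumption, not a tuned value.
    """
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def group_assignees(records, threshold=0.9):
    """Greedily group (id, name) pairs by similarity to each group's first name."""
    groups = []
    for rec_id, name in records:
        for group in groups:
            representative_name = group[0][1]
            if is_similar(name, representative_name, threshold):
                group.append((rec_id, name))
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([(rec_id, name)])
    return groups


records = [
    ("e5ef7094-94b9-4fd9-837e-5af55e5defb2",
     "International Business Machined Corporation"),
    ("fd403e3b-2f88-4274-8a92-2706b6dd0877",
     "International Business Machines Corporporation"),
    ("d568ef3b-dbc4-4559-b824-a3112de5faa1",
     "International Business Machiness Corporation"),
    # Hypothetical control record, not from the dataset.
    ("hypothetical-id", "General Electric Company"),
]

groups = group_assignees(records)
for group in groups:
    print(len(group), [name for _, name in group])
```

With this threshold the three misspelled IBM names land in one group and the control record stays separate. A greedy pass like this compares each name only against the first member of each group, so results on the full file depend on row order and on the threshold; any merged groups should be reviewed manually before use.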