Hi, Team:
Thanks for maintaining this database. I downloaded the most recent assignee disambiguation file, "assignee.tsv", and searched it in Excel for disambiguated assignee names like "International Business Machine". Below is my output:
There are more than 40 unique assignees related to "International Business Machine", and most of their names are very similar. However, the disambiguated data treats these similar names as separate assignees. For example:
ID: e5ef7094-94b9-4fd9-837e-5af55e5defb2 Assignee: International Business Machined Corporation
ID: fd403e3b-2f88-4274-8a92-2706b6dd0877 Assignee: International Business Machines Corporporation
ID: d568ef3b-dbc4-4559-b824-a3112de5faa1 Assignee: International Business Machiness Corporation
The above 3 instances should be the same company, "International Business Machines", with small spelling errors. However, it seems that the algorithm treats them as different companies and assigns them different IDs. Did I misunderstand some part of the dataset, or does the algorithm sometimes fail to capture these small spelling errors?
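To illustrate the kind of near-duplicate check I have in mind, here is a minimal sketch using Python's standard-library `difflib`. It is not a claim about how the disambiguation algorithm actually works. The three names are copied from the examples above; the canonical spelling, the function names, and the 0.9 threshold are my own assumptions, chosen only for illustration.

```python
from difflib import SequenceMatcher

# Assumed canonical spelling (my assumption, not taken from the dataset).
CANONICAL = "International Business Machines Corporation"

# Assignee names copied verbatim from the three examples above.
NAMES = [
    "International Business Machined Corporation",
    "International Business Machines Corporporation",
    "International Business Machiness Corporation",
]


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def near_duplicates(names, canonical, threshold=0.9):
    """Return the names whose similarity to `canonical` meets the threshold.

    The 0.9 default is an arbitrary illustrative cutoff, not a tuned value.
    """
    return [n for n in names if similarity(n, canonical) >= threshold]


if __name__ == "__main__":
    for name in NAMES:
        print(f"{similarity(name, CANONICAL):.3f}  {name}")
```

With a simple ratio like this, all three misspelled names score well above the cutoff against the assumed canonical name, while an unrelated name such as "Intel Corporation" does not, which is why I expected them to share one ID.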
Thank you very much again for your help,