Hi,
I am working with the assignee file and I am noticing that some organizations are listed two or more times, each time under a different ID. For example:
2anza8ga48wsus63rjst1ifmw 3 NULL NULL Infineon Technologies AG
5ysm5zlcaf2b02d62mgic7512 3 NULL NULL Infineon Technologies AG
org_61gyUoVVQyeF60uJoBif 3 NULL NULL Infineon Technologies AG
pn5la4wmxkdt5gof0ubafymhq 3 NULL NULL Infineon Technologies AG
Isn't the purpose of the disambiguation process to combine these into a single entity id?
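A minimal Python sketch of how repeats like these can be spotted: group the rows by organization name and keep the names that map to more than one ID. The column meaning (id, type, first name, last name, organization) is my assumption about the file layout, and the function names are only illustrative.

```python
from collections import defaultdict

# Rows copied from the example above. The column order
# (id, type, first name, last name, organization) is an assumption.
rows = [
    ("2anza8ga48wsus63rjst1ifmw", "3", None, None, "Infineon Technologies AG"),
    ("5ysm5zlcaf2b02d62mgic7512", "3", None, None, "Infineon Technologies AG"),
    ("org_61gyUoVVQyeF60uJoBif", "3", None, None, "Infineon Technologies AG"),
    ("pn5la4wmxkdt5gof0ubafymhq", "3", None, None, "Infineon Technologies AG"),
]


def ids_by_organization(rows):
    """Map each organization name to the list of IDs that carry it."""
    groups = defaultdict(list)
    for assignee_id, _type, _first, _last, organization in rows:
        groups[organization].append(assignee_id)
    return groups


def duplicated_organizations(rows):
    """Keep only the organization names that appear under more than one ID."""
    return {
        name: ids
        for name, ids in ids_by_organization(rows).items()
        if len(ids) > 1
    }


for name, ids in duplicated_organizations(rows).items():
    print(name, len(ids), ids)
```

This only catches exact string matches, so it finds the four rows above but not spelling variants of the same name.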
Also, I am working with Natural Language Processing, where a term of art is the "lemma": the base form of a word, which allows its different forms, such as plurals, to be grouped into a single entity. Perhaps that could be considered for cases like the sample shown below. Note the plural "Americas" in the second line. In reality these are probably the same business entity.
org_0T3DUOVT6gX9RCesn7iE 2 NULL NULL Infineon Technologies America Corp.
org_KvWrsyblXUCdRpJcqGns 2 NULL NULL Infineon Technologies Americas Corp.
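To illustrate the idea, here is a rough Python sketch that builds a comparison key from each name. It is not real lemmatization: as a crude stand-in it lowercases the name, drops punctuation, and strips one trailing "s" from longer words. The functions `crude_lemma` and `name_key` are hypothetical names of my own, not part of any existing disambiguation code.

```python
import re


def crude_lemma(token):
    """Strip one trailing "s" from tokens longer than three characters.

    This is a crude stand-in for a real lemmatizer, used only to
    show the grouping idea.
    """
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def name_key(organization):
    """Build a comparison key: lowercase, no punctuation, crude lemmas."""
    tokens = re.findall(r"[a-z0-9]+", organization.lower())
    return " ".join(crude_lemma(token) for token in tokens)


a = name_key("Infineon Technologies America Corp.")
b = name_key("Infineon Technologies Americas Corp.")
print(a == b)
```

A rule this blunt would also merge names that really are different companies, so the matching keys are probably best treated as candidates for review rather than automatic merges.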
Andy