The PatentsView team is always working to make our data more complete, more accurate, and more useful. Recently, we completed a validation of our assignee disambiguation algorithm. We created a large, updated ground truth dataset to measure the current performance of the algorithm.
This blog post highlights our continued commitment to assignee disambiguation validation by detailing the process we went through to build a hand-labeled ground truth dataset, describing the Python packages and metrics used in assignee validation, and openly disclosing the results and their statistical significance.
Why is evaluation important?
Model evaluation is important because it allows us to be publicly transparent about the performance of our disambiguated data, to identify situations where our model does not perform as well as it could, and to make changes that improve overall performance. We validate our assignee disambiguation algorithm by estimating key performance metrics, like precision and recall, that summarize its accuracy.
Assignee Labeling Process
Estimating the performance of our disambiguation algorithm requires benchmarking data: some “ground truth” against which we can compare our predictions and assess the quality of our disambiguation. There are two main types of ground truth data used to evaluate entity resolution algorithms.
The first type is partially resolved entities, which consist of examples of matching entity mentions (e.g., assignees from two patents, like “Microsoft” and “Microsoft Corp.,” that we know refer to the same organization) as well as examples of non-matching entities (e.g., “Microcorp” and “Microsoft,” which, after researching company location and services offered, we know are two separate companies).
The second type of ground truth data is fully resolved entities. In this case, we find all patents for which Microsoft is an assignee and use that as complete ground truth for evaluation. We demonstrate how we cluster entities, such as all instances of “Microsoft,” to create our ground truth in the remaining paragraphs of this section.
Our evaluation process focuses on the second type of data, fully resolved entities, because this method provides more robust statistical outputs. We employed three data labelers for over 100 combined hours to resolve the entities of over 250 randomly selected assignee mentions. To maximize the accuracy of the ground truth labels we created (that is, groupings of rawassignees that are mentions of the same organization), we broke the process down into two main parts: (a) finding everything that could be a related entity and then (b) removing unlike assignees based on greater rawassignee detail.
In step (a), finding everything that could be a related entity, we compared each assignee reference (organization name or first/last name) with hundreds of similar names in our database. The hand-labelers did this using a custom-made Streamlit application, which we designed to be both a query tool and a data augmentation tool. Labelers pulled similar assignees by testing various potential name representations (name shortenings, abbreviations, potential misspellings, etc.) in the application and saving the results. The application then augmented the saved search results by adding any previous clusters (prior disambiguation results) that were associated with one or more of the rawassignees found by the labeler and were not already included.
In step (b), removing unlike assignees based on greater rawassignee detail, hand-labelers reviewed the saved cluster output from step (a). The saved cluster data contained additional information about each rawassignee, including the associated patent, patent type, CPC classification, and location. Using filters, sorting, or any other resource necessary, labelers carefully inspected every type of assignee data that could help identify rawassignee mentions that should not be included in the final cluster, and removed those mentions. Depending on the size of the cluster, manual review could take between two minutes and an hour.
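The two labeling steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the Streamlit application: the function names, the mention ids, and the `prior_cluster_of` mapping are all hypothetical, and the manual review in step (b) is reduced to a set of ids that a labeler rejected.

```python
def augment_with_prior_clusters(found_ids, prior_cluster_of):
    """Step (a): add every mention that shares a prior cluster with a found mention."""
    # Prior clusters touched by at least one mention the labeler found.
    touched = {prior_cluster_of[m] for m in found_ids if m in prior_cluster_of}
    # Every mention belonging to one of those prior clusters.
    extra = {m for m, c in prior_cluster_of.items() if c in touched}
    return set(found_ids) | extra


def apply_manual_review(candidates, rejected_ids):
    """Step (b): drop the mentions a labeler judged to be a different organization."""
    return set(candidates) - set(rejected_ids)


# Hypothetical data: mention id -> cluster id from the previous disambiguation run.
prior_cluster_of = {"r1": "c1", "r2": "c1", "r3": "c2", "r4": "c3"}

found = {"r1", "r3"}  # mentions surfaced by the labeler's name searches
candidates = augment_with_prior_clusters(found, prior_cluster_of)
final_cluster = apply_manual_review(candidates, rejected_ids={"r3"})

print(sorted(candidates))     # ['r1', 'r2', 'r3']
print(sorted(final_cluster))  # ['r1', 'r2']
```

Here the search alone misses `r2`, but the augmentation recovers it because it shared a prior cluster with `r1`; the review step then removes `r3`.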
Evaluation Packages – Written for PV
PatentsView estimates performance metrics in a transparent manner using open-source entity resolution evaluation software. The functions are called in this location from our PatentsView-DB repository. We leverage two repositories: ER-Evaluation for the pairwise precision and recall metric functions, and PatentsView-Evaluation to upload the relevant benchmark datasets (Binette & Reiter, 2023, 1). You can find more technical details about this process in the documentation and code linked in this section.
Statistical Significance
Precision and recall are standard metrics for evaluating entity resolution, and a detailed discussion of those metrics can be found in our last evaluation report (Monath, et al., 2021, 15). Based on the newest PatentsView ground truth labeled data, the latest data update achieved a precision of 0.900 and a recall of 0.727. This indicates that we are more likely to leave a rawassignee record[1] out of a cluster (false negative) than to erroneously include an additional record in a cluster (false positive).
Precision and recall are calculated at the entity level, evaluating the results of the most recent assignee disambiguation algorithm on the 228 assignee clusters for which we have ground truth data. See our last evaluation report for a more detailed explanation of the difference between entity-level and rawassignee-record-level evaluation (Monath, et al., 2021, 16-17). Both estimates have a standard deviation of around 4%, meaning that there is an approximately 68% likelihood that the true (population) precision is between 0.8604 and 0.9400, and that the true recall is between 0.6839 and 0.7699. We believe that this range is narrow enough to give confidence in these evaluation metrics.
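To make the two metrics concrete, here is a minimal sketch of pairwise precision and recall on a toy, fully labeled dataset. It uses the textbook pair-counting definitions and is not the sample-based estimator from ER-Evaluation; the function names and the mention ids `a` to `f` are made up for illustration.

```python
from itertools import combinations


def matching_pairs(clusters):
    """Set of unordered mention pairs that a clustering places in the same cluster."""
    pairs = set()
    for members in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(members), 2))
    return pairs


def pairwise_precision_recall(predicted, truth):
    """Pairwise precision and recall of a predicted clustering against the truth."""
    pred_pairs = matching_pairs(predicted)
    true_pairs = matching_pairs(truth)
    correct = pred_pairs & true_pairs
    # Convention chosen for this sketch: no pairs means nothing to get wrong.
    precision = len(correct) / len(pred_pairs) if pred_pairs else 1.0
    recall = len(correct) / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Toy example: the prediction leaves mention "d" out of its true cluster.
truth = [{"a", "b", "c", "d"}, {"e", "f"}]
predicted = [{"a", "b", "c"}, {"d"}, {"e", "f"}]

precision, recall = pairwise_precision_recall(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f}")  # precision=1.000 recall=0.571
```

Every predicted pair is correct, so precision is 1.0, but the three true pairs involving `d` are missed, so recall drops to 4/7. This is the same pattern as in our results: records are more often left out of a cluster than wrongly added to one.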
Metric      Estimate   Standard Deviation
Precision   0.900      0.040
Recall      0.727      0.043
F1          0.804
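The figures in the table can be checked with a few lines of arithmetic. The sketch below assumes that the F1 value is the usual harmonic mean of precision and recall and that the quoted ranges are the estimate plus or minus one standard deviation (the "approximately 68%" reading relies on the estimate being roughly normally distributed). The helper names are ours, and small differences from the bounds quoted in the text presumably come from rounding in the table.

```python
def one_sd_interval(estimate, sd):
    """Range covering one standard deviation on either side of the estimate."""
    return estimate - sd, estimate + sd


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values as reported in the table above.
precision, precision_sd = 0.900, 0.040
recall, recall_sd = 0.727, 0.043

p_low, p_high = one_sd_interval(precision, precision_sd)
r_low, r_high = one_sd_interval(recall, recall_sd)
f1 = f1_score(precision, recall)

print(f"precision: {p_low:.3f} to {p_high:.3f}")  # precision: 0.860 to 0.940
print(f"recall:    {r_low:.3f} to {r_high:.3f}")  # recall:    0.684 to 0.770
print(f"F1:        {f1:.3f}")                     # F1:        0.804
```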
Conclusion
In conclusion, the recent validation of the PatentsView assignee disambiguation algorithm reflects our commitment to data accuracy and transparency. Through careful evaluation and the creation of a comprehensive ground truth dataset, we can measure and report the quality of our disambiguated data.
By employing multiple data labelers and using open-source evaluation packages such as ER-Evaluation and PatentsView-Evaluation, we estimate metrics like precision and recall that shed light on the algorithm's performance.
The latest update achieves a precision of 0.900 and a recall of 0.727, indicating that the records grouped into a cluster are usually correct, while some records are still left out. These efforts underscore PatentsView's dedication to providing users with high-quality disambiguated assignee data and to being transparent about our processes and our work.
Citations
Binette, O., & Reiter, J. P. (2023). ER-Evaluation: End-to-End Evaluation of Entity Resolution Systems. The Journal of Open-Source Software. https://joss.theoj.org/papers/10.21105/joss.05619.pdf
Monath, N., Jones, C., & Madhavan, S. (2021, July). PatentsView: Disambiguating Inventors, Assignees, and Locations. Retrieved from https://s3.amazonaws.com/data.patentsview.org/documents/PatentsView_Disambiguation_Methods_Documentation.pdf
[1] PatentsView defines a “rawassignee record” as every mention of an assignee found on all granted patents and patent applications.