Gender Attribution of U.S. Patent Holders
Demographic information on inventors, lawyers, and other individuals who participate in the U.S. and global patent systems is highly valuable for social research and policy analysis. One demographic attribute that the PatentsView team can predict based on available data is gender. Our algorithm predicts the most likely gender of inventors named on U.S. published applications and patents.
To accomplish this, we use the gender-it package developed by the World Intellectual Property Organization (WIPO), which draws on data from WIPO's Worldwide Gender-Name Dictionary (WGND) 2.0. The method relies on country-specific lists that give the approximate probability of each gender for names within that country. For example, in the United States, a person named “Charlie” has a 90% probability of being male and a 10% probability of being female.
Because of the current availability of data and existing conventions for attribution algorithms, our methodology operates within a binary gender paradigm, classifying individuals as either male or female. Achieving a more nuanced representation of gender among patent inventors would require inventors to voluntarily disclose their gender, information that is not currently collected in the patent application and publication process. In the future, the PatentsView team aims to expand the gender attribution process to include other participants in the patent system, such as lawyers.
PatentsView Sources for Gender Attribution:
The PatentsView team relies on the Worldwide Gender-Name Dictionary (WGND) 2.0, produced by WIPO, to attribute gender. WGND 2.0 expands on the previous version, significantly increasing worldwide coverage. It contains more than 26 million records linking given names to 195 countries and territories.
- The WGND 2.0 provides the proportion of individuals in each country with a given name who are male or female. In cases where records exist for a name within a country but lack an associated gender marker, a proportion labeled as "unknown" is also provided.
Multiple derived datasets based on the country-specific records are also available from WIPO. PatentsView uses two of these:
- Language group agreement: each country has an associated language group. If all countries within a language group that include data for a name report that a majority of individuals with that name have the same gender, that majority gender is recorded. Otherwise, that name is excluded for that language group.
- Global agreement: If all countries in the WGND dataset that include data for a name report that a majority of individuals with that name have the same gender, that majority gender is recorded. Otherwise, that name is excluded.
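The agreement rule behind both derived datasets can be sketched in a few lines. This is only an illustration under stated assumptions, not WIPO's construction code: the records, the field layout, and the function names are hypothetical, and treating an exact tie as "no majority" is our own choice.

```python
from collections import defaultdict

# Hypothetical toy records: (name, country_code, share_male, share_female).
# These values are illustrative and are not taken from the WGND.
RECORDS = [
    ("andrea", "IT", 0.95, 0.05),
    ("andrea", "US", 0.10, 0.90),
    ("maria", "IT", 0.01, 0.99),
    ("maria", "ES", 0.02, 0.98),
]


def majority_gender(share_male, share_female):
    """Return 'M' or 'F' for the larger share, or None on an exact tie (assumption)."""
    if share_male > share_female:
        return "M"
    if share_female > share_male:
        return "F"
    return None


def agreement_table(records):
    """Keep a name only if every country listing it reports the same majority gender."""
    genders_by_name = defaultdict(set)
    for name, _country, share_male, share_female in records:
        genders_by_name[name].add(majority_gender(share_male, share_female))
    return {
        name: next(iter(genders))
        for name, genders in genders_by_name.items()
        if len(genders) == 1 and None not in genders
    }


# Global agreement: use every record. "andrea" is dropped because IT and US disagree.
GLOBAL_AGREEMENT = agreement_table(RECORDS)
```

A language-group table would follow the same rule, applied only to the records from countries in that group.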
The construction of both editions of the WGND and their associated datasets drew on previous gender studies as well as data from national public statistical institutions.
For further details, see the 2021 publication by Martinez et al.
Gender Attribution Process Overview
Using these country-specific gender-attributed names, the team assigns gender first to inventors’ names as they appear on individual USPTO published applications and patents, and then to disambiguated inventors, using the following steps:
Checking the Name in the Inventor's Country
- The algorithm initially searches for the inventor's given name(s) in the records specific to their country of residence.
- Matching records provide a "weight" for male and female labels based on the proportion of individuals with that name and gender in the country.
- If the resulting weight meets or exceeds the PatentsView confidence threshold for that country (refer to the note on confidence thresholds below), the inventor is attributed the corresponding gender.
- If the inventor has multiple given names and conflicting attributions, the earliest attributable name is given preference.
- If data is available for the inventor's name(s) within their country of residence but no names reach the confidence threshold, the name is considered ambiguous, and the inventor is not attributed a gender.
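The country-level check described above might look like the following sketch. It is not the gender-it API: the function name, the status labels, and the sample shares (other than the "Charlie" example given earlier) are hypothetical.

```python
def attribute_in_country(given_names, country_shares, threshold):
    """Check given names, in order, against one country's name shares.

    given_names: list of given names as they appear on the record.
    country_shares: dict mapping a lowercase name to (share_male, share_female).
    threshold: confidence threshold for the country, as a fraction (e.g. 0.97).

    Returns (gender, status), where status is one of the hypothetical labels
    'attributed', 'ambiguous' (data exists, nothing clears the threshold),
    or 'no_data' (no given name found for this country).
    """
    found_any = False
    for name in given_names:
        shares = country_shares.get(name.lower())
        if shares is None:
            continue
        found_any = True
        share_male, share_female = shares
        # The earliest name that clears the threshold wins.
        if share_male >= threshold:
            return "M", "attributed"
        if share_female >= threshold:
            return "F", "attributed"
    if found_any:
        return None, "ambiguous"
    return None, "no_data"


# "charlie" uses the 90%/10% example from the text; "mary" is an illustrative value.
US_SHARES = {"charlie": (0.90, 0.10), "mary": (0.003, 0.997)}
```

With a 97% threshold, `attribute_in_country(["Charlie"], US_SHARES, 0.97)` is ambiguous, while `attribute_in_country(["Charlie", "Mary"], US_SHARES, 0.97)` falls through to the second name and returns female.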
Checking the Name in the Country’s Language Group
- If data is unavailable for an inventor’s given name(s) within their country of residence, the algorithm checks for the name(s) within the language group for that country.
- If a record is present, meaning that all countries in that language group share the same majority gender for individuals with that name, that gender is assigned to that inventor.
Checking the Name Among Globally Agreed Names
- Any names that were not assigned based on data for the inventor’s country of residence or its language group are checked against the set of names for which every country in the WGND that includes that name reports the same majority gender.
- If a record is present, meaning that all countries that have records of that name share the same majority gender for individuals with that name, that gender is assigned to that inventor.
Ambiguity and Non-Attribution
- Any names that could not be attributed based on the inventor’s country, its language group, or global agreement are considered ambiguous and are not assigned a gender.
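Taken together, the three lookups form a fallback chain. The sketch below shows one possible reading of that chain, assuming that a name with country data that misses the threshold stops at the country step rather than falling through. The table layout, the function name, and all sample values are hypothetical.

```python
def attribute_gender(given_names, country, tables, threshold=0.97):
    """Fallback chain: country shares, then language group, then global agreement."""
    names = [n.lower() for n in given_names]

    # Step 1: country-specific shares for the inventor's country of residence.
    country_shares = tables["country"].get(country, {})
    known = [n for n in names if n in country_shares]
    if known:
        for n in known:
            share_male, share_female = country_shares[n]
            if share_male >= threshold:
                return "M"
            if share_female >= threshold:
                return "F"
        return None  # data exists but nothing clears the threshold: ambiguous

    # Step 2: names on which every country in the language group agrees.
    group = tables["language_group_of"].get(country)
    group_names = tables["language_group"].get(group, {})
    for n in names:
        if n in group_names:
            return group_names[n]

    # Step 3: names on which every country listing them agrees.
    for n in names:
        if n in tables["global"]:
            return tables["global"][n]

    return None  # not attributable at any level


# Hypothetical toy tables; only the "charlie" shares come from the text.
TABLES = {
    "country": {"US": {"charlie": (0.90, 0.10), "james": (0.99, 0.01)}},
    "language_group_of": {"US": "english", "AU": "english"},
    "language_group": {"english": {"james": "M"}},
    "global": {"maria": "F"},
}
```

In this toy setup, `attribute_gender(["James"], "AU", TABLES)` resolves through the language group because no country table exists for "AU", and `attribute_gender(["Maria"], "AU", TABLES)` resolves through global agreement.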
Disambiguation and Resolution of Conflicting Records
- After individual inventor records are disambiguated, conflicts in gender attribution between any records associated with the same individual are resolved by assigning the disambiguated inventor the gender associated with the majority of records.
- If an equal number of records associated with the same inventor are attributed as male and female, the disambiguated inventor is not attributed a gender.
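The majority rule for disambiguated inventors can be illustrated as follows. The function name is hypothetical, and ignoring unattributed records when counting is an assumption on our part, since the text does not say how they are treated.

```python
from collections import Counter


def resolve_inventor_gender(record_genders):
    """Resolve one disambiguated inventor's gender from per-record attributions.

    record_genders: iterable of 'M', 'F', or None (unattributed record).
    Unattributed records are ignored here, which is an assumption.
    Returns 'M' or 'F' for a strict majority, or None on a tie or no data.
    """
    counts = Counter(g for g in record_genders if g in ("M", "F"))
    if counts["M"] > counts["F"]:
        return "M"
    if counts["F"] > counts["M"]:
        return "F"
    return None
```

For example, three records attributed male, male, and female resolve to male, while one male and one female record leave the inventor unattributed.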
Note on Confidence Thresholds
The United States has a much higher attribution rate than countries such as China, India, and the Republic of Korea. Prior studies that attempted to attribute gender to Asian names encountered the same problem. To address it, some additional steps were implemented to create a “baseline-augmented” method. Thresholds for these steps were all set by manual inspection of the distribution of WGND shares for each group/country.
- For inventors based in China, Singapore, Taiwan, Macao, and Hong Kong, we attribute a gender if it is identified in 60% or more of WGND cases.
- For inventors based in the Republic of Korea, the threshold is set at 80%.
- For inventors based in India, the threshold is set at 90%.
- All other countries use a default threshold of 97%.
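These thresholds amount to a small lookup table with a default. The sketch below assumes countries are keyed by ISO 3166-1 alpha-2 codes; that keying and the names `THRESHOLDS`, `threshold_for`, and `meets_threshold` are hypothetical and not taken from the PatentsView code.

```python
# Country-specific thresholds from the text, expressed as fractions.
# Keys are ISO 3166-1 alpha-2 codes (an assumption about how countries are encoded).
THRESHOLDS = {
    "CN": 0.60,  # China
    "SG": 0.60,  # Singapore
    "TW": 0.60,  # Taiwan
    "MO": 0.60,  # Macao
    "HK": 0.60,  # Hong Kong
    "KR": 0.80,  # Republic of Korea
    "IN": 0.90,  # India
}
DEFAULT_THRESHOLD = 0.97  # all other countries


def threshold_for(country_code):
    """Return the confidence threshold for a country, falling back to the default."""
    return THRESHOLDS.get(country_code.upper(), DEFAULT_THRESHOLD)


def meets_threshold(share, country_code):
    """True if a gender's share meets or exceeds the country's threshold."""
    return share >= threshold_for(country_code)
```

Under these values, a 90% share would be enough for an inventor based in India but not for one based in the United States, where the 97% default applies.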
Archival PatentsView Gender Data
A technical report of PatentsView’s previous gender attribution process is available to view here: PROGRESS AND POTENTIAL: A profile of women inventors on U.S. patents.
Appendices to this report are located here: On-line Appendices to "Progress and Potential: A profile of women inventors on U.S. patents."
The gender attributions used for this report can also be downloaded.
Sources and citations
- WGND 1.0 paper (2016): https://www.wipo.int/publications/en/details.jsp?id=4554&plang=EN
- WIPO WGND 2.0 data: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/MSEGSJ
- WIPO genderit GitHub repository: https://github.com/IES-platform/r4r_gender/tree/main/genderit/python
- PatentsView genderit fork: https://github.com/PatentsView/gender_it/tree/main
- Progress and Potential report: https://www.uspto.gov/sites/default/files/documents/OCE-DH-Progress-Potential-2020.pdf
- Progress and Potential appendices: https://s3.amazonaws.com/data.patentsview.org/documents/On-line+Appendix+-+Gender+Attribution+of+USPTO+Inventors.pdf