Gender Attribution of U.S. Patent Holders

Demographic information on inventors, lawyers, and other individuals who participate in the U.S. and global patent systems is valuable for policy analysis. One demographic characteristic of these individuals is their gender, defined here as the person’s physical gender rather than by broader concepts of gender identity. The PatentsView data process incorporates an algorithm that predicts the gender of inventors named on U.S. published applications and patents. It builds on prior gender classification methods in three ways: it uses country-specific lists of names associated with male and female genders, it accounts for the migratory backgrounds of inventors, and it disambiguates inventor names before gender attribution to avoid duplicate entities and problems stemming from misspellings and other errors found in raw text fields. These improvements tend to make the attribution results more accurate. In the future, the PatentsView team hopes to expand the gender attribution process to other patent system participants such as lawyers.

PatentsView Sources for Gender Attribution:

  1. IBM-GNR (Global Name Recognition), a name search technology produced by IBM. IBM-GNR is a commercial product that performs various name disambiguation tasks, of which two are relevant to our methodology: (1) the association of names and surnames with one or (more often) several countries of likely origin, and (2) the association of names with male and female gender, given in the form of probability estimates. These associations originate from a database produced by U.S. immigration authorities in the first half of the 1990s, when the authorities registered the names and surnames, nationality, and gender of all foreign citizens entering the United States. The database contains roughly 750,000 full names; in addition, variants of registered names and surnames are considered, according to country-sensitive spelling and abbreviation rules. More information can be found in Breschi et al. (2017a[1], 2017b[2]).
  2. The WIPO worldwide gender-name dictionary (WGND), produced by the World Intellectual Property Organization (WIPO). It includes a list of 6.2 million names from 182 different countries. For each name in the data set, it records the gender associated with that name in each country where the name appears in the source data. The construction of the WGND drew on previous gender studies as well as national public statistical institutions. See Martinez et al. (2016)[3] for details. Names that are used for both males and females in a given country are given an “unknown” status for that country.

Ten-Step Gender Attribution Process Overview

Using these two sources of country-specific gender-attributed names, the team assigns gender to inventors’ names on USPTO published applications and patents using the following steps:

  1. For each inventor name, the IBM-GNR returns the fraction of instances it identifies as male in the data source and the fraction it identifies as female. In addition, it returns a “frequency” metric that indicates the frequency with which each name appears in the complete data set. A very uncommon name will be assigned a very low frequency, indicating that gender attribution will be unreliable for that name.

  2. For each inventor first name, female gender is attributed if it is identified as female in 97% or more cases and male gender is attributed if identified as male in 98% or more cases. These threshold values were decided by manual inspection of the distribution of the fraction of gender appearance from the first step. However, any names with a frequency metric of 5% or less are excluded due to unreliability of attribution. 

  3. When the inventor’s first name is majority one gender but does not reach the thresholds established in step 2, the second (or middle) name is taken into consideration. When the second name does reach the threshold value, the appropriate gender is attributed to the inventor.

  4. Next, for names that do not yet have an attributed gender, the WIPO’s WGND is used. Because the WGND is country-specific, we must first attribute a country of origin to each remaining inventor using the IBM-GNR.

  5. For each likely country of origin, the GNR attaches a measure of significance, which measures the share of instances in which the name or surname is associated with a given country of origin. The present algorithm focuses on the vector of countries associated with the surname to assign a country of origin. This is because the first name is the name of interest for gender attribution and thus cannot be part of the decision rule for country of origin.

  6. Of the set of associated countries for each surname, only those with at least 10% significance are considered. After dropping those under 10%, the list is sorted by significance in descending order. This step raises a further problem, best explained through the example of the “Smith” surname. It could be given a 30% significance for Germany, 20% for the United Kingdom, and 10% each for Ireland and Australia. In principle, the surname would be associated with Germany, but the English-speaking countries together add up to a higher level of significance than Germany. To address this problem, some countries are collapsed into linguistic groups to create a list of countries and languages associated with surnames. Each linguistic group is sorted within the larger list of countries as a single entry, and its member countries are then sorted within the group.

  7. After sorting, each inventor’s surname is associated with the ordered list of linguistic groups and individual countries. The “Smith” example from step 6 would first be associated with the United Kingdom, Ireland, the United States, Australia, Canada, and so on (all English-speaking countries), and then with Germany, Switzerland, and Austria.

  8. With linguistic groups and countries associated with each surname, the first name and at least one of the associated countries are matched to name-country pairs in the WGND data set. More than one linguistic group is kept per inventor because, for some name-country pairs, the first linguistic group does not exist in the WGND data set. In those cases, the most significant linguistic group included in the data set is used.

  9. For some inventors with rare surnames, we were not able to create a list of likely countries of origin. In these cases, country of residence is substituted for country of origin.

  10. Last, the cases of no name-country match in the WGND process are addressed. We use the WGND gender attribution despite no name-country match, and attribute gender only if two conditions are satisfied: (1) all instances in the WGND agree on that gender and (2) the majority of instances generated by GNR coincide with the gender attributed by the WGND.
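
The ordering described in steps 6 and 7 can be illustrated with a minimal Python sketch. This is not the PatentsView implementation. The group membership table, the function name rank_countries, and the choice to sum only countries that pass the 10% cutoff are all assumptions made for illustration, and the "Smith" shares read the 10% in step 6 as applying to each of Ireland and Australia. The sketch also ranks only the countries returned for the surname; it does not expand a group to its other members as step 7 describes.

```python
from collections import defaultdict

# Hypothetical linguistic groups; the source names these countries as examples
# but does not publish the full grouping table.
LINGUISTIC_GROUPS = {
    "United Kingdom": "English",
    "Ireland": "English",
    "United States": "English",
    "Australia": "English",
    "Canada": "English",
    "Germany": "German",
    "Switzerland": "German",
    "Austria": "German",
}

MIN_SIGNIFICANCE = 0.10  # step 6: keep countries with at least 10% significance


def group_of(country):
    """Return the linguistic group of a country, or the country itself."""
    return LINGUISTIC_GROUPS.get(country, country)


def rank_countries(significance):
    """Order countries by their group's total significance, then by their own."""
    kept = {c: s for c, s in significance.items() if s >= MIN_SIGNIFICANCE}

    # Sum significance per linguistic group (assumption: only kept countries count).
    group_total = defaultdict(float)
    for country, share in kept.items():
        group_total[group_of(country)] += share

    # Groups in descending total; countries in descending share inside a group.
    # Names are used as tie-breakers so the result is deterministic.
    return sorted(
        kept,
        key=lambda c: (-group_total[group_of(c)], group_of(c), -kept[c], c),
    )


smith = {
    "Germany": 0.30,
    "United Kingdom": 0.20,
    "Ireland": 0.10,
    "Australia": 0.10,
    "France": 0.05,  # hypothetical entry, dropped by the 10% cutoff
}
print(rank_countries(smith))
# ['United Kingdom', 'Australia', 'Ireland', 'Germany']
```

Although Germany has the largest single share, the English-speaking group ranks first because its members together outweigh it, which is the behavior the linguistic grouping is meant to produce.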

The United States has a much higher attribution rate than countries such as China, India, and the Republic of Korea. This problem is shared by prior studies with a similar aim that have attempted to attribute gender to Asian names. Therefore, some additional steps were implemented to create a “baseline-augmented” method. Thresholds for these steps were all set by manual inspection of the distribution of GNR shares for each group/country.

  • For surnames primarily associated with China, Singapore, Taiwan, Macao, and Hong Kong, we attribute a gender if it is identified in 60% or more of GNR cases.

  • For surnames primarily associated with the Republic of Korea, the threshold is set at 80%.

  • For surnames primarily associated with India, the threshold is set at 90%.
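
The threshold rule from step 2 and the lower thresholds listed above can be sketched in Python as follows. This is a simplified illustration, not the PatentsView code. The function name attribute_gender, the representation of GNR shares and the frequency metric as fractions between 0 and 1, and the decision to try the baseline thresholds before the augmented ones are assumptions. The source also does not say whether the frequency filter applies to the augmented thresholds; the sketch assumes it does.

```python
# Baseline thresholds from step 2 of the process.
BASELINE = {"female": 0.97, "male": 0.98}

# Names with a frequency metric at or below this value are excluded (step 2).
MIN_FREQUENCY = 0.05

# Lower thresholds for surnames primarily associated with these countries.
AUGMENTED = {
    "China": 0.60,
    "Singapore": 0.60,
    "Taiwan": 0.60,
    "Macao": 0.60,
    "Hong Kong": 0.60,
    "Republic of Korea": 0.80,
    "India": 0.90,
}


def attribute_gender(female_share, male_share, frequency, surname_country=None):
    """Return "female", "male", or None when no gender can be attributed."""
    # Assumption: the frequency filter applies to every threshold below.
    if frequency <= MIN_FREQUENCY:
        return None

    # Baseline rule: the two genders use different thresholds.
    if female_share >= BASELINE["female"]:
        return "female"
    if male_share >= BASELINE["male"]:
        return "male"

    # Augmented rule: one lower threshold per country, used for both genders.
    threshold = AUGMENTED.get(surname_country)
    if threshold is not None:
        if female_share >= threshold:
            return "female"
        if male_share >= threshold:
            return "male"
    return None


print(attribute_gender(0.30, 0.70, 0.40))           # None
print(attribute_gender(0.30, 0.70, 0.40, "China"))  # male
```

A name identified as male in 70% of GNR cases stays unattributed under the baseline rule but receives a gender when the surname is primarily associated with China, where the threshold is 60%.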

Methodology Documentation

A technical report on the gender attribution process is available: "Progress and Potential: A profile of women inventors on U.S. patents." Appendices to this report are available as: On-line Appendices to "Progress and Potential: A profile of women inventors on U.S. patents."

1 Breschi, S., Lissoni, F., Miguelez, E., 2017a. Foreign-origin inventors in the USA: testing for diaspora and brain gain effects. J Econ Geogr 17, 1009–1038. https://doi.org/10.1093/jeg/lbw044

2 Breschi, S., Lissoni, F., Tarasconi, G., 2017b. Inventor Data for Research on Migration & Innovation: The Ethnic-Inv Pilot Database, in: Fink, C., Miguelez, E. (Eds.), The International Mobility of Talent and Innovation: New Evidence and Policy Implications. Cambridge University Press.

3 Martínez, G.L., Raffo, J., Saito, K., 2016. Identifying the Gender of PCT inventors (No. 33), WIPO Economic Research Working Papers. World Intellectual Property Organization - Economics and Statistics Division.