  • How Can We Apply Skill Relatedness Networks to Innovation?

    By Siddharth Engineer

    A skill relatedness network is an interconnected system which shows similarities between industries.

    Imagine there are many employees who transition from industry A to industry B. This would suggest that the two industries require similar skillsets. A skill-relatedness network provides a broad view of such labor flows to better understand the similarities between fields.

    This can be valuable information for economists, firms seeking to leverage human capital, and people seeking employment opportunities. Let us look at one example more closely. Labor mobility, a worker’s ability to move between jobs and industries, is critical to the personal and financial growth of workers. Greater mobility can lead to reductions in poverty and an overall stronger economy.

    Transportation Limits Worker Mobility in Colombia

    In Colombia, an analysis of transportation systems revealed that commute times were significantly limiting the ability of firms to make use of a diverse pool of skills.

    When employers in similar industries are grouped geographically, this limits labor mobility because workers with limited transportation options cannot move between industries. Instead, we can map skill-relatedness networks to geographic regions to capture the employment opportunities that sector classifications would otherwise overlook (O’Clery et al. 2019).

    Below: an example skill relatedness network for labor markets

    Visualisation of the skill-relatedness network for Colombia, where nodes correspond to industries and edges correspond to positive values of the adjacency matrix given in Eq. 2. The node size is proportional to industry complexity, and colours correspond to the sector groups given in the legend

    Looking at Skill Relatedness Networks Differently

    More recently, we have been able to apply skill-relatedness networks to innovation. Let us adapt our prior definition of skill-relatedness. Instead of focusing on employees who change work, let us look at inventors who change fields. At the end of the day, both employment and patents are applications of an individual’s skill. By identifying inventors with patents in multiple fields, we can get a better picture of the human capital available for innovation specifically.

    Using PatentsView's disambiguated inventor data, we can mathematically define this new skill-relatedness network. Imagine transition matrices (F) between technologies, each of dimensions N × N, where N represents the total number of technologies. Each element F_{i,j} = 1 if an inventor transitioned from technology i to technology j.
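
    The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the research: the input format (a date-ordered list of technology indices per inventor), the toy data, and the choice to ignore moves within the same technology are all assumptions made here.

```python
import numpy as np

def transition_matrix(tech_sequence, n_technologies):
    """Build one inventor's N x N transition matrix F.

    tech_sequence is a hypothetical input: the inventor's technology
    indices in patent-date order. F[i, j] = 1 if the inventor moved
    from technology i to technology j on consecutive patents.
    """
    F = np.zeros((n_technologies, n_technologies), dtype=int)
    for i, j in zip(tech_sequence, tech_sequence[1:]):
        if i != j:  # assumption: staying in one technology is not a transition
            F[i, j] = 1
    return F

# Toy data (made up): three inventors, four technologies indexed 0..3
inventors = {
    "inventor_a": [0, 1, 1, 2],
    "inventor_b": [0, 1],
    "inventor_c": [3, 3],
}
N = 4

# Summing the per-inventor matrices counts how many inventors made each move
flows = sum(transition_matrix(seq, N) for seq in inventors.values())
print(flows)
```

    In this toy example two inventors moved from technology 0 to technology 1, so that cell of the aggregated matrix holds 2.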

    A Case Study

    Sergio Palomeque constructed a skill-relatedness network by aggregating these matrices, comparing it to a null model, and normalizing the data. The results revealed that the diameter of the network has decreased over time, particularly in the last 10 years.
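
    The post does not give the formulas behind these steps. As a rough sketch only, one common choice in the skill-relatedness literature compares observed flows with the flows expected if moves were independent of the technologies involved, then rescales the ratio to lie between -1 and 1. The function below assumes that approach; it should not be read as Palomeque's exact method.

```python
import numpy as np

def skill_relatedness(flows):
    """Observed-vs-expected flow ratio, symmetrized and rescaled to [-1, 1).

    Assumed null model: flows are independent of the technology pair, so
    expected[i, j] = (outflow of i) * (inflow of j) / (total flow).
    """
    F = np.array(flows, dtype=float)  # copy, so the caller's data is untouched
    np.fill_diagonal(F, 0.0)          # ignore moves within one technology
    with np.errstate(divide="ignore", invalid="ignore"):
        expected = F.sum(axis=1, keepdims=True) * F.sum(axis=0, keepdims=True) / F.sum()
        ratio = np.where(expected > 0, F / expected, 0.0)
    ratio = (ratio + ratio.T) / 2     # treat relatedness as undirected
    return (ratio - 1) / (ratio + 1)  # 0 means "exactly as expected"

# Toy aggregated flow matrix (made-up numbers)
sr = skill_relatedness([[0, 4, 0],
                        [4, 0, 1],
                        [0, 1, 0]])
adjacency = sr > 0                    # keep only links stronger than the null model
print(np.round(sr, 2))
```

    Positive values mark pairs of technologies that exchange more inventors than the null model predicts, which matches the figure caption's rule that edges correspond to positive entries.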

    A decreasing diameter indicates that links between existing technologies are forming faster than new technologies are being introduced. While the reasons for this trend are still unclear, further research in skill-relatedness networks could offer valuable insights into innovation, as demonstrated in the context of transportation in Colombia.
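
    To make the diameter idea concrete, here is a small sketch that measures the diameter of an unweighted network with breadth-first search. The two toy graphs are invented for illustration; the second simply adds one link between existing nodes.

```python
from collections import deque

def diameter(adjacency):
    """Longest shortest path (in hops) between any two connected nodes.

    adjacency maps each node to the set of its neighbours.
    """
    longest = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from this source
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        longest = max(longest, max(dist.values()))
    return longest

# Four technologies linked in a chain: 0 - 1 - 2 - 3
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
# Same four technologies with one extra link between 0 and 3
denser = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}

print(diameter(chain), diameter(denser))  # prints "3 2"
```

    Adding a link without adding a node shortens the longest path, which is the pattern a shrinking diameter points to.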

  • What's New with PatentsView - June 2023

    June Updates 

    This month in PatentsView news, the data team will release quarter four data for 2022 and quarter one data for 2023. The disambiguated and processed data will include patents and published pre-grant patent applications from September 30, 2022, to March 30, 2023. In addition to bulk downloadable data for granted patents and pre-grant application publications, the legacy API, PatentsView's new PatentSearch API, and site visualizations will also be updated with data through March 30, 2023. To celebrate the completion of processing for the year 2022, we're lighting sparklers just in time for the Independence Day and Emancipation Day celebrations in the United States!

    In our previous data updates, PatentsView gender data was attributed through a partnership with faculty at the University of Bordeaux. Starting from the final quarter of 2022 up to the present, our PatentsView data scientists have attributed gender to inventors using the World Intellectual Property Organization’s (WIPO’s) Genderit Method algorithm, which has been adjusted by our team. The new attribution method has been applied to all historic records and assigned to disambiguated inventors based on the majority gender of the raw inventor records that combine to make the disambiguated inventor. For instance, if over 50% of raw records for a given inventor are marked female, then the inventor is attributed as female. In cases where exactly 50% of raw inventor records are marked female and 50% are marked male (which did occur), the gender remains unattributed.
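
    The majority rule described above can be sketched as follows. This is an illustration only, not the PatentsView pipeline: the "F"/"M"/None labels are hypothetical, and the sketch assumes that unlabeled raw records count toward the denominator.

```python
from collections import Counter

def attribute_gender(raw_labels):
    """Return "F" or "M" if that label covers more than half of the raw
    records for one disambiguated inventor, otherwise None.

    raw_labels holds one hypothetical label per raw inventor record:
    "F", "M", or None when the record could not be labeled.
    """
    counts = Counter(raw_labels)
    total = len(raw_labels)  # assumption: unlabeled records stay in the denominator
    for gender in ("F", "M"):
        if counts[gender] * 2 > total:  # strictly more than 50%
            return gender
    return None  # ties and inventors with no majority stay unattributed

print(attribute_gender(["F", "F", "M"]))  # prints "F"
print(attribute_gender(["F", "M"]))       # prints "None" (exact 50/50 split)
```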

    PatentsView is bringing the inventor gender algorithm in house starting with the next data release. We aim to simplify processes and improve the timeliness of the data releases while maintaining data quality. In a comparison on a sample week from one quarter of data, our new method outperformed the old method in attribution rate by 4%. In summary, the inclusion of gender attribution in the PatentsView internal data pipeline will ultimately result in faster and more accurate gender information for researchers, economists, students, inventors, and other users.

    Looking Ahead

    In pursuit of a faster and more efficient data processing pipeline that does not compromise the current quality of PatentsView data, the data team also invested in weekly parsing of the raw XML data files from the United States Patent and Trademark Office (USPTO). Incremental conversion of the XML data into TSV format allows the data team to catch errors early, before they lead to data quality issues or impede the disambiguation and attribution processes further along the pipeline.
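
    As a loose illustration of the idea, the sketch below converts a tiny XML snippet to TSV and flags incomplete records along the way. The element names, fields, and sample data are hypothetical; real USPTO XML files are far more complex, and this is not the team's parser.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for one weekly XML file
SAMPLE_XML = """<patents>
  <patent><number>1000001</number><title>Widget</title><date>2023-01-03</date></patent>
  <patent><number>1000002</number><title>Gadget</title></patent>
</patents>"""

FIELDS = ["number", "title", "date"]  # hypothetical field names

def xml_to_tsv(xml_text, out):
    """Write complete records to `out` as TSV and return the problem records."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    problems = []
    for record in ET.fromstring(xml_text).iter("patent"):
        row = [(record.findtext(field) or "").strip() for field in FIELDS]
        missing = [field for field, value in zip(FIELDS, row) if not value]
        if missing:
            # Flag the record now instead of letting it reach later steps
            problems.append((row[0], missing))
            continue
        writer.writerow(row)
    return problems

buffer = io.StringIO()
problems = xml_to_tsv(SAMPLE_XML, buffer)
print(buffer.getvalue())
print(problems)
```

    Running a check like this on each weekly file keeps a malformed record from surfacing only after a full quarter has been processed.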

    Here's to diving into 2022 annual data and beginning our exploration with 2023!

  • Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org

    The U.S. Patent and Trademark Office receives thousands of patent applications every year. Often, the same inventor will apply for multiple patents. Other times, multiple inventors with similar names will each apply for a patent.

    The issue researchers and innovation enthusiasts have run into is that, when analyzing patent data, there is no standard way to tell whether an inventor named on multiple patents is the same person or different people with a similar name. 

    PatentsView uses algorithms to make that determination, a process known as entity resolution or disambiguation. The process is not perfect, and the PatentsView team is constantly working to make the algorithm more accurate.  

    The first step in any improvement process is to evaluate how well the current system works. Olivier Binette, a PhD candidate in Statistical Science at Duke University, explored this question in his publication "Estimating the Performance of Entity Resolution Algorithms: Lessons Learned Through PatentsView.org."

    Challenges for the PatentsView algorithm 

    Binette notes in his paper that the PatentsView entity resolution algorithm faces three main challenges in accurately determining whether the names on multiple patent applications belong to one or more than one inventor. 

    First, when researchers apply the PatentsView algorithm to benchmark datasets (smaller subsets of larger datasets that are used to train and test algorithms), the results tend to be more accurate than when the algorithm is applied to the larger, real-world data. This is likely because many of the false links between inventors with similar names do not appear in the benchmark dataset.

    Second, the number of patents that share a common inventor is relatively small compared to the larger number of patents. This creates a challenge for training the PatentsView algorithm to classify pairs of records as either sharing an inventor or not sharing an inventor. 

    Finally, there are many different methods researchers have used to sample the benchmark data sets and adjust their estimates according to those samples. This creates an additional challenge in training the PatentsView algorithm. 

    Binette’s method 

    Binette argues that his method for estimating the performance of the PatentsView algorithm addresses all three challenges.  

    His method uses three different representations of precision and recall. Precision is the fraction of record pairs the algorithm links together that truly belong to the same inventor, and recall is the fraction of record pairs truly belonging to the same inventor that the algorithm links together. So, when an algorithm with high precision groups two records under one inventor, it is right most of the time. An algorithm with high recall finds most of the records that belong to the same inventor.
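
    For readers who want to see the arithmetic, here is a minimal sketch of the standard pairwise definitions of precision and recall for a disambiguation result. It is not one of Binette's estimators, and the record and inventor identifiers are made up.

```python
from itertools import combinations

def matching_pairs(assignment):
    """All record pairs that share a cluster.

    assignment maps a record id to the inventor (cluster) id it was given.
    """
    by_cluster = {}
    for record, cluster in assignment.items():
        by_cluster.setdefault(cluster, []).append(record)
    return {
        frozenset(pair)
        for members in by_cluster.values()
        for pair in combinations(members, 2)
    }

def pairwise_precision_recall(predicted, truth):
    predicted_pairs = matching_pairs(predicted)
    true_pairs = matching_pairs(truth)
    correct = len(predicted_pairs & true_pairs)
    # Convention chosen here: with no pairs to judge, report a perfect score
    precision = correct / len(predicted_pairs) if predicted_pairs else 1.0
    recall = correct / len(true_pairs) if true_pairs else 1.0
    return precision, recall

# Made-up example: records r1..r3 belong to inventor A, r4 to inventor B
truth = {"r1": "A", "r2": "A", "r3": "A", "r4": "B"}
predicted = {"r1": "x", "r2": "x", "r3": "y", "r4": "y"}

precision, recall = pairwise_precision_recall(predicted, truth)
print(precision, recall)
```

    Here the algorithm links two pairs, one of them correctly (precision 0.5), and recovers one of the three true pairs (recall of about 0.33).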

    He tested each representation using PatentsView’s current disambiguated inventor data. For the test, he treated that data as the ground truth, then randomly added in errors before calculating precision and recall.  

    He repeated the process 100 times. Then, he performed additional tests on two existing benchmark datasets and a disambiguation set done by hand. 
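
    The test loop described above can be approximated with a toy simulation. Everything below is an assumption made for illustration: the synthetic data, the 5% error rate, and the way errors are injected (moving a record to a randomly chosen inventor) are not taken from the paper.

```python
import random
from itertools import combinations

def matching_pairs(assignment):
    """All record pairs that share a cluster (record id -> cluster id)."""
    by_cluster = {}
    for record, cluster in assignment.items():
        by_cluster.setdefault(cluster, []).append(record)
    return {frozenset(p) for m in by_cluster.values() for p in combinations(m, 2)}

def precision_recall(predicted, truth):
    predicted_pairs, true_pairs = matching_pairs(predicted), matching_pairs(truth)
    correct = len(predicted_pairs & true_pairs)
    precision = correct / len(predicted_pairs) if predicted_pairs else 1.0
    recall = correct / len(true_pairs) if true_pairs else 1.0
    return precision, recall

def add_errors(truth, error_rate, rng):
    """Copy the ground truth and move a random share of records to a
    randomly chosen inventor (hypothetical error model)."""
    labels = sorted(set(truth.values()))
    noisy = dict(truth)
    for record in noisy:
        if rng.random() < error_rate:
            noisy[record] = rng.choice(labels)
    return noisy

rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic "ground truth": 200 records spread evenly over 20 inventors
truth = {f"r{i}": f"inventor_{i % 20}" for i in range(200)}

# Repeat the inject-and-score step 100 times, as in the description above
results = [precision_recall(add_errors(truth, 0.05, rng), truth) for _ in range(100)]
mean_precision = sum(p for p, _ in results) / len(results)
mean_recall = sum(r for _, r in results) / len(results)
print(round(mean_precision, 3), round(mean_recall, 3))
```

    Because the injected errors are known, the spread of the 100 results shows how closely a given way of measuring precision and recall tracks the truth.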

    Using this method, Binette found that PatentsView’s inventor disambiguation algorithm had a precision between 79% and 91% and a recall between 91% and 95%, both much lower than the 100% found by previous testing on benchmark datasets. This shows that PatentsView’s current entity resolution algorithm over-estimates matching pairs.

    Future uses 

    Binette’s evaluation method gives PatentsView a way to reliably analyze the effectiveness of changes made to the entity resolution algorithm in the future. Dive deeper into Binette’s method and review his code on his PatentsView Evaluation page on GitHub.

  • Data-in-Action Spotlight: Can natural disasters affect innovation? Evidence from Hurricane Katrina

    As climate events and changes increase globally, how could this affect innovation and patenting of intellectual property? Luis Ballesteros of the Questrom School of Business at Boston University explored this question with his research on Hurricane Katrina published in late 2021.

    A different perspective

    While there are geographical studies of innovation and patents that focus on social features like how close a person is to human and material resources and institutions, Ballesteros is interested in a different perspective – what he calls “exposure to large shocks.” Ballesteros used PatentsView’s disambiguated inventor and location data to write "Can natural disasters affect innovation? Evidence from Hurricane Katrina." The publication describes the effects of natural disasters on patents and patenting.

    How Hurricane Katrina affected innovation

    Evidence suggests that large societal shocks produce lasting variations in human risk-aversion behaviors. Based on that evidence, Ballesteros proposes that Hurricane Katrina in the U.S. would have changed innovation outcomes.

    More specifically, Ballesteros and supporting literature suggest that after an immediate shock, affected counties have much more patenting activity and the quality of innovation increases compared to non-exposed counties. This correlation has been shown to persist for roughly 10 years after the initial shock.

    Methodology

    Ballesteros’ methods involved constructing a history of inventors between 1999 and 2015 that allowed him to follow the “Katrina effect” across geographies. The estimates he found imply that shock-affected people were not only more likely to patent, but became more skewed toward high-technology sectors.

    Ballesteros controlled for natural variation versus shock-related variance in several ways, which he illustrated in section four, Empirical Strategy, of the publication. In section three, Data, Ballesteros provides insights on the challenges and nuance of working with patent data, including the consideration for average processing time between application date and granting of a patent (which was reported as 23 months on average by USPTO in 2021) and how this relates to conducting longitudinal research with patent data.

    Read the full publication to learn more about Ballesteros’ methods and insights on working with patent and PatentsView data.

    How are you using PatentsView data?

    If you have used PatentsView data in your own research, organization, or classroom and would like to be highlighted in a Data-in-Action spotlight piece, please visit our service desk.

     

    Citation for Luis Ballesteros's work: Ballesteros, Luis, Can natural disasters affect innovation? Evidence from Hurricane Katrina (December 13, 2021). Available at SSRN: https://ssrn.com/abstract=3980107 or http://dx.doi.org/10.2139/ssrn.3980107

  • What's New with PatentsView - March 2023

    March Updates

    This month, PatentsView released the third quarter of 2021 data complete with the new algorithm and data structure updates initiated last fall. The release notes web page holds detailed information on this release and historical releases.

    Also released this month are annualized gender data files with new documentation and an updated data dictionary from the Office of the Chief Economist (OCE) at the United States Patent and Trademark Office (USPTO). These datasets are designed both for quick exploratory data analysis and for programmatic use by data users with a longitudinal focus. The annual files contain information from the assignee, inventor, location, application, and patent tables all in one place for a more comprehensive picture of patenting teams. In addition to pulling in variables from these separate PatentsView data tables, the datasets contain novel variables including the total number of inventors on a given patent, the total number of inventors listed on a given patent that were assigned a gender, the number of men inventors on each patent, the number of women inventors on each patent, and a flag indicating whether inventor information is available for that patent.
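
    The per-patent team variables described above can be derived from inventor-level rows along these lines. The row layout, labels, and output keys are hypothetical and are not the column names used in the OCE files; the data dictionary remains the reference for the real schema.

```python
from collections import defaultdict

# Hypothetical inventor-level rows: (patent_id, inventor_id, gender)
rows = [
    ("p1", "i1", "F"),
    ("p1", "i2", "M"),
    ("p1", "i3", None),  # inventor with no gender attributed
    ("p2", "i4", "M"),
]

def summarize_teams(rows):
    """Count inventors per patent, overall and by attributed gender."""
    summary = defaultdict(
        lambda: {"inventors": 0, "with_gender": 0, "men": 0, "women": 0}
    )
    for patent_id, _inventor_id, gender in rows:
        team = summary[patent_id]
        team["inventors"] += 1
        if gender in ("F", "M"):
            team["with_gender"] += 1
            team["women" if gender == "F" else "men"] += 1
    return dict(summary)

teams = summarize_teams(rows)
print(teams["p1"])
```

    For patent p1 this yields three inventors, two with an attributed gender, one man and one woman.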

    To read more about these data files and the inspiration for their generation, visit the Gender & Innovation page and navigate to the DATA section located under the interactive visualization of gender data from 2000 to 2020.

    Looking Ahead

    The next PatentsView data update is gearing up this March and will result in a double release of 2021 quarter four and 2022 quarter one data in early summer. The team is working with OCE this spring to improve and optimize the assignee disambiguation and gender attribution algorithms. The anticipated result of this dive into algorithm repair and improvement is higher quality data. As always, please reach out to our team with data questions and suggestions. Your exploration of the data and reporting of discrepancies and errors helps our team return the highest quality data to the public.

    To receive regular updates on what the PatentsView team is working on in distributing patent data and what we are reading in the patenting literature, subscribe to our bi-monthly newsletter. Happy Spring!

     
