IRIS Research Assistant Professor Jinseok Kim was one of ten researchers awarded a 2022 Propelling Original Data Science (PODS) grant from the Michigan Institute for Data Science.

Jinseok’s project is titled “AI-based author entity disambiguation for promoting fair evaluation of women in science,” and will develop a machine-learning method to more accurately account for women authors’ names in published papers.

Using bibliographic data, studies have reported that female scholars tend to produce fewer papers and attract fewer citations than male scholars, indicating that women in science underperform in terms of scholarly productivity and impact. I argue that such findings are likely based on flawed data in which female authors are not properly identified. Specifically, female scholars may have changed their last names after marriage and published under the new names instead of the maiden names they used in publications authored before marriage. As none of the existing bibliographic data services consolidates author entities with different last names, female authors who change names are inevitably split into different entities: one with a maiden name and the other with a married name. This means that publications and citations of female authors who have used different names are likely undercounted, possibly leading to under-evaluation of their scholarly productivity and impact. This issue can hinder fair evaluation of women in science, as female researchers are increasing in number while only a small fraction of women is found to retain their maiden names. To address the issue, this project will develop a machine learning method to consolidate female author entities in bibliographic data, thus promoting fair evaluation of women in science. Under the PODS grant, the PI will first create large-scale labeled data to train algorithmic models that merge female author entities split under different names. Then, the PI will apply the models to author entities recorded in PubMed, which indexes research papers in biomedicine, and demonstrate how the correct identification of name-changed female authors can lead to a different understanding of the research productivity and citation-based impact of female scholars in a field where almost half of scientists are estimated to be female (> Analytics Pillar).
Based on this case study and the algorithmic method, the PI will apply for grants from funders such as the NSF to expand the PODS project into a large-scale, cross-field project. The findings derived from this project will enable the science community and policy makers to correctly characterize the research productivity and impact of female scholars and to implement effective supports and policies to promote fairness and equity for women in science. A tool that implements the newly developed method will be shared with AI researchers under the UM license for reuse, validation, and improvement.
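
To make the split-entity problem concrete, here is a minimal Python sketch of how two author entities with different last names might be flagged as the same person. It is not the project's method, which will be a trained machine-learning model. The `AuthorEntity` schema, the coauthor-overlap heuristic, the 0.5 threshold, and all names in the toy data are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class AuthorEntity:
    """One author entity as a bibliographic service might store it (hypothetical schema)."""
    entity_id: str
    first_name: str
    last_name: str
    coauthors: frozenset = field(default_factory=frozenset)


def jaccard(a, b):
    """Share of coauthors two entities have in common (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_merges(entities, threshold=0.5):
    """Return (id, id, score) for entity pairs that share a first name,
    differ in last name, and overlap enough in coauthors.

    The threshold is an illustrative assumption, not a tuned value.
    """
    pairs = []
    for x, y in combinations(entities, 2):
        same_first = x.first_name.lower() == y.first_name.lower()
        different_last = x.last_name.lower() != y.last_name.lower()
        if same_first and different_last:
            score = jaccard(x.coauthors, y.coauthors)
            if score >= threshold:
                pairs.append((x.entity_id, y.entity_id, score))
    return pairs


# Invented toy records: e1 and e2 stand for one person before and after a name change.
entities = [
    AuthorEntity("e1", "Maria", "Lopez", frozenset({"A. Chen", "B. Osei", "C. Park"})),
    AuthorEntity("e2", "Maria", "Reed", frozenset({"A. Chen", "B. Osei", "D. Novak"})),
    AuthorEntity("e3", "Maria", "Stone", frozenset({"E. Ito"})),
]

print(candidate_merges(entities))
```

In this toy data, only the pair e1 and e2 clears the threshold, so their publications and citations would be counted together rather than under two separate entities. A learned model would presumably draw on richer evidence than coauthor overlap, such as affiliations or research topics, which is why labeled training data is the project's first step.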