The Vision

The vision at the heart of the Institute for Research on Innovation and Science (IRIS) is to build and maintain large-scale data to enable researchers to transform the study of science and innovation.


The Need for Better Data

Much of the research on the science of science and innovation relies on hand-curated or one-off data efforts. In this environment, researchers often have little incentive to document and distribute their data for use by others. Even when the researchers who produced the data are willing and contractually able to share them, data designed to answer the original investigators’ questions are often ill-suited to addressing other questions.


The IRIS Data Architecture & Solution

Our data architecture combines naturally occurring data on research grant inputs with scientific outputs, including publications, citations, dissertations, and patents, as well as biographic data on researchers scraped from the web and drawn from databases. These data integrate with STAR METRICS administrative data on grant purchases and employment, which can in turn be linked to Longitudinal Employer-Household Dynamics (LEHD) Census data, enabling individuals to be traced as they move across employers and start businesses. The sources are linked using cutting-edge name disambiguation and entity resolution, web scraping, and entity extraction. This IRIS methodology is advancing the underlying computational sciences and creating more useful data for broader applications.
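To make the linking step concrete, here is a minimal sketch of one common ingredient of name disambiguation: normalizing names and grouping records by a coarse "blocking" key before any finer matching. It is not the IRIS method. The function names, the record layout, and the last-name-plus-first-initial key are all assumptions chosen for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z\s]", " ", without_accents.lower())
    return " ".join(cleaned.split())


def blocking_key(name: str) -> str:
    """Coarse key (last name + first initial) for 'First Last' or 'Last, First'.

    Hypothetical rule for illustration; real disambiguation uses far more evidence.
    """
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        parts = name.split()
        if not parts:
            return ""
        first, last = " ".join(parts[:-1]), parts[-1]
    return f"{normalize_name(last)}|{normalize_name(first)[:1]}"


def candidate_links(employees, authors):
    """Map each employee id to the author ids that share its blocking key.

    Both inputs are iterables of (record_id, name) pairs (an assumed layout).
    """
    authors_by_key = defaultdict(list)
    for author_id, author_name in authors:
        authors_by_key[blocking_key(author_name)].append(author_id)
    return {
        employee_id: authors_by_key.get(blocking_key(employee_name), [])
        for employee_id, employee_name in employees
    }
```

A blocking key only narrows the set of candidate pairs. Deciding which candidates refer to the same person would need further evidence, such as affiliations, co-authors, or topics.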


Applications Centered Around Researchers

The core outcome of interest for science funders is the creation, transmission and adoption of scientific ideas. Our underlying framework therefore places at its center the individual researchers, who are embedded within both scientific and economic social networks. Science funding works in part by enabling those networks to exist and expand, so the framework must capture the networks in which researchers operate as well as the researchers themselves. The data platform must be organized in such a way as to provide dynamic links between funding “interventions” (i.e., WHO is funded by WHOM to do WHAT) and the size, structural composition, stability and duration of research networks (WHO is funded). This in turn will be linked to the way in which ideas are created and transmitted, thereby generating scientific, social, economic and workforce ‘products’ (the RESULTS of funding).

Potential Applications of the IRIS Data Could Include:

  • Relating research outputs to research inputs, including researcher characteristics, the composition of the teams that actually conduct research as measured by the amount of time people allocate to individual research projects, and the equipment inputs they employ.
  • Relating the environments in which graduate students and postdocs train to their outcomes, such as publications, citations, patents, subsequent grants, sector of employment, and earnings.
  • Using textual analysis to identify novel combinations of ideas and trace their utilization within scholarly literature and on to downstream results such as patents and drug approvals.

Accessing the Data

Our goal is to enable the research community to access data as easily as possible, subject to privacy and confidentiality restrictions. Thus, public elements (e.g., grants, publications, patents) are made available. More sensitive elements are subject to security provisions. Links to Census data are accessible only through Census Research Data Centers. All of our work is built around common identifiers so that researchers can add elements to the data, with the expectation that linked elements will remain accessible to future researchers.
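As a rough illustration of why common identifiers matter, the sketch below attaches a newly contributed element to existing records by joining on a shared key. The field name `researcher_id`, the record layout, and the sample values are hypothetical, not the actual IRIS schema.

```python
def link_on_identifier(base_rows, new_rows, key="researcher_id"):
    """Attach rows from a new element to base rows that share the same identifier.

    Every base row is kept (a left join); unmatched rows get an empty "linked" list.
    The key name "researcher_id" is an assumed placeholder.
    """
    rows_by_id = {}
    for row in new_rows:
        rows_by_id.setdefault(row[key], []).append(row)

    linked = []
    for row in base_rows:
        matches = rows_by_id.get(row[key], [])
        # Drop the duplicated identifier from each attached row.
        attached = [{k: v for k, v in match.items() if k != key} for match in matches]
        linked.append({**row, "linked": attached})
    return linked


# Hypothetical sample data: grant records plus a contributed patent element.
grants = [
    {"researcher_id": "r1", "grant": "G-100"},
    {"researcher_id": "r2", "grant": "G-200"},
]
patents = [{"researcher_id": "r1", "patent": "P-1"}]
linked_rows = link_on_identifier(grants, patents)
```

Because the join depends only on the shared identifier, an element contributed later can be attached in the same way without changing the existing records.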


IRIS researchers will post working papers to repositories.

