Integrative Data Analysis and Big Data
Integrative data analysis (IDA) refers to a set of strategies in which two or more independent data sets are pooled or combined into one and then statistically analyzed. IDA approaches differ from and offer advantages over other methodological techniques that also strive to build cumulative knowledge bases, such as meta-analysis.
In meta-analysis, summary statistics from multiple studies are pooled. Because IDA techniques pool the original raw data, they avoid the loss of individual-level information inherent in meta-analytic approaches, which allows researchers to examine what works, for whom, and in which contexts. In addition, IDA affords expanded inquiry within many areas of health behavior research. IDA can be used to incorporate big data that were not originally collected to examine theoretically relevant measures. For example, Google searches for health-related topics could serve as an objective measure of information seeking that supplements what is gleaned from a self-report data source such as the Health Information National Trends Survey (HINTS).
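The contrast between the two approaches can be sketched in a few lines of Python. The two studies, their scores, and the age groups below are invented purely for illustration, and the sample-size-weighted mean is only a simple stand-in for the weighting schemes used in real meta-analyses. The point is that a summary-level estimate cannot be broken down by participant characteristics, whereas pooled raw records can.

```python
from statistics import mean

# Hypothetical raw records from two independent studies (values invented).
# Each record is (age_group, outcome_score).
study_a = [("under_50", 4.0), ("under_50", 5.0), ("50_plus", 2.0), ("50_plus", 3.0)]
study_b = [("under_50", 6.0), ("50_plus", 1.0), ("50_plus", 2.0)]

# Meta-analysis style: each study contributes only (sample size, mean score).
summaries = [(len(s), mean(score for _, score in s)) for s in (study_a, study_b)]

# Simplified pooling: sample-size-weighted mean of the study means.
meta_estimate = sum(n * m for n, m in summaries) / sum(n for n, _ in summaries)

# IDA style: pool the raw records themselves, keeping a study label.
pooled = [
    (study, group, score)
    for study, records in (("A", study_a), ("B", study_b))
    for group, score in records
]

def subgroup_mean(records, group):
    """Mean outcome for one age group across all pooled studies."""
    return mean(score for _, g, score in records if g == group)

print(round(meta_estimate, 3))
print(subgroup_mean(pooled, "under_50"), subgroup_mean(pooled, "50_plus"))
```

With only the summaries in hand, the subgroup comparison in the last line could not be computed; it requires the individual-level records.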
Data integration typically takes one of two forms:
- merging data by common data elements (units of information that are shared or widely used across data collection efforts), where these elements are often multi-item scales or indices but can be individual items; or
- linking data sets through a common factor, either at the record level (e.g., linking records across data sets through demographic information, as in the Surveillance, Epidemiology, and End Results (SEER)-Medicare data set) or at multiple levels, such as the environmental or policy level (e.g., linking state- or county-level information with individual-level data).
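As a rough illustration of the two forms above, the following Python sketch first stacks records from two surveys on the fields they share, then attaches county-level information to each individual record. Every name and value here (the surveys, the `smoker` item, the county codes, the `smoke_free_law` indicator, and both helper functions) is hypothetical and chosen only to make the idea concrete; real integration work also involves harmonizing how each element was measured.

```python
# Hypothetical individual-level records from two surveys (all values invented).
survey_1 = [
    {"id": "s1-001", "smoker": 1, "county": "001", "income_band": 3},
    {"id": "s1-002", "smoker": 0, "county": "003", "income_band": 2},
]
survey_2 = [
    {"id": "s2-001", "smoker": 0, "county": "003", "sleep_hours": 7},
]

def merge_on_common_elements(datasets):
    """Stack records from several data sets, keeping only the fields they all share."""
    all_records = [r for records in datasets.values() for r in records]
    common = set.intersection(*(set(r) for r in all_records))
    return [
        {"source": name, **{field: r[field] for field in sorted(common)}}
        for name, records in datasets.items()
        for r in records
    ]

# Hypothetical county-level context (a policy indicator), keyed by county code.
county_policy = {"001": {"smoke_free_law": True}, "003": {"smoke_free_law": False}}

def link_context(records, context, key):
    """Attach higher-level attributes to each individual record via a shared key."""
    return [{**r, **context.get(r[key], {})} for r in records]

# Form 1: merge by common data elements (here: id, smoker, county).
pooled = merge_on_common_elements({"survey_1": survey_1, "survey_2": survey_2})

# Form 2: link county-level information to the individual-level records.
linked = link_context(pooled, county_policy, key="county")

for row in linked:
    print(row)
```

Keeping a `source` label on each pooled record preserves the ability to model between-study differences later.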
The Behavioral Research Program seeks to promote the use of IDA to answer novel cancer control questions to accelerate scientific discovery.
Big Data is a term that captures the opportunities and challenges involved with accessing, managing, analyzing, and integrating information within data sets that are increasingly large, diverse, and complex. These data sets currently exceed the abilities of traditional data management approaches. The value of data from behavioral measures can be significantly amplified by aggregating or integrating them with other data. Adapted from: https://datascience.nih.gov/bd2k/about/what.
The program is invested in improving the scientific rigor with which health behavior theories are tested and applied. BRP encourages and supports the use of new data sources and methods for theory testing. Information about behavior and its influences, from both prospective and archival collection methods, is increasingly temporally dense and big (i.e., high in volume, variety, and velocity). These Big Data require advanced analytic approaches, greater access, and more opportunity for training and collaboration.
In September 2013, the program developed the Big D.A.T.A. (Data and Theory Advancement) workshop to complement the NIH Big Data to Knowledge (BD2K) effort.
The Big D.A.T.A. initiative convened experts in data analytics, systems science, and theory development and testing in order to address a fundamental question:
“How can behavioral scientists contribute to and leverage Big Data to advance health behavior theory in the context of cancer risk reduction and improved disease outcomes?”
Robust data sets and accompanying models of dynamical systems present opportunities to substantively test, refine, and improve health behavior theories. The goal of the Big D.A.T.A. initiative is to stimulate new directions in theory development, testing, and integration with the use of Big Data, dynamic systems modeling, and novel measurement advances.
September 2013 Big D.A.T.A. Workshop Executive Summary (PDF)
September 2013 Big D.A.T.A. Workshop Executive Presentations
- Health Behavior Theory: Opportunities and Challenges
Alex Rothman, University of Minnesota (PDF)
Jasmin Tiro, University of Texas Southwestern (PDF)
Bob Evans, Google
- Systems Modeling: Opportunities and Challenges
Ross Hammond, The Brookings Institution
Daniel Rivera, Arizona State University (PDF)
Stephen Intille, Northeastern University
- Applying Dynamic Systems Modeling to Time-Intensive Data
Daniel Rivera, Arizona State University, and Genevieve Dunton, University of Southern California (PDF)
- Social Network Data Analyses: Opportunities and Challenges
Nosh Contractor, Northwestern University
Nathan Cobb, MeYouHealth (PDF)
Holly Jimison, Northeastern University
- Big Data Mash-ups & Statistical Modeling: Opportunities and Challenges
Patrick Curran, University of North Carolina at Chapel Hill (PDF)
Donna Coffman, Pennsylvania State University (PDF)
Eric Hekler, Arizona State University
- Dynamic Interventions: Opportunities and Challenges
Linda Collins, Penn State University (PDF)
Bonnie Spring, Northwestern University (PDF)
Genevieve Dunton, University of Southern California
- Federal Panel Discussants
Wendy Nilsen, National Institutes of Health
Misha Pavel, National Science Foundation
Lynda Hardy, National Institute of Nursing Research & BD2
Damon Davis, U.S. Dept. of Health and Human Services
- Synopsis of “International Workshop on New Computationally-Enabled Theoretical Models to Support Health Behavior Change and Maintenance”
Donna Spruijt-Metz, University of Southern California