Future Direction: Data Strategies
Selected Examples of Progress
SEER Linkages with External Data Sources
Linking data from the central cancer registries in the SEER Program with external data sources can help fill gaps in understanding treatment patterns and other factors that may affect cancer outcomes, and can better characterize patient trajectories and real-world patterns of care and outcomes. Through these linkages, information on genomic testing, treatment information from pharmacy and insurance claims, Medicare encounter data, housing data from the US Department of Housing and Urban Development, and area-level social determinants of health (SDOH) measures will be available within SEER or as specialized data sets.
Recently, the Department of Veterans Affairs (VA) and NCI signed a landmark memorandum of understanding that will operationalize the bidirectional exchange of cancer registry information between the VA and SEER, resulting in the largest-ever collection of veteran cancer data. This strategic exchange and curation of our nation’s cancer registry data will leverage the strengths of both agencies and create new opportunities for the research community to support all patients facing a cancer diagnosis.
Enhancing Population Science Data Infrastructure
Hand in hand with enhancing the collection of data to support cancer control and population sciences research is making sure that data are FAIR (findable, accessible, interoperable, and reusable). A key component in promoting FAIR data is ensuring that those data are accessible for broader research use through improved infrastructure, particularly infrastructure that is interoperable with other NCI data resources, such as the Cancer Research Data Commons (CRDC). DCCPS has been spearheading a new population science data commons as part of the CRDC, which will initially support sharing of data from our cohort studies. Working with the existing CRDC framework not only reduces the development time for a population science data commons but also increases the ability to connect with other NCI data available through the CRDC, enriching the potential research questions that could be addressed. Similarly, the National Childhood Cancer Registry (NCCR) data platform, a cloud-based discovery, data access, and analytic platform, is being designed to be interoperable with the other Childhood Cancer Data Initiative resources.
Encouraging Reuse of DCCPS-Supported Data
DCCPS’s goals in enhancing our approach to data strategy include both supporting the data and infrastructure needs of the research community and optimizing secondary use of DCCPS data and resources. That optimization requires not only the infrastructure pieces outlined above, which improve the findability and accessibility of data, but also encouragement to reuse the data. To do this, DCCPS has developed several initiatives, including PAR-23-254: Secondary Analysis and Integration of Existing Data to Elucidate Cancer Risk and Related Outcomes. This PAR signals to the research community the value that DCCPS places on leveraging existing data for innovative data modeling and analysis to answer new research questions and advance cancer control research. The division has also led NCI participation in several additional trans-NIH funding opportunity announcements aimed at enhancing the secondary use of data, for example, RFA-PM-23-001, Enhancing the Use of the All of Us Research Program’s Data (R21), and RFA-PM-23-002, Small Grants to Enhance the Use of the All of Us Research Program’s Data (R03).
Data Help to Drive DCCPS Scientific Priorities
Data Strategies is a cross-cutting area, fundamental to all of DCCPS’s scientific priority areas. These scientific priority areas will help to define and prioritize the opportunity areas to pursue in filling data and infrastructure gaps. For example, the Climate Change scientific priority area highlights progress in exploring external environmental data sources to link with SEER and the need to better understand what data sources exist for self-reported experiences with climate and the environment.
The Modifiable Risk Factors scientific priority area highlights the need to collect and integrate data across behaviors and timespans to better understand their impact on health outcomes, illustrating a gap in data and data science methods. The progress on the cancer epidemiology cohorts, the Persistent Poverty initiative, and the Sexual and Gender Minority Population initiatives, described in the Health Equity scientific priority area, helps to fill critical data gaps to ensure all populations are served by DCCPS research. The Evidence-based Cancer Control Policy Research scientific priority area is also data-driven and will identify the data needed to enable evidence-based policymaking and to evaluate the impact of various policies (such as tobacco control strategies) and their implementation on health-related outcomes. Finally, the Digital Health scientific priority area highlights additional data strategy needs, for example, those related to the use of telehealth approaches for cancer care or to understanding the effects of digital health tools and interventions on patient-provider communication.
Planning for the Future
Leveraging partnerships will be critical for moving DCCPS’s data strategies forward, particularly for addressing gaps in infrastructure. For example, nascent collaborations with the National Center for Advancing Translational Sciences (NCATS) to adapt the resources it has already developed for the National COVID Cohort Collaborative (N3C) are an important opportunity not only to address data gaps with additional information from electronic medical records but also to leverage existing tools for data harmonization and secure research access. Finally, a complement to encouraging research that utilizes existing data is the establishment of an annual award to honor investigators who demonstrate exemplary practices in data sharing.