Observational Data Collection

Enhancing Observational Data Collection to Inform Precision Cancer Research and Care

The Division of Cancer Control and Population Sciences (DCCPS) has lead responsibility at the National Cancer Institute (NCI) for supporting research in surveillance, epidemiology, health care delivery, behavioral science, and cancer survivorship. The division also plays a central role within the federal government as a source of expertise and evidence on issues such as the quality of cancer care, the economic burden of cancer, geographic information systems, statistical methods, communication science, tobacco control, and the translation of research into practice. To support that mission, the division gathers population-based observational data through initiatives, funding supplements, and requests for applications (RFAs), and makes those data available to the cancer research community.

In our new report, we describe ways in which we are working to expand and enrich these data resources. Below, we also provide brief summaries of, and links to, examples of the observational data resources available through DCCPS and our partners.

Population-based Observational Data Available through DCCPS and Partners

The Surveillance, Epidemiology, and End Results (SEER) Program provides cancer statistics in an effort to reduce the burden of cancer among the US population. NCI has funded the SEER Program since 1973 to support research on the diagnosis, treatment, and outcomes of cancer.

The SEER-Medicare database results from the linkage of two large population-based data sources: SEER cancer registry data and the Medicare enrollment and claims files for Medicare beneficiaries.


The SEER-Medicare Health Outcomes Survey (SEER-MHOS) linked database is designed to improve understanding of the health-related quality of life of cancer patients and survivors enrolled in Medicare Advantage health plans. The database contains clinical, quality-of-life, socioeconomic, demographic, and other information. SEER-MHOS is sponsored by NCI and the Centers for Medicare & Medicaid Services (CMS).


The SEER-CAHPS data set is a resource for quality-of-care research based on a linkage of SEER cancer registry data with CMS’ Medicare Consumer Assessment of Healthcare Providers and Systems (CAHPS®) patient surveys. These data provide a rich opportunity for analyses of Medicare beneficiaries' experiences with their care at various stages of the cancer care continuum.


NCI’s SEER Program is linking registry data with the central repositories of large pharmacy chains to capture oral cancer therapies. These linkages will expand the clinically relevant data available to cancer researchers by giving them a more complete picture of patient treatment and outcomes. These data are not yet available to the research community, but a process to make them accessible in the near future is in development.

Through collaborative partnerships, SEER is linking to data from genetic and genomic testing laboratories to better characterize each cancer. Linkages of this kind, which use real-world data, are increasingly important for understanding how well treatments work in patients outside the clinical trial setting, who tend to come from more diverse backgrounds than those enrolled in trials.

The Health Information National Trends Survey (HINTS) is a cross-sectional, nationally representative survey of American adults that uses self-administered mail questionnaires to monitor the health information environment and assess how people access and use health-related information. NCI and extramural communication researchers analyze HINTS data to gain insight into people's knowledge about cancer, the communication channels through which they obtain health information, and their cancer-related behaviors.


The National Health Interview Survey (NHIS) is an annual nationwide survey of approximately 35,000 households. It is conducted by the National Center for Health Statistics and administered by the US Census Bureau. A Cancer Control Supplement (CCS) has been fielded periodically with the NHIS since 1987, and since 2000 the CCS has been co-sponsored by NCI and CDC.


Patterns of Care data, collected under a congressional mandate to NCI, are used to evaluate the dissemination of state-of-the-art cancer therapies and diagnostics into community practice and to inform NCI about trends in cancer therapy and survival over time, as well as about health disparities. The project is coordinated jointly by DCCPS and NCI’s Division of Cancer Treatment and Diagnosis.

NCI conducts periodic assessments of physician practice with respect to both new and established cancer control technologies. These NCI-led surveys collect current, national data on primary care physicians' knowledge, attitudes, recommendations, and practices. The National Survey of Precision Medicine in Cancer Treatment, sponsored by NCI with support from the American Cancer Society and the National Human Genome Research Institute, is the first nationally representative survey of oncologists about the current practice of precision medicine in cancer treatment.

The Cancer Epidemiology Descriptive Cohort Database (CEDCD), maintained by the Epidemiology and Genomics Research Program (EGRP), contains descriptive information about cohort studies that follow groups of people over time for cancer incidence, mortality, and other health outcomes. The CEDCD is a searchable database that contains general study information (e.g., eligibility criteria and size), the types of data collected at baseline, cancer sites, the number of participants diagnosed with cancer, and biospecimen information. All data in the database are aggregated at the cohort level; it contains no individual-level data.


The database of Genotypes and Phenotypes (dbGaP), managed by NIH, is a repository for genetic and/or phenotypic data from research studies. Most datasets are managed under controlled access. Beyond the NIH-funded studies covered by NIH’s Genomic Data Sharing (GDS) policy, EGRP encourages epidemiologic researchers to make their data available through dbGaP or a comparable NIH-supported repository.