Enabling Large-Scale Data Collaborations: NCI Cohort Consortium
DCCPS also drives research across the cancer control spectrum by assembling cancer epidemiology cohorts into the NCI Cohort Consortium. This extramural-intramural partnership facilitates large-scale collaborations that pool the data and biospecimens needed to conduct a wide range of prospective cancer studies.
Through its collaborative network of investigators, the consortium provides a coordinated, interdisciplinary approach to tackling important scientific questions with economies of scale (Swerdlow et al.). To date, NCI Cohort Consortium members have launched more than 50 scientific initiatives, and Cohort Consortium projects have produced more than 260 publications. At the consortium’s inception, foundational work led to the development of risk assessment algorithms and polygenic risk scores, which hold promise for precision medicine and prevention (Hunter et al.). The consortium is most advantageous for prospective studies of rarer cancer sites, where collaboration among multiple cohorts provides the large sample size that such studies need. Examples include studies of the association between vitamin D and rarer cancers such as ovarian and pancreatic cancer (Helzlsouer et al.), the Liver Cancer Pooling Project (Petrick et al.), and a new project focused on appendiceal cancer.
In addition to coordinating the NCI Cohort Consortium’s activities, the division’s Epidemiology and Genomics Research Program (EGRP) supports additional cohort-related resources for researchers, such as the Cancer Epidemiology Descriptive Cohort Database (CEDCD). The CEDCD is a searchable database containing general study information about the cohorts that are active in the consortium (e.g., eligibility criteria and enrollment), types of data and biospecimens collected, number of participants diagnosed with cancer, and key contacts for each cohort. The goal of the CEDCD is to facilitate collaboration and highlight the opportunities for research within existing cohort studies.
The Cohort Metadata Repository (CMR) is a tool developed by EGRP that documents data harmonization across cohorts. Variables from each cohort can be searched and compared to determine whether harmonization is possible. Once variables have been harmonized, the harmonized variables and the specifications used to create them are also documented in the CMR. The CMR contains only metadata (variable names, formats, codes, and descriptions) and no individual-level data.
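The search-and-compare step described above can be illustrated with a minimal sketch. It is not the CMR's actual schema, interface, or harmonization rule: the `VariableMeta` record, the `search` and `can_harmonize` functions, the cohort names, and the variables are all hypothetical, chosen only to show how metadata alone (names, formats, codes, descriptions) might be enough to screen candidate variables without touching individual-level data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMeta:
    """Hypothetical metadata record; holds no individual-level data."""
    cohort: str
    name: str
    fmt: str            # assumed values: "categorical" or "numeric"
    codes: tuple        # allowed codes for categorical variables, else empty
    description: str


def search(catalog, keyword):
    """Return variables whose description contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [v for v in catalog if kw in v.description.lower()]


def can_harmonize(a, b):
    """Toy screening rule (an assumption, not the CMR's rule):
    formats must match, and categorical variables must share at least one code."""
    if a.fmt != b.fmt:
        return False
    if a.fmt == "categorical":
        return bool(set(a.codes) & set(b.codes))
    return True


# Invented example catalog for three hypothetical cohorts.
catalog = [
    VariableMeta("cohort_a", "SMOKE_STAT", "categorical",
                 ("never", "former", "current"), "Smoking status at baseline"),
    VariableMeta("cohort_b", "smk", "categorical",
                 ("never", "ever"), "Smoking status (ever/never)"),
    VariableMeta("cohort_c", "cigs_per_day", "numeric",
                 (), "Cigarettes smoked per day"),
]

hits = search(catalog, "smoking status")
pair_ok = can_harmonize(catalog[0], catalog[1])
mixed_ok = can_harmonize(catalog[0], catalog[2])
```

In this sketch, a real harmonization decision would still require expert review of the variable definitions; the screening rule only narrows the list of candidates.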
The collaborative research activities facilitated by the NCI Cohort Consortium, and resources such as the CEDCD and CMR, advance the goal of DCCPS and NCI to improve the health of the public through cancer control.
Footnotes
Swerdlow AJ, et al. The National Cancer Institute Cohort Consortium: an international pooling collaboration of 58 cohorts from 20 countries. Cancer Epidemiol Biomarkers Prev. 2018 Nov;27(11):1307–1319.
Hunter DJ, et al. A candidate gene approach to searching for low-penetrance breast and prostate cancer genes. Nat Rev Cancer. 2005 Dec;5(12):977–985.
Helzlsouer KJ, et al. Overview of the Cohort Consortium Vitamin D Pooling Project of Rarer Cancers. Am J Epidemiol. 2010 Jul 1;172(1):4–9.
Petrick JL, et al. Tobacco, alcohol use and risk of hepatocellular carcinoma and intrahepatic cholangiocarcinoma: the Liver Cancer Pooling Project. Br J Cancer. 2018 Apr;118(7):1005–1012.