Cancer digital slide archive: An informatics resource to support integrated in silico analysis of TCGA pathology data

David A. Gutman*, Jake Cobb, Dhananjaya Somanna, Yuna Park, Fusheng Wang, Tahsin Kurc, Joel H. Saltz, Daniel J. Brat, Lee A D Cooper

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

102 Scopus citations


Background: The integration and visualization of multimodal datasets is a common challenge in biomedical informatics. Several recent studies of The Cancer Genome Atlas (TCGA) data have illustrated important relationships between morphology observed in whole-slide images, outcome, and genetic events. The pairing of genomics and rich clinical descriptions with whole-slide imaging provided by TCGA presents a unique opportunity to perform these correlative studies. However, better tools are needed to integrate the vast and disparate data types.

Objective: To build an integrated web-based platform supporting whole-slide pathology image visualization and data integration.

Materials and methods: All images and genomic data were directly obtained from the TCGA and National Cancer Institute (NCI) websites.

Results: The Cancer Digital Slide Archive (CDSA) produced is accessible to the public (http://cancer. and currently hosts more than 20 000 whole-slide images from 22 cancer types.

Discussion: The capabilities of CDSA are demonstrated using TCGA datasets to integrate pathology imaging with associated clinical, genomic, and MRI measurements in glioblastomas, and can be extended to other tumor types. CDSA also allows URL-based sharing of whole-slide images and has preliminary support for directly sharing regions of interest and other annotations. Images can also be selected on the basis of other metadata, such as mutational profile, patient age, and other relevant characteristics.

Conclusions: With the increasing availability of whole-slide scanners, analysis of digitized pathology images will become increasingly important in linking morphologic observations with genomic and clinical endpoints.

Original language: English (US)
Pages (from-to): 1091-1098
Number of pages: 8
Journal: Journal of the American Medical Informatics Association
Issue number: 6
State: Published - 2013

ASJC Scopus subject areas

  • Health Informatics

