Pan-Cancer Project

Date :

What were the aims of the Pan-Cancer project?

The aim of the Pan-Cancer project was to understand the genomic changes in many forms of cancer worldwide, with a view to enabling further research into the causes, prevention, diagnosis, and treatment of cancers.

DNA changes can be inherited (known as germline variations) or can appear during a person’s life (somatic variations). The Pan-Cancer project investigated both types of variation in the DNA of cancer cells. Scientists studied regulatory sites (regions of the genome that affect the activity of other genes), non-coding RNA (molecules that can have functions such as regulating gene expression), and large-scale structural alterations in the genome.

Why was the Pan-Cancer project needed?

The Pan-Cancer project is the largest, most comprehensive analysis of cancer genomes to date. Understanding the complex genomic changes that can lead to cancer required a huge amount of data, which could only be gathered by working collaboratively and sharing data. The project analysed almost every cancer genome in the world that was publicly available when the project began.

Which were the leading institutions?

The Pan-Cancer project was a collaboration involving more than 1300 scientists and clinicians from 37 countries and more than 70 institutions. The scientific steering committee included representatives from the five leading institutions: the European Molecular Biology Laboratory, the Ontario Institute for Cancer Research, the Broad Institute of MIT and Harvard, the Wellcome Sanger Institute, and the University of California, Santa Cruz.

Which cancers were studied?

The Pan-Cancer project studied 38 distinct tumour types from 2658 donors. The study included:

  • Central nervous system (CNS) cancer (glioblastoma, medulloblastoma, oligodendroglioma, pilocytic astrocytoma)

  • Skin cancer (malignant melanoma)

  • Biliary cancer

  • Bladder cancer

  • Colorectal cancer

  • Oesophageal cancer

  • Liver cancer (hepatocellular carcinoma, combined hepatocellular carcinoma/cholangiocarcinoma, fibrolamellar hepatocellular carcinoma)

  • Lung cancer (adenocarcinoma, adenocarcinoma in situ, mucinous adenocarcinoma, squamous cell carcinoma, basaloid squamous cell carcinoma)

  • Pancreatic cancer (adenocarcinoma, acinar cell cancer, mucinous adenocarcinoma, adenosquamous cancer, neuroendocrine carcinoma)

  • Prostate cancer

  • Stomach cancer

  • Thyroid cancer

  • Bone cancer (osteoblastoma, osteofibrous dysplasia, chondroblastoma, chondromyxoid fibroma, adamantinoma, chordoma, osteosarcoma, leiomyosarcoma, liposarcoma)

  • Cervical cancer (adenocarcinoma, squamous cell carcinoma)

  • Head/neck cancer

  • Kidney cancer (adenocarcinoma, chromophobe type; adenocarcinoma, clear cell type; adenocarcinoma, papillary type)

  • Lymphoid cancer (Burkitt, diffuse large B-cell, follicular, marginal zone, post-transplant, chronic lymphocytic leukaemia)

  • Myeloid cancer (acute myeloid leukaemia, chronic myelomonocytic leukaemia, myelodysplastic syndrome with ring sideroblasts, essential thrombocythaemia, polycythaemia vera, myelofibrosis)

  • Ovarian cancer

  • Uterine cancer

  • Breast cancer (infiltrating duct carcinoma, medullary carcinoma, mucinous adenocarcinoma, duct micropapillary carcinoma, lobular carcinoma)

What were the technical difficulties in analysing the data?

The main challenge was scale. The full dataset of more than 5000 genomes from 2658 donors (two samples per donor: one from the tumour and one from healthy tissue) amounted to 800 terabytes of data. Computations were performed in the cloud or on high-performance computing clusters provided by various institutions, and the results were then combined so that they could be used in specific research studies.

Where did the data come from?

All genomes used in the Pan-Cancer project had previously been collected for other projects. Researchers processed the whole genome data of 2658 donors from 48 cancer projects around the world. 

What is the main finding from the Pan-Cancer project?

The Pan-Cancer project explored the nature and consequences of DNA variations in cancer across the entire genome, covering both protein-coding genes and regions of DNA that do not code for proteins. This makes it the most comprehensive analysis of the non-coding regions of cancer genomes performed to date.

The main finding is that the cancer genome is finite and knowable, but enormously complicated. By combining sequencing of the whole cancer genome with a suite of analysis tools, it’s possible to characterise every genetic change found in a cancer. This includes all the processes that have generated those changes, all the biological pathways impacted by the changes, the kinds of cells that were originally transformed, and even the order of key events during a cancer’s life history.

What else has the Pan-Cancer project revealed?

The first wave of results has been published in more than 20 scientific publications in Nature and its affiliated journals. Scientific highlights include:

  • Scientists from EMBL present a tool for large-scale analysis of genomic data with cloud computing.

  • EMBL group leader Jan Korbel addresses the challenges of working with datasets across national boundaries.

  • Researchers, including scientists from EMBL-EBI, have created the largest and most comprehensive catalogue of cancer-specific RNA alterations, which reveals new insights into the cancer genome.

  • The analysis of whole cancer genomes provided key insights into the genetic drivers of cancer.

  • Scientists have discovered that chromothripsis is far more common across cancers than previously thought. Chromothripsis (or “chromosome shattering”) is a mutational process in which large stretches of a chromosome undergo massive rearrangements in a single catastrophic event. Understanding which molecular mechanisms generate these alterations, and how they drive the evolution of the cancer genome, is an important step towards understanding how cancers develop.

  • By analysing tumour progression, scientists found that the early mutations in many cancers follow typical, predictable patterns. They also found that mutations that drive cancer progression can occur years, or even decades, before diagnosis.

How will the results help cancer research?

The Pan-Cancer project has established an enormous resource for the scientific community, one that will underpin the ongoing development of analysis methods, provide a testing ground for new ideas about cancer development, and serve as a benchmark for future sequencing studies.

The Pan-Cancer data is available to the wider research community and will help accelerate further discoveries. Over time, these discoveries are expected to lead to improved diagnosis, management, and treatment of cancer.

The suite of analysis tools generated by the project has been released to the scientific and clinical communities and is free to be used and further developed. This is important because data analysis has been a major barrier to improving access to cancer genome sequencing. The raw sequencing data and downstream analyses are also open to the community, under appropriate controls to safeguard participants’ privacy.

How will the Pan-Cancer project help cancer patients?

The results from the Pan-Cancer project will enable more personalised medicine in the future, once clinical whole-genome sequencing of patients’ cancers becomes more widely adopted. This could include more accurate diagnosis of tumour type, better prediction of clinical outcome, and the ability to choose the optimal treatment for each patient.

The Pan-Cancer researchers have developed a method to find out where cancers come from (the ‘cell of origin’) in patients in whom this couldn’t be identified using standard diagnostic techniques. This could impact diagnosis and treatment of cancer in the future.

Thanks to the study, researchers can now identify mutations in the genome that occurred years, or sometimes even decades, before a tumour appears, allowing scientists to work out the age of tumours and the key genomic stages they pass through. This makes it possible to determine the earliest changes in the evolution of many cancer types, with the potential to develop new strategies for diagnosing or intervening in tumours at earlier stages. We are not there yet, but this is the goal.

Until now, scientists had mainly looked at the part of the cancer genome that codes for proteins, leaving roughly 99% of the genome unstudied. The Pan-Cancer project has helped fill the gaps in our knowledge of what drives cancer. At least one causative genetic change was found in more than 95% of the cancers in the study, and many individual tumours had 5–10 or more causative mutations identified. This information will help us find better methods for diagnosis, because the causative mutations are closely linked to the type of tumour that develops. They may also point to useful drug targets for future therapies. A major goal for researchers is to identify, for any given patient, all of the specific mutations that drive his or her cancer.

As part of the project, researchers have described many new processes that generate mutations in cancer genomes. These processes leave distinctive ‘mutational signatures’ in the genome, and these signatures can give clues about what may have caused the cancer. For example, lifestyle exposures such as cigarette smoking or sunbathing can cause patterns of mutation that are highly distinctive; likewise, inherited cancer disorders can lead to distinctive signatures. These signatures can be read from a patient’s cancer genome and compared against the catalogue of signatures identified in this project.
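The idea of reading a signature from a tumour and comparing it with a catalogue can be illustrated with a small sketch. It is not the Pan-Cancer project's method: it assumes a simplified six-class mutation spectrum (published catalogues typically use 96 classes, each substitution in its trinucleotide context), a hypothetical toy catalogue whose proportions are invented for illustration, and cosine similarity as a simple stand-in for the more sophisticated fitting used in practice. All function and signature names are hypothetical.

```python
import math

# The six single-base substitution classes, written on the pyrimidine strand.
# Published catalogues typically use 96 classes (each substitution in its
# trinucleotide context); six classes keep this sketch short.
CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# HYPOTHETICAL toy catalogue: the proportions below are invented for
# illustration and are NOT taken from any published signature catalogue.
TOY_CATALOGUE = {
    "toy_smoking_like": [0.60, 0.10, 0.15, 0.05, 0.05, 0.05],
    "toy_uv_like": [0.05, 0.02, 0.85, 0.02, 0.04, 0.02],
    "toy_flat": [1 / 6] * 6,
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def count_substitutions(mutations):
    """Tally (ref, alt) base pairs into the six classes.

    Substitutions with a purine reference (A or G) are folded onto the
    complementary pyrimidine strand, so G>A is counted as C>T.
    Assumes every pair is a valid single-base substitution (ref != alt).
    """
    counts = dict.fromkeys(CLASSES, 0)
    for ref, alt in mutations:
        if ref in ("A", "G"):
            ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        counts[f"{ref}>{alt}"] += 1
    return [counts[c] for c in CLASSES]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means identical shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_signatures(spectrum, catalogue=TOY_CATALOGUE):
    """Return (signature name, similarity) pairs, most similar first."""
    scores = [(name, cosine_similarity(spectrum, profile))
              for name, profile in catalogue.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A made-up tumour dominated by C>T changes (G>A folds onto C>T).
    tumour = [("C", "T")] * 8 + [("G", "A")] * 6 + [("C", "A"), ("T", "C")]
    spectrum = count_substitutions(tumour)
    print(spectrum)  # [1, 0, 14, 0, 1, 0]
    for name, score in rank_signatures(spectrum):
        print(f"{name}: {score:.3f}")
```

Cosine similarity only measures how alike the shapes of two spectra are. Because a single tumour often carries several signatures at once, real analyses usually estimate how much each catalogue signature contributes to the tumour (for example, by non-negative least-squares fitting) rather than picking a single best match.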

What are the next steps?

Further insights into cancer biology are expected to come from the Pan-Cancer data and related software tools, which have been made available to the global cancer research community.