Scientists decode entire human genome at last
Scientists say they have finally assembled the full genetic blueprint for human life, adding the missing pieces to a puzzle nearly completed two decades ago.
An international team described the first-ever sequencing of a complete human genome – the set of instructions to build and sustain a human being – in research published Thursday in the journal Science. The previous effort, celebrated across the world, was incomplete because DNA sequencing technologies of the day weren't able to read certain parts of it. Even after updates, it was missing about 8% of the genome.
"Some of the genes that make us uniquely human were actually in this 'dark matter of the genome' and they were totally missed," said Evan Eichler, a University of Washington researcher who participated in the current effort and the original Human Genome Project. "It took 20-plus years, but we finally got it done."
Many, including Eichler's own students, thought it had been finished already. "I was teaching them, and they said, 'Wait a minute. Isn't this like the sixth time you guys have declared victory?' I said, 'No, this time we really, really did it!'"
Scientists said this full picture of the genome will give humanity a greater understanding of our evolution and biology while also opening the door to medical discoveries in areas like aging, neurodegenerative conditions, cancer and heart disease.
"We're just broadening our opportunities to understand human disease," said Karen Miga, an author of one of the six studies published Thursday.
The research caps off decades of work. The first draft of the human genome was announced in a White House ceremony in 2000 by leaders of two competing entities: an international publicly funded project led by an agency of the U.S. National Institutes of Health and a private company, Maryland-based Celera Genomics.
The human genome is made up of about 3.1 billion DNA subunits, pairs of chemical bases known by the letters A, C, G and T. Genes are strings of these lettered pairs that contain instructions for making proteins, the building blocks of life. Humans have about 30,000 genes, organized in 23 groups called chromosomes that are found in the nucleus of every cell.
Before now, there were "large and persistent gaps that have been in our map, and these gaps fall in pretty important regions," Miga said.
Miga, a genomics researcher at the University of California-Santa Cruz, worked with Adam Phillippy of the National Human Genome Research Institute to organize the team of scientists to start from scratch with a new genome with the aim of sequencing all of it, including previously missing pieces. The group, named after the sections at the very ends of chromosomes, called telomeres, is known as the Telomere-to-Telomere, or T2T, consortium.
Their work adds new genetic information to the human genome, corrects previous errors and reveals long stretches of DNA known to play important roles in both evolution and disease. A version of the research was published last year before being reviewed by scientific peers.
"This is a major improvement, I would say, of the Human Genome Project," doubling its impact, said geneticist Ting Wang of the Washington University School of Medicine in St. Louis, who was not involved in the research.
Eichler said some scientists used to think unknown areas contained "junk." Not him. "Some of us always believed there was gold in those hills," he said. Eichler is paid by the Howard Hughes Medical Institute, which also supports The Associated Press's health and science department.
Turns out that gold includes many important genes, he said, such as ones integral to making a person's brain bigger than a chimp's, with more neurons and connections.
To find such genes, scientists needed new ways to read life's cryptic genetic language.
Reading genes requires cutting the strands of DNA into pieces hundreds to thousands of letters long. Sequencing machines read the letters in each piece and scientists try to put the pieces in the right order. That's especially tough in areas where letters repeat.
Scientists said some areas were illegible before improvements in gene sequencing machines that now allow them to, for example, accurately read a million letters of DNA at a time. That allows scientists to see genes with repeated areas as longer strings instead of snippets that they had to later piece together.
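A toy illustration, not the consortium's actual method, shows why longer reads matter. The Python sketch below invents two tiny "genomes" that differ only in how many times a three-letter unit repeats. The sequences, the read lengths and the helper name all_reads are made up for the example, and the reads are idealized: error-free and taken from every position. Reads shorter than the repeat produce exactly the same set of pieces for both genomes, so the pieces alone cannot say which genome they came from. Reads long enough to span the repeat and its flanking letters can.

```python
def all_reads(genome: str, read_length: int) -> set[str]:
    """Return every distinct error-free read of the given length.

    Idealized assumption: one read starts at each position of the genome.
    """
    last_start = len(genome) - read_length
    return {genome[i:i + read_length] for i in range(last_start + 1)}


# Hypothetical flanking sequences and repeat unit, chosen only for illustration.
LEFT_FLANK = "ACGTTGCATG"
RIGHT_FLANK = "TTGACCGTAA"
REPEAT_UNIT = "CAG"

genome_a = LEFT_FLANK + REPEAT_UNIT * 5 + RIGHT_FLANK  # 5 copies of the repeat
genome_b = LEFT_FLANK + REPEAT_UNIT * 8 + RIGHT_FLANK  # 8 copies of the repeat

# Short reads (6 letters) never span the whole repeat, so both genomes
# yield the same set of pieces.
short_reads_match = all_reads(genome_a, 6) == all_reads(genome_b, 6)

# Long reads (30 letters) can cover the entire 15-letter repeat in genome_a
# plus letters on both sides, so the two sets of pieces differ.
long_reads_match = all_reads(genome_a, 30) == all_reads(genome_b, 30)

print(short_reads_match)  # True: short reads cannot tell the genomes apart
print(long_reads_match)   # False: long reads reveal the difference
```

Real assemblers also weigh how often each piece appears and must cope with reading errors, so this captures only the core ambiguity that repeats create.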
Researchers also had to overcome another challenge: Most cells contain genomes from both mother and father, confusing attempts to assemble the pieces correctly. T2T researchers got around this by using a cell line from one "complete hydatidiform mole," an abnormal fertilized egg containing no fetal tissue that has two copies of the father's DNA and none of the mother's.
The next step? Mapping more genomes, including ones that carry collections of genes from both parents. This effort did not map the Y chromosome, the one chromosome found only in males, because the mole contained only an X.
Wang said he's working with the T2T group on the Human Pangenome Reference Consortium, which is trying to generate "reference," or template, genomes for 350 people representing the breadth of human diversity.
"Now we've gotten one genome right and we have to do many, many more," Eichler said. "This is the beginning of something really fantastic for the field of human genetics."