Sun. Sep 25th, 2022
The first print of the human genome presented as a series of books on display in the 'Medicine Now' room of the Wellcome Collection, London.

The sequence of the human genome, first published in 2001, is missing some important information. Its latest version, called GRCh38, contains a monstrous 3.1 gigabases of sequence, but that’s still not enough. A letter published in Nature Genetics this week reports that the reference genome lacks a whopping 10 percent of the genetic information found in the genomes of hundreds of people of African ancestry, information that is also found in other human populations.

Get the reference

The ‘human genome’ is in fact composed of the genomes of only a handful of people, with the majority of GRCh38 coming from just one person. It’s not a snapshot of what’s in human DNA, but rather a sort of template and roadmap, giving an idea of what’s inside and allowing for comparisons between individuals and the “reference genome.”

Researchers know this is a limitation and have been steadily adding to the reference genome, making it better able to represent the vast array of variation present in modern humans. But because the resource is still so limited, the authors of this week’s letter write, so is its usefulness: “In recent years, a growing number of researchers have emphasized the importance of capturing and displaying sequence data from different populations.”

The current situation, they write, makes it difficult to analyze people whose ancestry is very different from that of the people behind the reference genome. While some methods allow researchers to look at limited amounts of genetic diversity beyond the reference, a more comprehensive solution that is gaining popularity is building population-specific references, a project already underway for certain groups, including Chinese and Ashkenazi populations.

The genome of all humans

There is no “pan-genome,” no “set of sequences covering all of the DNA in [a] population,” write lead author Rachel Sherman and her colleagues. It’s been done for bacteria, but not for humans. So they set out to create an African pan-genome, using DNA from 910 people of African descent. The group includes people from the Caribbean and the US, who retain some of Africa’s genetic diversity even though they have their own distinct genetic histories.

They compared the DNA of these hundreds of people with the reference genome, looking for long sections that didn’t match. The basic unit of DNA is the base pair, one of the rungs on the twisted ladder that makes up the double helix. Sherman and her colleagues looked for sequences over 1,000 base pairs in length that didn’t match the reference and found many: nearly 300 million base pairs in total, which is about 10 percent of the size of the entire reference genome.
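To make the arithmetic behind that figure concrete, here is a minimal Python sketch. It is not the authors' pipeline: the function names and the toy list of sequence lengths are invented for illustration, and only the 1,000-base-pair cutoff, the roughly 300 million base pair total, and the 3.1 gigabase reference size come from the article.

```python
# Illustrative sketch only: the toy lengths and helper names are hypothetical,
# not the study's data or method.

MIN_LENGTH_BP = 1_000               # length cutoff mentioned in the article
REFERENCE_SIZE_BP = 3_100_000_000   # GRCh38, about 3.1 gigabases


def novel_sequence_total(unmatched_lengths, min_length=MIN_LENGTH_BP):
    """Sum the lengths of unmatched sequences longer than min_length."""
    return sum(length for length in unmatched_lengths if length > min_length)


def fraction_of_reference(novel_bp, reference_bp=REFERENCE_SIZE_BP):
    """Express an amount of sequence as a fraction of the reference's size."""
    return novel_bp / reference_bp


# Hypothetical lengths (in base pairs) of sequences that failed to match.
toy_lengths = [250, 1_000, 1_500, 12_000, 800, 3_200]
toy_total = novel_sequence_total(toy_lengths)  # keeps 1,500 + 12,000 + 3,200

# The article's headline figure: nearly 300 million base pairs.
reported_fraction = fraction_of_reference(300_000_000)

print(f"toy total: {toy_total} bp")
print(f"reported share of reference: {reported_fraction:.1%}")
```

With the article's numbers, 300 million divided by 3.1 billion comes to a little under 10 percent, which is where the headline figure comes from.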

That’s not to say this information is unique to people of African descent: about 40 percent of this sequence matched Korean or Chinese genomes. This suggests that it is important genetic material present in a large number of people, but still not captured by a reference genome built from only a small number of people. There’s a lot going on in humans that isn’t reflected by the reference human genome.

Medical consequences and warnings

Any research effort that relies on the reference genome to study human variation will miss this massive amount of data, and relying on it is what “almost all studies are doing right now,” Sherman and colleagues write. “A single reference genome is not sufficient for population-based studies of human genetics,” they add, suggesting that one way forward is to create reference genomes for different human groups. Over time, this will lead to a pan-genome that will capture “all the DNA present in humans.”

This has important implications for medicine: “If you’re a scientist looking for genome variations associated with a condition that’s more common in a particular population, you might want to compare the genomes to a reference genome that’s more representative of that population,” says Rachel Sherman.

But having this information for people of African descent doesn’t yet give a scientist researching a particular condition much to work with. The study did not examine what the DNA missing from the reference genome actually does, so it cannot say whether that DNA plays a role in health problems or any other variation.

While population-specific genomes can be a useful way to study human variation, they can run into other problems when they leave the lab and enter the real world. As evidenced by the fact that much of this DNA is also found in Koreans, the lines between populations are not rigid, especially at the DNA level. Populations have blurred boundaries; individuals may have genetic traits from multiple populations; and what a person looks like is not a reliable guide to their DNA.

Complicating matters even more, there are multiple populations in Africa that may have different genetic histories, which we are only now uncovering. Having a pan-African genome doesn’t necessarily say much about what characterizes an individual African.

While genetics researchers understand all of this, any use of population-specific reference genomes in fields like medicine can present a new set of problems if this messiness is not properly communicated or understood.

Nature Genetics, 2018. DOI: 10.1038/s41588-018-0273-y.

By akfire1
