In many cases, viruses manage to spread so easily because they are so compact, allowing hundreds of thousands of viral particles to be expelled in a single sneeze. That compact size stems in part from their limited needs. Because viruses use parts of their host cells for much of what they need to do, even the more complicated viruses need only a few dozen specialized genes to do things like evade the immune system or stay dormant in cells. In fact, complexity seems to conflict with one of the evolutionary advantages of viruses: the ability to make many copies of themselves very quickly.
So it was a bit of a surprise to discover that there are giant viruses that contain much more genetic material than they seem to need. All cells carry the machinery needed to make proteins, so viruses typically carry at most a few genes that redirect that machinery toward the virus’ needs. But the giant viruses seemed to contain replacements for many parts of the basic machinery itself. Those viruses attacked complex cells, with many internal structures and many complex biological processes taking place in different locations. Perhaps in that context it was advantageous to bring all those seemingly redundant parts along.
Or maybe not. In a study released today, researchers describe a large collection of giant viruses that target bacteria. Although they are smaller than some of the largest eukaryotic viruses, they are not much smaller. And since they infect bacteria, the genomes of the newly described viruses may be a significant fraction of the size of their host’s genome.
In the mix
The work is based on what has come to be called metagenomics, which essentially involves breaking open all the cells in an environmental sample and sequencing the DNA that comes out. This provides DNA sequence data on all the different microbes that live in the sample, as well as the viruses that infect them. Software can sift through that data and find pieces that overlap, stitching together larger parts of a genome from the smaller fragments of sequence. But it’s difficult to put together a complete genome in this way, because repeated sequences or segments that are difficult to sequence confuse the software. So even if there are giant viruses in these samples, a metagenomic analysis would typically identify smaller fragments of them and not link them together to reveal their full size.
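The overlap-and-stitch idea can be sketched in a few lines of Python. This is a toy illustration only, not the software used in the study: the function names, the greedy merge strategy, and the three-base minimum overlap are all assumptions made for the example, and real assemblers handle sequencing errors, reverse strands, and repeats far more carefully.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the longest overlap.

    Hypothetical helper for illustration; stops when no pair overlaps
    by at least `min_overlap` bases.
    """
    fragments = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_pair = 0, None
        for i, left in enumerate(fragments):
            for j, right in enumerate(fragments):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_pair = size, (i, j)
        if best_pair is None:
            return fragments  # nothing left to join
        i, j = best_pair
        merged = fragments[i] + fragments[j][best_size:]
        fragments = [f for k, f in enumerate(fragments) if k not in (i, j)]
        fragments.append(merged)


# Three short made-up reads that tile one longer sequence.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

When a stretch of sequence is repeated, several fragments overlap equally well and a simple approach like this one can join the wrong pair, which is one reason complete genomes are hard to recover from metagenomic data.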
Inspired by some previous evidence that bacteria-attacking viruses (technically called “phages”) can grow very large, a similarly large research team obtained many environmental samples and set out to find giant viruses. Sources included “human fecal and oral samples, fecal samples from other animals, freshwater lakes and rivers, marine ecosystems, sediments, hot springs, soils, deep subterranean habitats, and the built environment.”
After software assembled the short sequences from the original studies into longer fragments, the researchers checked for gene matches to identify whether each fragment came from bacteria, complex cells, archaea or viruses. All sequences 200,000 bases or more in length were tested to see if they were actually circular (a common feature of large viral genomes in bacteria), and a handful of the largest were selected for detailed manual examination. “Manual” here means that graduate students had to confirm the sequence and find ways to deal with repeated DNA or difficult sequences.
In all, the researchers put together 350 viral sequences, identified as such because they carry genes involved in building the viruses’ coats or bursting their host cells to spread further. Four other long sequences were difficult to assign to a category.
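One common heuristic for spotting a circular genome in assembled data is to check whether the end of a fragment repeats its own beginning, since an assembler that walks around a circle tends to run past its starting point. The sketch below illustrates that idea; it is an assumption about how such a test can work, not a description of the researchers' actual pipeline, and the function names and five-base minimum are hypothetical.

```python
def terminal_overlap(contig: str, min_overlap: int = 5) -> int:
    """Length of the longest stretch where the contig's end repeats its start."""
    # Only consider overlaps of up to half the contig's length.
    for size in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[:size] == contig[-size:]:
            return size
    return 0


def trim_if_circular(contig: str, min_overlap: int = 5) -> tuple[str, bool]:
    """Return the contig with its duplicated end removed, plus a circular flag."""
    size = terminal_overlap(contig, min_overlap)
    if size == 0:
        return contig, False
    return contig[:-size], True


# Made-up example: the last five bases repeat the first five.
print(trim_if_circular("ACGTAGGCTTCAACGTA"))
print(trim_if_circular("ACGTAGGCTTCAGGA"))
```

In practice a much longer minimum overlap would be used, and a repeat inside a linear genome can produce the same signal, which helps explain why the largest candidates still needed manual checking.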
Families of giants
Some of the apparent viruses were absolutely huge, with four exceeding 600,000 bases in length and the largest coming in at 735,000. This is in the same range as some of the large viruses that attack amoebae. But while amoebae can have genomes that are hundreds of billions of bases in length, these viruses appear to infect bacteria with genomes that are less than 5 million bases in length. For context, there are bacteria with genomes that are only about a fifth the size of these viruses.
One of the viruses had a gene that was more than 2,300 bases long — 1.5 times the size of the whole genome of some small viruses.
Once the assembly was complete, the researchers began comparing sequences to find out what these viruses were related to. In many cases the answer turned out to be “each other.” The largest viruses were all part of a family the researchers called “Mahaphages” (Maha is the Sanskrit word for huge). Significantly, there were no small viruses that clustered among the giants, indicating that these huge genomes are likely stable features of this family rather than the result of a smaller virus recently picking up a lot of extra DNA.
Many of these viral families have genes for the transfer RNAs used in making proteins. Others carry genes needed for the metabolism of nucleic acids, allowing them to make some of the DNA and RNA building blocks on which they depend. Normally, both classes of genes are supplied by the host, although similar genes are found in the giant viruses that infect amoebae. The authors note that this kind of gene content is similar to that of a group of small bacteria with small genomes that are thought to be symbiotic or parasitic. Whether this is simply a consequence of lifestyle or represents something more important is left to future studies.
Many of the viruses also contain components of the CRISPR/Cas system that we have come to use for genome editing. Bacteria usually use this system to protect themselves from viruses, which makes it strange to find viruses that have their own version. Some of these systems seem to target genes that bacteria use to control gene activity, so the viruses’ version may simply redirect these control systems to focus on virus production. In other cases, they target different viruses, suggesting that they are a way to restrict competitors.
Other families of viruses seem to carry proteins that turn off the bacterial CRISPR system, which is more in line with what you might expect: a means of protecting the virus from the host’s defenses.
Perhaps the strangest thing found in these viruses is a set of genes that code for relatives of a protein called tubulin, which helps a cell organize its internal contents. Bacteria are notable for having a poorly defined internal organization, so it’s quite striking to see a virus take advantage of something we don’t understand very well. Still, it’s easy to see how this protein could help get all the pieces needed to put together a virus in the right place.
But there’s clearly a lot we don’t understand more generally about these viruses, including the specific cells they infect. We know the environments they come from and the genera of bacteria they’re generally found with, but not much more than that. By figuring out more and studying their dynamics in culture, we may come to understand how these viruses can sometimes outperform their smaller and faster-moving relatives. In the process, they may be able to teach us some lessons about the bacteria they infect.
Nature, 2020. DOI: 10.1038/s41586-020-2007-4