All of the above techniques of investigation are themselves ‘molecular biology’ in the original sense of the term; however, the term ‘molecular biology’ has taken on the new and different meaning of ‘genetic engineering’ or ‘genetic manipulation.’ These techniques for manipulating nucleic acids in vitro (that is, outside living cells or organisms) do not comprise a new discipline but are an outgrowth of earlier developments in biochemistry and cell biology over the previous 50 years. This powerful new technology has revolutionized virology and, to a large extent, has shifted the focus of attention away from the virus particle onto the virus genome. Again, this book is not the place to discuss in detail the technical aspects of these methods, and readers are referred to one of the many relevant texts, such as those given at the end of this chapter.
Virus infection has long been used to probe the workings of ‘normal’ (i.e., uninfected) cells, for example to look at macromolecular synthesis. This is true of the applications of bacteriophages in bacterial genetics and of the many instances where the study of eukaryotic viruses has revealed fundamental information about the cell biology and genomic organization of higher organisms. In 1970, John Kates first observed that vaccinia virus mRNAs were polyadenylated at their 3′ ends. In the same year, Howard Temin and David Baltimore independently identified the enzyme reverse transcriptase (RNA-dependent DNA polymerase) in retrovirus-infected cells. This finding shattered the so-called ‘central dogma’ of biology that there is a one-way flow of information from DNA through RNA into protein and revealed the plasticity of the eukaryote genome. Subsequently, the purification of this enzyme from retrovirus particles permitted cDNA cloning, which greatly accelerated the study of viruses with RNA genomes, a good illustration of the catalytic nature of scientific advances. In 1977, Richard Roberts and, independently, Phillip Sharp recognized that adenovirus mRNAs were spliced to remove intervening sequences, indicating the similarities between virus and cellular genomes. Initially at least, the effect of this new technology was to shift the emphasis of investigation from proteins to nucleic acids. As the power of the techniques developed, it quickly became possible to determine the nucleotide sequences of entire virus genomes, beginning with the smallest bacteriophages in the mid-1970s and working up to the largest of all virus genomes, those of the herpesviruses and poxviruses, many of which have now been determined.
This nucleic acid-centred technology, in addition to its ultimate achievement of nucleotide sequencing and the artificial manipulation of virus genomes, also offered significant advances in detection of viruses and virus infections involving nucleic acid hybridization techniques. There are many variants of this basic idea, but, essentially, a hybridization probe, labelled in some fashion to facilitate detection, is allowed to react with a crude mixture of nucleic acids. The specific interaction of the probe sequence with complementary virus-encoded sequences, to which it binds by hydrogen-bond formation between the complementary base pairs, reveals the presence of the virus genetic material (Figure 1.7). This approach has been taken a stage further by the development of various in vitro nucleic acid amplification procedures, such as polymerase chain reaction (PCR), which is an even more sensitive technique, capable of detecting just a single molecule of virus nucleic acid (Figure 1.8). More recently, there has also been renewed interest in virus proteins based on a new biology which is itself dependent on manipulation of nucleic acids in vitro and advances in protein detection arising from immunology. Methods for in vitro synthesis and expression of proteins from molecularly cloned DNA have advanced rapidly, and many new analytical techniques are now available. Studies of protein–nucleic acid interactions are proving to be particularly valuable in understanding virus structure and gene expression. Advances in electrophoresis have made it possible to study simultaneously all of the proteins in a virus-infected cell, called the proteome of the cell (by analogy to the genome). Molecular biologists have one further trick up their sleeves. Because of the repetitive, digitized nature of nucleotide sequences, computers are the ideal means of storing and processing this mass of information.
‘Bioinformatics’ is a broad term coined in the 1980s to encompass any application of computers to biology. This can imply anything from artificial intelligence and robotics to genome analysis. More specifically, the term applies to computer manipulation of biological sequence data, including protein structural analysis. Bioinformatics permits the inference of function from the linear sequence and is thus central to all areas of modern biology. Due to the flood of new sequence information, computers are being used increasingly to make predictions based on nucleotide sequences (Figure 1.9). These predictions cover the presence of open reading frames, the amino acid sequences of the proteins encoded by them, control regions of genes such as promoters and splice signals, and the secondary structure of proteins and nucleic acids. Particularly in the case of RNA, the secondary structure assumed by a molecule is almost as important as the primary nucleotide sequence in determining the biological reactions that the molecule may undergo. Caution is needed in interpreting such predicted rather than factual information, and predictions should not be accepted as valid until confirmed by biochemical and/or genetic data. However, when the structure of a protein has been determined by X-ray crystallography or NMR, the shape can be accurately modelled and explored in three dimensions on computers (Figure 1.10).
Figure 1.7 Nucleic acid hybridization relies on the specificity of base-pairing which allows a labelled nucleic acid probe to pick out a complementary target sequence from a complex mixture of sequences in the test sample. The label used to identify the probe may be a radioisotope or a nonisotopic label such as an enzyme or chemiluminescent system. Hybridization may be performed with both the probe and test sequences in the liquid phase (top of figure) or with the test sequences bound to a solid phase, usually a nitrocellulose or nylon membrane (below). Both methods may be used to quantify the amount of the test sequence present, but solid-phase hybridization is also used to locate the position of sequences immobilized on the membrane. Plaque and colony hybridization are used to locate recombinant molecules directly from a mixture of bacterial colonies or bacteriophage plaques on an agar plate. Northern and Southern blotting are used to detect RNA and DNA, respectively, after transfer of these molecules from gels following separation by electrophoresis (cf., western blotting, Figure 1.2).
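The base-pairing rule that underlies hybridization can be illustrated with a short Python sketch. It is only a minimal illustration, assuming DNA sequences written 5′ to 3′ and perfect complementarity; real hybridization tolerates mismatches depending on temperature and salt concentration, which this sketch ignores. The function names and the example sequences are hypothetical.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the strand that would base-pair with seq, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_probe_sites(probe, target):
    """Return 0-based positions on the target strand where the probe
    could anneal with perfect complementarity (no mismatches allowed)."""
    site = reverse_complement(probe)  # sequence the probe pairs with
    target = target.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)  # allow overlapping sites
    return positions


if __name__ == "__main__":
    # Hypothetical probe and test sequence, for illustration only.
    print(find_probe_sites("ATGGTC", "TTGACCATGGCATTT"))
```

In this toy case the probe ATGGTC pairs with the stretch GACCAT in the test sequence, so a single site is reported; a sequence lacking that stretch gives an empty list, the equivalent of a negative hybridization signal.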
Figure 1.8 Polymerase chain reaction (PCR) relies on the specificity of base-pairing between short synthetic oligonucleotide primers and complementary sequences in a complex mixture of nucleic acids to prime DNA synthesis using a thermostable DNA polymerase. Multiple cycles of primer annealing, extension, and thermal denaturation are carried out in an automated process, resulting in a massive amplification (2^n-fold increase after n cycles of amplification) of the target sequence located between the two primers.
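The arithmetic of amplification, in which the target doubles at each cycle to give a 2^n-fold increase after n cycles, can be checked with a few lines of Python. This is a sketch of the idealized calculation only. The `efficiency` parameter is an assumption added here for illustration (1.0 means perfect doubling); it does not come from the text, and real reactions plateau as reagents are exhausted.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized number of target copies after a given number of PCR cycles.

    efficiency is a hypothetical per-cycle efficiency between 0 and 1;
    with efficiency = 1.0 every template is copied in every cycle,
    giving the 2^n-fold increase described in the legend.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # A single starting molecule, perfect doubling.
    for n in (10, 20, 30):
        print(n, "cycles:", int(pcr_copies(1, n)), "copies")
```

With perfect doubling, a single template molecule yields 2^30 (roughly a thousand million) copies after 30 cycles, which is why PCR can in principle detect a single molecule of virus nucleic acid.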
While the genome is the nucleic acid comprising the entire genetic information of an organism, by extension ‘genomics’ is the study of the composition and function of the genetic material of an organism. Virus genomics began with the first complete sequence of a virus genome (bacteriophage φX174 in 1977). Vast international databases of nucleotide and protein sequence information have now been compiled, and these can be rapidly accessed by computers to compare newly determined sequences with those whose function may have been studied in great detail. At the time of publication, the complete genome sequences of almost 1500 different viruses had been published, with more appearing almost weekly (Table 1.1).
Figure 1.9 An example of the use of a computer to store and process digitized information from a nucleic acid sequence. This figure shows an analysis of all of the open reading frames (ORFs) present in an HIV-1 provirus. The ORFs present in the three main retrovirus genes, gag, pol, and env, can be seen. This complex analysis took only a few seconds to perform using an ordinary personal computer. Manually, the same task might have taken several days.
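The kind of ORF search shown in Figure 1.9 can be sketched in Python as follows. This is a simplified illustration, not the program used to produce the figure: it assumes the standard stop codons, treats ATG as the only start codon, and scans only the three forward reading frames, whereas a full analysis would also scan the complementary strand. The function name, the `min_codons` parameter, and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Find ORFs (ATG ... stop) in the three forward reading frames.

    Returns a list of (frame, start, end) tuples, where seq[start:end]
    runs from the ATG through the stop codon. min_codons is the minimum
    number of codons, counting the ATG but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    # Short invented sequence, for illustration only.
    test_seq = "CCATGAAATTTTAGGG"
    for frame, start, end in find_orfs(test_seq):
        print(frame, start, end, test_seq[start:end])
```

In the invented sequence the only ORF lies in the third reading frame (ATG AAA TTT TAG). As the text cautions, an ORF found this way is a prediction and says nothing by itself about whether the sequence is actually expressed.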
Figure 1.10 Three-dimensional structure of the DNA binding domain of SV40 T-antigen reconstructed from NMR data using a computer.
Thus we have, in a sense, come full circle in our investigations of viruses, from particles via genomes back to proteins again, and have emerged with a far more profound understanding of these organisms; however, the current pace of research in virology tells us that there is still far more that we need to know.