Plant Genome Organization and Structure : Analysis of Genomes by Reassociation Experiments

Analysis of Genomes by Reassociation Experiments

If a DNA molecule is melted and allowed to reassociate, the complexity of the genome dictates the rate at which duplex DNA will form. Consider a simple molecule that consists of alternating GCs: it will form a duplex more quickly than a molecule that consists of repeating blocks of AGCT. As the number of different combinations of bases increases, the time required for complete duplex formation increases. Renaturation, or duplex formation, requires random collisions between two complementary single-stranded molecules. This process follows second-order kinetics and is concentration dependent. We will not go through the derivation of the formula, but the important parameter used to characterize a particular DNA is Cot1/2.

This value, the Cot at half reassociation (Cot1/2), is the product of the initial DNA concentration (Co) and the time (t) required for one-half of the DNA to reanneal, or form duplex DNA. Its units are moles of nucleotides x seconds per liter. The more complex the genome of interest, the longer it will take for like sequences to reanneal. Consequently, the Cot1/2 will be larger. Thus, in terms of reassociation kinetics, complexity has a specific definition.
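Although the derivation is skipped here, the standard result for ideal second-order reassociation is C/Co = 1 / (1 + k x Co x t), which gives Cot1/2 = 1/k. The following is a minimal sketch of that relationship, not part of the original notes; the function name and the Cot1/2 value of 4 are illustrative assumptions.

```python
def fraction_single_stranded(cot, cot_half):
    """Fraction of DNA still single-stranded under ideal second-order kinetics.

    Uses C/Co = 1 / (1 + Cot / Cot1/2), which is equivalent to
    C/Co = 1 / (1 + k * Co * t) with k = 1 / Cot1/2.
    """
    if cot < 0 or cot_half <= 0:
        raise ValueError("cot must be >= 0 and cot_half must be > 0")
    return 1.0 / (1.0 + cot / cot_half)


# Illustrative Cot1/2 (an assumption chosen for the example).
COT_HALF = 4.0

for cot in (0.04, 0.4, 4.0, 40.0, 400.0):
    remaining = fraction_single_stranded(cot, COT_HALF)
    print(f"Cot = {cot:7.2f}  fraction single-stranded = {remaining:.3f}")
```

By construction, exactly half of the DNA remains single-stranded when Cot equals Cot1/2, and a larger Cot1/2 shifts the whole curve toward longer times or higher concentrations.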

Complexity - the total length of different sequences

For example, E. coli is considered to have a complexity of 4.2 x 10^6 base pairs. What is the experimental procedure used to derive these values? In general the procedure is:

  1. Shear the DNA to be analyzed to a length of about 300 bp.
  2. Melt the DNA (usually in 0.12 M phosphate buffer) by boiling for 5 min.
  3. Quickly place at 60°C.
  4. Take aliquots at different time points. Separate single-stranded DNA from double-stranded DNA by hydroxyapatite. Measure the amount of DNA that is double-stranded by absorbance at 260 nm.
  5. Plot the fraction of DNA that is single-stranded against the Cot value, with Cot on a logarithmic scale. This plot is the Cot curve.

When this type of experiment is performed with eukaryotic DNA, three components are usually seen. Each component reanneals with its own characteristic Cot value. The three components are termed the fast, intermediate, and slow components. Why do we see these three components? Eukaryotic genomes contain sequences that are present in different copy numbers. A sequence found many times in the genome will reanneal much more quickly than a sequence found only once in the same genome. Thus the Cot curve for a eukaryotic genome will differ from that of a genome, such as E. coli, which contains only single-copy sequences.
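One way to picture a three-component Cot curve is as a weighted sum of ideal second-order curves, one per component. The sketch below assumes that model. The slow-component figures (45% of the DNA, Cot1/2 of 630) are borrowed from the worked example later in this section, while the fast and intermediate fractions and Cot1/2 values are hypothetical placeholders.

```python
def genome_fraction_single_stranded(cot, components):
    """Fraction of total DNA still single-stranded at a given Cot.

    components is a list of (fraction_of_genome, cot_half) pairs; each
    component is assumed to follow ideal second-order kinetics.
    """
    return sum(frac / (1.0 + cot / cot_half) for frac, cot_half in components)


# (fraction of genome, observed Cot1/2). Only the slow component comes
# from the text; the other two rows are hypothetical.
COMPONENTS = [
    (0.25, 0.001),  # fast (hypothetical)
    (0.30, 2.0),    # intermediate (hypothetical)
    (0.45, 630.0),  # slow (from the worked example in the text)
]


def cot_curve(components, log_min=-4, log_max=5):
    """Return (log10 Cot, fraction single-stranded) at one point per decade."""
    return [
        (log_cot, genome_fraction_single_stranded(10.0 ** log_cot, components))
        for log_cot in range(log_min, log_max + 1)
    ]


for log_cot, remaining in cot_curve(COMPONENTS):
    print(f"log Cot = {log_cot:3d}  fraction single-stranded = {remaining:.3f}")
```

Plotted against log Cot, the printed values fall in three steps, one for each component, instead of the single sigmoid seen for a genome made up only of single-copy sequences.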

A comparison of the Cot value of each of these components with an E. coli standard allows us to derive the complexity of each component. The complexity of the slow component is greater than that of the other two components, and this component is considered to represent the single-copy portion of the genome. The complexity of the slow component can be used as a good estimate of the genome size. The genome size will be the sum of the lengths of all the unique sequences. Using the example from Genes V - Lewin, p.664, the complexity of the slow component is 3 x 10^8 bp and the complexity of the intermediate component is 6 x 10^5 bp. If we divide the complexity of the intermediate component by that of the slow component, we get 2 x 10^-3. This demonstrates that the intermediate component contributes very little to the complexity of the genome. Therefore, the complexity of the slow or single-copy portion of the genome can be considered equal to the genome size.

To derive the complexity of each component, it is necessary to run a standard, such as E. coli DNA, with each experiment. As stated above, E. coli is considered to consist of only single-copy sequences. Suppose that in the experiment from which the half-reassociation Cot value (Cot1/2) for each component was derived, the Cot1/2 for E. coli was 4 and the observed Cot1/2 for the slow component was 630. Experimentally, it was determined that the slow component comprised 45% of the total DNA. Therefore, if that component were annealed alone, its Cot1/2 would be 283 (630 x 0.45). That value is 71 (283/4) times greater than the value for E. coli. Therefore, the complexity of the slow component is 71 times that of E. coli, or 3.0 x 10^8 bp (71 x 4.2 x 10^6). The complexity of the other components is derived similarly.
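The arithmetic above can be written as a short calculation. This is a sketch of the same three steps (correct the observed Cot1/2 for the component's share of the DNA, compare it with the standard, scale the standard's complexity); the function and constant names are illustrative and not from the original notes.

```python
ECOLI_COMPLEXITY_BP = 4.2e6  # complexity of the E. coli standard (from the text)
ECOLI_COT_HALF = 4.0         # Cot1/2 of E. coli in this experiment (from the text)


def component_complexity(observed_cot_half, fraction_of_genome,
                         standard_cot_half=ECOLI_COT_HALF,
                         standard_complexity_bp=ECOLI_COMPLEXITY_BP):
    """Estimate the complexity (bp) of one kinetic component.

    observed_cot_half: Cot1/2 of the component measured in total DNA
    fraction_of_genome: share of the total DNA in that component (0 to 1)
    """
    # Cot1/2 the component would show if it were annealed on its own.
    pure_cot_half = observed_cot_half * fraction_of_genome
    # Complexity scales in proportion to Cot1/2 relative to the standard.
    return pure_cot_half / standard_cot_half * standard_complexity_bp


slow = component_complexity(observed_cot_half=630.0, fraction_of_genome=0.45)
print(f"slow component complexity: {slow:.2e} bp")  # about 3.0 x 10^8 bp
```

The result differs slightly from 71 x 4.2 x 10^6 only because the text rounds the intermediate values (283 and 71) before multiplying.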

Genome size is quite variable throughout the biological world, and plants show the greatest variation in genome size of any kingdom.

Variation in Genome Size among Plants

Species | kb/haploid genome | pg/haploid genome (1)
7 x 10^4
1 x 10^8

(1) Conversion factor: 1 pg = 0.965 x 10^9 bp = 6.1 x 10^11 daltons
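The conversion factor in the footnote lets the two units in the table be converted into each other. The sketch below uses only that factor; the helper names are illustrative. Treating the two values that survive in the table as kb figures marking the low and high ends of the range is an assumption.

```python
BP_PER_PG = 0.965e9  # 1 pg = 0.965 x 10^9 bp (conversion factor from the footnote)


def kb_to_pg(kb):
    """Convert a haploid genome size in kilobase pairs to picograms."""
    return kb * 1000.0 / BP_PER_PG


def pg_to_kb(pg):
    """Convert a haploid genome size in picograms to kilobase pairs."""
    return pg * BP_PER_PG / 1000.0


# Assumed to be the kb values at the two ends of the table.
for kb in (7e4, 1e8):
    print(f"{kb:.0e} kb  is about  {kb_to_pg(kb):.3g} pg")
```

On that assumption, the two values differ by more than a thousandfold, which is the scale of variation the table is meant to show.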

One manner in which a genome can be described is by determining the distribution of fast, intermediate and slow components in the genome. For comparison purposes, what does the human genome look like?

Distribution Sizes Among Components of the Human Genome
Component % of Genome

The table Sequence Distribution of Selected Plant Species lists the different components from different plant species. As you can see, plant species exhibit a wide range of values for each of the components. The genome of Arabidopsis consists almost entirely of single-copy sequences (the repetitive sequences have been determined to be essentially all chloroplast DNA). At the other extreme, pea and wheat genomes have only 10-20% single-copy sequences. Reassociation kinetic experiments with polyploid species, such as bread wheat (Triticum aestivum), did not reveal a component that displayed true single-copy kinetics. Instead, the slowest component behaved as if it consisted of sequences represented three times. This result is consistent with the current hypothesis that bread wheat arose by combining the genomes of three diploid wheat species.

Copyright © 1998. Phillip McClean