Information Content of the Human Genome
Author:
Cohn, R. D., Scherer, S. W., & Hamosh, A.
Source:
Thompson & Thompson Genetics and Genomics in Medicine
Part and page:
9th ed., pp. 23-24
2025-11-05
How does the approximately 3-billion-letter digital code of the human genome guide the intricacies of human anatomy, physiology, and biochemistry to which Berg referred? The answer lies in the enormous amplification and integration of information content that occurs as one moves from genes in the genome to their products in the cell and to the observable expression of that genetic information as cellular, morphologic, clinical, or biochemical traits—that is, the phenotype of the individual. This hierarchic expansion of information from the genome to phenotype includes a wide range of structural and regulatory RNA products, as well as protein products that orchestrate the many functions of cells, tissues, organs, and the entire organism, in addition to their interactions with the environment. Even with the essentially complete sequence of the human genome in hand, we still do not know the precise number of genes in the genome. Our traditional definition of genes has also expanded. Current estimates are that the genome contains ~20,000 protein-coding genes, but this figure only begins to hint at the levels of complexity that emerge from the decoding of this digital information (Fig. 1).

Fig. 1. The amplification of genetic information from genome to gene products to gene networks and ultimately to cellular function and phenotype. The genome contains both protein-coding genes (blue) and noncoding RNA (ncRNA) genes (red). Many genes in the genome use alternative coding information to generate multiple different products. Both small and large ncRNAs participate in gene regulation. Many proteins participate in multigene networks that respond to cellular signals in a coordinated and combinatorial manner, thus further expanding the range of cellular functions that underlie organismal phenotypes.
As introduced briefly in Chapter 2, the product of protein-coding genes is a protein whose structure ultimately determines its particular function(s) in the cell. But if there were a simple one-to-one correspondence between genes and proteins, we could have at most ~20,000 different proteins. This number is insufficient to account for the vast array of functions that occur in human cells over the life span. The answer to this dilemma is found in two features of gene structure and function. First, many genes are capable of generating multiple different products, not just one (see Fig. 1). This process, discussed later in this chapter, is accomplished through the use of alternative coding segments in genes and through the subsequent biochemical modification of the encoded protein; these two features of complex genomes result in a substantial amplification of information content. Indeed, it has been estimated that in this way, these 20,000 human genes can encode many hundreds of thousands of different proteins, collectively referred to as the proteome. Second, individual proteins do not function by themselves. They form networks, often involving many different proteins and regulatory RNAs that respond in a coordinated and integrated fashion to many different genetic, developmental, or environmental signals. The combinatorial nature of protein networks results in an even greater diversity of possible cellular functions.
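The scale of this amplification can be made concrete with a little arithmetic. The sketch below takes the ~20,000 gene estimate from the text and asks how many distinct protein products each gene would have to yield, on average, to reach a proteome of a given size. The three proteome sizes are hypothetical values chosen only to span "many hundreds of thousands"; they are not measurements, and the function name is illustrative.

```python
# Estimate quoted in the text.
PROTEIN_CODING_GENES = 20_000

# Hypothetical proteome sizes, chosen only to span the phrase
# "many hundreds of thousands"; these are NOT measured values.
ILLUSTRATIVE_PROTEOME_SIZES = [200_000, 500_000, 900_000]


def products_per_gene(proteome_size: int,
                      gene_count: int = PROTEIN_CODING_GENES) -> float:
    """Average number of distinct protein products per gene."""
    return proteome_size / gene_count


for size in ILLUSTRATIVE_PROTEOME_SIZES:
    ratio = products_per_gene(size)
    print(f"{size:>9,} proteins -> {ratio:.0f} products per gene on average")
```

Under these assumed sizes, each gene would need to give rise to somewhere between about 10 and 45 distinct products on average, which is the order of amplification that alternative coding segments and protein modification are invoked to explain.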
Genes are located throughout the genome but tend to cluster in particular regions on particular chromosomes and to be relatively sparse in other regions or on other chromosomes. For example, chromosome 11, which is ~135 million bp (135 megabase pairs [Mb]) long, is relatively gene-rich, with ~1300 protein-coding genes. These genes are not distributed randomly along the chromosome; they are particularly enriched in two chromosomal regions, with gene density as high as one gene every 10 kb (Fig. 2). Some of the genes belong to families of related genes, as we will describe more fully later in this chapter. Other regions are gene-poor, and there are several so-called gene deserts of 1 million bp or more without any identified protein-coding genes. There are two caveats here. First, gene identification and genome annotation remain ongoing despite the apparent robustness of recent estimates. It is virtually certain that there are some genes, including clinically relevant genes, that are currently undetected or that display characteristics that we do not currently recognize as being associated with genes. Second, as mentioned in Chapter 2, many genes are not protein coding; their products are functional RNA molecules (noncoding RNAs [ncRNAs]) (see Fig. 1) that play a variety of roles in the cell, many of which are only just being uncovered.
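To see how uneven this distribution is, the figures quoted for chromosome 11 can be compared directly. The sketch below uses only numbers stated in the text and in the Fig. 2 caption (135 Mb, ~1300 genes, one gene every 10 kb in the densest regions, and 10 genes between 5.15 and 5.35 Mb); the constant and function names are illustrative, and the result is a rough average rather than an annotation-grade measurement.

```python
# Figures quoted in the text for chromosome 11.
CHROMOSOME_LENGTH_BP = 135_000_000   # ~135 Mb
PROTEIN_CODING_GENES = 1_300         # ~1300 protein-coding genes
DENSE_REGION_SPACING_BP = 10_000     # "one gene every 10 kb"

# Expanded region described in the Fig. 2 caption (panel B).
REGION_START_BP = 5_150_000          # 5.15 Mb
REGION_END_BP = 5_350_000            # 5.35 Mb
REGION_GENES = 10


def spacing_bp(length_bp: int, gene_count: int) -> float:
    """Average number of base pairs per gene over a stretch of DNA."""
    return length_bp / gene_count


chromosome_avg = spacing_bp(CHROMOSOME_LENGTH_BP, PROTEIN_CODING_GENES)
region_avg = spacing_bp(REGION_END_BP - REGION_START_BP, REGION_GENES)
enrichment = chromosome_avg / DENSE_REGION_SPACING_BP

print(f"Chromosome-wide: one gene every {chromosome_avg / 1000:.0f} kb")
print(f"5.15-5.35 Mb region: one gene every {region_avg / 1000:.0f} kb")
print(f"Densest regions vs. chromosome average: ~{enrichment:.1f}x")
```

On these figures the chromosome-wide average is roughly one gene per 100 kb, so the densest regions carry genes at about ten times the average rate, while a gene desert of 1 Mb or more is about ten times sparser than average.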

Fig. 2. Gene content on chromosome 11, which consists of 135 Mb of DNA. (A) The distribution of genes is indicated along the chromosome and is high in two regions of the chromosome and low in other regions. (B) An expanded region from 5.15 to 5.35 Mb (measured from the short-arm telomere), which contains 10 known protein-coding genes, five belonging to the olfactory receptor (OR) gene family and five belonging to the globin gene family. (C) The five β-like globin genes expanded further. (Data from European Bioinformatics Institute and Wellcome Trust Sanger Institute: Ensembl release 70, January 2013. Available from http://www.ensembl.org).
For genes located on the autosomes, there are two copies of each gene, one on the chromosome inherited from the mother and one on the chromosome inherited from the father. For most autosomal genes, both copies are expressed and generate a product. There are, however, a growing number of genes in the genome that are exceptions to this general rule and are expressed at characteristically different levels from the two copies, including some that, at the extreme, are expressed from only one of the two homologues. These examples of allelic imbalance are discussed in greater detail later in this chapter, as well as in Chapters 6 and 7. In addition, many genes are present in variable numbers at a particular location on a chromosome. One example is the variability in the copy number of the genes for amylase, an enzyme important in starch digestion; AMY1 exists in two to eight copies per chromosome and is expressed in the salivary glands.