

Protein Sequences and Evolution: Protein Sequences Can Elucidate the History of Life on Earth
Author:
David L. Nelson, Michael M. Cox
Source:
Lehninger Principles of Biochemistry
Part and page:
pp. 107-110
2026-04-16
The field of molecular evolution is often traced to Emile Zuckerkandl and Linus Pauling, whose work in the mid-1960s advanced the use of nucleotide and protein sequences to explore evolution. The premise is deceptively straightforward. If two organisms are closely related, the sequences of their genes and proteins should be similar. The sequences increasingly diverge as the evolutionary distance between two organisms increases. The promise of this approach began to be realized in the 1970s, when Carl Woese used ribosomal RNA sequences to define archaebacteria as a group of living organisms distinct from other bacteria and eukaryotes (see Fig. 1–4). Protein sequences offer an opportunity to greatly refine the available information. With the advent of genome projects investigating organisms from bacteria to humans, the number of available sequences is growing at an enormous rate. This information can be used to trace biological history. The challenge is in learning to read the genetic hieroglyphics.
Evolution has not taken a simple linear path. Complexities abound in any attempt to mine the evolutionary information stored in protein sequences. For a given protein, the amino acid residues essential for the activity of the protein are conserved over evolutionary time. The residues that are less important to function may vary over time (that is, one amino acid may substitute for another), and these variable residues can provide the information used to trace evolution. Amino acid substitutions are not always random, however. At some positions in the primary structure, the need to maintain protein function may mean that only particular amino acid substitutions can be tolerated. Some proteins have more variable amino acid residues than others. For these and other reasons, proteins can evolve at different rates.
Another complicating factor in tracing evolutionary history is the rare transfer of a gene or group of genes from one organism to another, a process called lateral gene transfer. The transferred genes may be quite similar to the genes they were derived from in the original organism, whereas most other genes in the same two organisms may be quite distantly related. An example of lateral gene transfer is the recent rapid spread of antibiotic-resistance genes in bacterial populations. The proteins derived from these transferred genes would not be good candidates for the study of bacterial evolution, because they share only a very limited evolutionary history with their “host” organisms.
The study of molecular evolution generally focuses on families of closely related proteins. In most cases, the families chosen for analysis have essential functions in cellular metabolism that must have been present in the earliest viable cells, thus greatly reducing the chance that they were introduced relatively recently by lateral gene transfer. For example, a protein called EF-1α (elongation factor 1α) is involved in the synthesis of proteins in all eukaryotes. A similar protein, EF-Tu, with the same function, is found in bacteria. Similarities in sequence and function indicate that EF-1α and EF-Tu are members of a family of proteins that share a common ancestor.
The members of protein families are called homologous proteins, or homologs. The concept of a homolog can be further refined. If two proteins within a family (that is, two homologs) are present in the same species, they are referred to as paralogs. Homologs from different species are called orthologs (see Fig. 1–37). The process of tracing evolution involves first identifying suitable families of homologous proteins and then using them to reconstruct evolutionary paths.
Homologs are identified using increasingly powerful computer programs that can directly compare two or more chosen protein sequences, or can search vast databases to find the evolutionary relatives of one selected protein sequence. The electronic search process can be thought of as sliding one sequence past the other until a section with a good match is found. Within this sequence alignment, a positive score is assigned for each position where the amino acid residues in the two sequences are identical (the value of the score varies from one program to the next), providing a measure of the quality of the alignment. The process has some complications. Sometimes the proteins being compared match well at, say, two sequence segments, and these segments are connected by less related sequences of different lengths. Thus the two matching segments cannot be aligned at the same time. To handle this, the computer program introduces “gaps” in one of the sequences to bring the matching segments into register (Fig. 3–30).
FIGURE 3–30 Aligning protein sequences with the use of gaps. Shown here is the sequence alignment of a short section of the EF-Tu protein from two well-studied bacterial species, E. coli and Bacillus subtilis. Introduction of a gap in the B. subtilis sequence allows a better alignment of amino acid residues on either side of the gap. Identical amino acid residues are shaded.
Of course, if a sufficient number of gaps are introduced, almost any two sequences could be brought into some sort of alignment. To avoid uninformative alignments, the programs include penalties for each gap introduced, thus lowering the overall alignment score. With electronic trial and error, the program selects the alignment with the optimal score that maximizes identical amino acid residues while minimizing the introduction of gaps.
Identical amino acids are often inadequate to identify related proteins or, more importantly, to determine how closely related the proteins are on an evolutionary time scale. A more useful analysis includes a consideration of the chemical properties of substituted amino acids. When amino acid substitutions are found within a protein family, many of the differences may be conservative; that is, an amino acid residue is replaced by a residue having similar chemical properties. For example, a Glu residue may substitute in one family member for the Asp residue found in another; both amino acids are negatively charged. Such a conservative substitution should logically garner a higher score in a sequence alignment than does a nonconservative substitution, such as the replacement of the Asp residue with a hydrophobic Phe residue.
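The gap-penalized search for an optimal alignment described above can be illustrated with a minimal Python sketch. It uses a standard dynamic-programming approach (Needleman-Wunsch style), which the text does not name, in place of "electronic trial and error". The scores (+1 per identical residue, 0 per mismatch, -2 per gap position) and the toy sequences are assumptions chosen for illustration only; real programs use substitution matrices and often more elaborate gap penalties.

```python
def align(seq_a, seq_b, match=1, mismatch=0, gap=-2):
    """Global alignment of two sequences by dynamic programming.

    The default scores are illustrative assumptions, not values from the text.
    Returns (best score, aligned seq_a, aligned seq_b) with '-' marking gaps.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = seq_a[i - 1] == seq_b[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # residue of seq_a opposite a gap
            left = score[i][j - 1] + gap    # residue of seq_b opposite a gap
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = seq_a[i - 1] == seq_b[j - 1]
            step = match if same else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical toy sequences: the second lacks the "EF" segment of the first.
print(align("ACDEFGHIK", "ACDGHIK"))  # (3, 'ACDEFGHIK', 'ACD--GHIK')
```

With these assumed scores, two gap positions cost 4 points but bring seven residues into register, so the gapped alignment beats any arrangement that pushes the gaps to the ends.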
To determine what scores to assign to the many different amino acid substitutions, Steven Henikoff and Jorja Henikoff examined the aligned sequences from a variety of different proteins. They did not analyze entire protein sequences, focusing instead on thousands of short conserved blocks where the fraction of identical amino acids was high and the alignments were thus reliable. Looking at the aligned sequence blocks, the Henikoffs analyzed the nonidentical amino acid residues within the blocks. Higher scores were given to nonidentical residues that occurred frequently than to those that appeared rarely. Even the identical residues were given scores based on how often they were replaced, such that amino acids with unique chemical properties (such as Cys and Trp) received higher scores than those more conservatively replaced (such as Asp and Glu). The result of this scoring system is a Blosum (blocks substitution matrix) table. The table in Figure 3–31 was generated from sequences that were identical in at least 62% of their amino acid residues, and it is thus referred to as Blosum62. Similar tables have been generated for blocks of homologous sequences that are 50% or 80% identical. When higher levels of identity are required, the most conservative amino acid substitutions can be overrepresented, which limits the usefulness of the matrix in identifying homologs that are somewhat distantly related. Tests have shown that the Blosum62 table provides the most reliable alignments over a wide range of protein families, and it is the default table in many sequence alignment programs.
For most efforts to find homologies and explore evolutionary relationships, protein sequences (derived either directly from protein sequencing or from the sequencing of the DNA encoding the protein) are superior to nongenic nucleic acid sequences (those that do not encode a protein or functional RNA).
For a nucleic acid, with its four different types of residues, random alignment of nonhomologous sequences will generally yield matches for at least 25% of the positions. Introduction of a few gaps can often increase the fraction of matched residues to 40% or more, and the probability of chance alignment of unrelated sequences becomes quite high. The 20 different amino acid residues in proteins greatly lower the probability of uninformative chance alignments of this type.
The programs used to generate a sequence alignment are complemented by methods that test the reliability of the alignments. A common computerized test is to shuffle the amino acid sequence of one of the proteins being compared to produce a random sequence, then instruct the program to align the shuffled sequence with the other, unshuffled one. Scores are assigned to the new alignment, and the shuffling and alignment process is repeated many times. The original alignment, before shuffling, should have a score significantly higher than any of those within the distribution of scores generated by the random alignments; this increases the confidence that the sequence alignment has identified a pair of homologs. Note that the absence of a significant alignment score does not necessarily mean that no evolutionary relationship exists between two proteins. As we shall see in Chapter 4, three-dimensional structural similarities sometimes reveal evolutionary relationships where sequence homology has been wiped away by time.
Using a protein family to explore evolution requires the identification of family members with similar molecular functions in the widest possible range of organisms. Information from the family can then be used to trace the evolution of those organisms. By analyzing the sequence divergence in selected protein families, investigators can segregate organisms into classes based on their evolutionary relationships.
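The shuffle test described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the scoring function is a deliberately simple stand-in (the best count of identical residues found while sliding one sequence past the other, with no gaps), and the function names, trial count, and test sequence are hypothetical rather than taken from any particular alignment program.

```python
import random


def best_ungapped_identity(seq_a, seq_b):
    """Slide seq_b past seq_a; return the largest count of identical positions."""
    best = 0
    for offset in range(-(len(seq_b) - 1), len(seq_a)):
        matches = 0
        for j, residue in enumerate(seq_b):
            i = offset + j
            if 0 <= i < len(seq_a) and seq_a[i] == residue:
                matches += 1
        best = max(best, matches)
    return best


def shuffle_test(seq_a, seq_b, score_fn=best_ungapped_identity, trials=200, seed=0):
    """Compare the observed score with scores obtained after shuffling seq_b.

    Returns (observed score, list of shuffled scores, fraction of shuffled
    scores that reach or exceed the observed score).
    """
    rng = random.Random(seed)
    observed = score_fn(seq_a, seq_b)
    residues = list(seq_b)
    shuffled_scores = []
    for _ in range(trials):
        rng.shuffle(residues)  # same composition, randomized order
        shuffled_scores.append(score_fn(seq_a, "".join(residues)))
    reached = sum(1 for s in shuffled_scores if s >= observed)
    return observed, shuffled_scores, reached / trials


# Hypothetical sequence compared with itself, so the observed score is maximal.
seq = "MKVLAAGIDWCHEFRNPQST"
observed, shuffled_scores, fraction = shuffle_test(seq, seq)
print(observed, max(shuffled_scores), fraction)
```

A small fraction means the observed score sits far above the distribution produced by random sequences of the same composition, which is the situation the text describes for a pair of homologs. Any alignment scorer with the same two-argument form could be passed as `score_fn`.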
The resulting classification must be reconciled with more classical examinations of the physiology and biochemistry of the organisms. Certain segments of a protein sequence may be found in the organisms of one taxonomic group but not in other groups; these segments can be used as signature sequences for the group in which they are found. An example of a signature sequence is an insertion of 12 amino acids near the amino terminus of the EF-1α/EF-Tu proteins in all archaebacteria and eukaryotes but not in other types of bacteria (Fig. 3–32). The signature is one of many biochemical clues that can help establish the evolutionary relatedness of eukaryotes and archaebacteria. Similarly, the major taxa of bacteria can be distinguished by signature sequences in several different proteins. The β and γ proteobacteria have signature sequences in the Hsp70 and DNA gyrase protein families (families of proteins involved in protein folding and DNA replication, respectively) that are not present in any other bacteria, including the other proteobacteria. The other types of proteobacteria (α, δ, ε), along with the β and γ proteobacteria, have a separate Hsp70 signature sequence and a signature in alanyl-tRNA synthetase (an enzyme of protein synthesis) that are not present in other bacteria. The appearance of unique signatures in the β and γ proteobacteria suggests that the α, δ, and ε proteobacteria arose before their β and γ cousins.
By considering the entire sequence of a protein, researchers can now construct more elaborate evolutionary trees with many species in each taxonomic group. Figure 3–33 presents one such tree for bacteria, based on sequence divergence in the protein GroEL (a protein present in all bacteria that assists in the proper folding of proteins). The tree can be refined by basing it on the sequences of multiple proteins and by supplementing the sequence information with data on the unique biochemical and physiological properties of each species.
There are many methods for generating trees, each with its own advantages and shortcomings, and many ways to represent the resulting evolutionary relationships. In Figure 3–33, the free end points of lines are called “external nodes”; each represents an extant species, and each is so labeled. The points where two lines come together, the “internal nodes,” represent extinct ancestor species. In most representations (including Fig. 3–33), the lengths of the lines connecting the nodes are proportional to the number of amino acid substitutions separating one species from another. If we trace two extant species to a common internal node (representing the common ancestor of the two species), the length of the branch connecting each external node to the internal node represents the number of amino acid substitutions separating one extant species from this ancestor. The sum of the lengths of all the line segments that connect an extant species to another extant species through a common ancestor reflects the number of substitutions separating the two extant species. To determine how much time was needed for the various species to diverge, the tree must be calibrated by comparing it with information from the fossil record and other sources.
As more sequence information is made available in databases, we can generate evolutionary trees based on a variety of different proteins. Some proteins evolve faster than others, or change faster within one group of species than another. A large protein, with many variable amino acid residues, may exhibit a few differences between two closely related species. Another, smaller protein may be identical in the same two species. For many reasons, some details of an evolutionary tree based on the sequences of one protein may differ from those of a tree based on the sequences of another protein.
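The branch-length bookkeeping described above for trees such as Figure 3–33 can be sketched in Python. The tree below is entirely hypothetical (node names and substitution counts are invented for illustration), and storing each node as a `child -> (parent, branch length)` entry is just one convenient assumption about how such a tree might be represented.

```python
def path_to_root(tree, node):
    """Return [(ancestor, substitutions from node to ancestor), ...], node first."""
    path = [(node, 0)]
    total = 0
    while node in tree:
        parent, length = tree[node]
        total += length
        path.append((parent, total))
        node = parent
    return path


def substitutions_between(tree, species_a, species_b):
    """Sum branch lengths from each species up to their closest common ancestor."""
    distance_from_a = dict(path_to_root(tree, species_a))
    # Walking upward from species_b, the first shared node is the closest one.
    for ancestor, distance_from_b in path_to_root(tree, species_b):
        if ancestor in distance_from_a:
            return ancestor, distance_from_a[ancestor] + distance_from_b
    raise ValueError("the two species share no ancestor in this tree")


# Hypothetical tree: external nodes are extant species, internal nodes ancestors.
tree = {
    "species_A": ("ancestor_1", 3),
    "species_B": ("ancestor_1", 5),
    "ancestor_1": ("root", 2),
    "species_C": ("root", 10),
}

print(substitutions_between(tree, "species_A", "species_B"))  # ('ancestor_1', 8)
print(substitutions_between(tree, "species_A", "species_C"))  # ('root', 15)
```

The returned totals are substitution counts only; converting them to divergence times would still require the calibration against the fossil record and other sources mentioned in the text.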
Increasingly sophisticated analyses using the sequences of many different proteins can provide an exquisitely detailed and accurate picture of evolutionary relationships. The story is a work in progress, and the questions being asked and answered are fundamental to how humans view themselves and the world around them. The field of molecular evolution promises to be among the most vibrant of the scientific frontiers in the twenty-first century.
FIGURE 3–31 The Blosum62 table. This blocks substitution matrix was created by comparing thousands of short blocks of aligned sequences that were identical in at least 62% of their amino acid residues. The nonidentical residues were assigned scores based on how frequently they were replaced by each of the other amino acids. Each substitution contributes to the score given to a particular alignment. Positive numbers (shaded yellow) add to the score for a particular alignment; negative numbers subtract from the score. Identical residues in sequences being compared (the shaded diagonal from top left to bottom right in the matrix) receive scores based on how often they are replaced, such that amino acids with unique chemical properties (e.g., Cys and Trp) receive higher scores (9 and 11, respectively) than those more easily replaced in conservative substitutions (e.g., Asp (6) and Glu (5)). Many computer programs use Blosum62 to assign scores to new sequence alignments.
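To show how a substitution table of this kind contributes to an alignment score, the following Python sketch scores two short pre-aligned segments with a small excerpt of Blosum62. The diagonal values for Cys, Trp, Asp, and Glu are the ones quoted in the caption above; the Phe diagonal and the off-diagonal entries (Asp/Glu, Asp/Phe, Glu/Phe) are taken from the commonly published Blosum62 matrix and should be checked against the full table. The function names and the three-residue segments are hypothetical.

```python
# Excerpt of Blosum62, one-letter codes: C = Cys, W = Trp, D = Asp, E = Glu, F = Phe.
# Only the C, W, D, and E diagonal values are quoted in the figure caption;
# the remaining entries are assumed from the standard published matrix.
BLOSUM62_EXCERPT = {
    ("C", "C"): 9,
    ("W", "W"): 11,
    ("D", "D"): 6,
    ("E", "E"): 5,
    ("F", "F"): 6,
    ("D", "E"): 2,    # conservative: both residues are negatively charged
    ("D", "F"): -3,   # nonconservative: charged residue replaced by hydrophobic one
    ("E", "F"): -3,
}


def pair_score(res_a, res_b):
    """Look up a score; the matrix is symmetric, so try both orderings."""
    if (res_a, res_b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(res_a, res_b)]
    return BLOSUM62_EXCERPT[(res_b, res_a)]


def alignment_score(seq_a, seq_b):
    """Sum the pairwise scores of two equal-length, already aligned segments."""
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must be aligned to the same length")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


print(alignment_score("CDW", "CEW"))  # Asp -> Glu (conservative): 9 + 2 + 11 = 22
print(alignment_score("CDW", "CFW"))  # Asp -> Phe (nonconservative): 9 - 3 + 11 = 17
```

The conservative Asp-to-Glu replacement still adds to the score, while the Asp-to-Phe replacement subtracts from it, which is the distinction the text draws between the two kinds of substitution.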
FIGURE 3–32 A signature sequence in the EF-1α/EF-Tu protein family. The signature sequence (boxed) is a 12-amino-acid insertion near the amino terminus of the sequence. Residues that align in all species are shaded yellow. Both archaebacteria and eukaryotes have the signature, although the sequences of the insertions are quite distinct for the two groups. The variation in the signature sequence reflects the significant evolutionary divergence that has occurred at this site since it first appeared in a common ancestor of both groups.
FIGURE 3–33 Evolutionary tree derived from amino acid sequence comparisons. A bacterial evolutionary tree, based on the sequence divergence observed in the GroEL family of proteins. Also included in this tree (lower right) are the chloroplasts (chl.) of some nonbacterial species.