For most vertebrate genes (almost all polypeptide-encoding genes and many RNA genes), only a small portion of the gene sequence is eventually decoded to give the final product. In these cases the genetic instructions for making an mRNA or mature noncoding RNA occur in exon segments that are separated by intervening intron sequences, which do not contribute genetic information to the final product.
Transcription of a gene initially produces a primary RNA transcript that is complementary to the entire length of the gene, including both exons and introns. This primary transcript then undergoes RNA splicing, whereby the intronic RNA segments are removed and discarded while the remaining exonic RNA segments are joined end to end, to give a shorter RNA product (Figure 1).

Fig1. RNA splicing brings transcribed exon sequences together. Most of our protein-coding genes (and many RNA genes) undergo RNA splicing. In this generalized example, a protein-coding gene is illustrated with an upstream promoter and three exons separated by two introns that each begin with the dinucleotide GT and end in the dinucleotide AG. The central exon (exon 2) is composed entirely of coding DNA, but exons 1 and 3 have noncoding DNA sequences (that will eventually be used to make untranslated sequences in the mRNA). The three exons and the two separating introns are transcribed together to give a large primary RNA transcript. The RNA transcript is cleaved at positions corresponding to exon–intron boundaries. The two transcribed intron sequences that are excised are each degraded, but the transcribed exon sequences are joined (spliced) together to form a contiguous mature RNA that has noncoding sequences at both the 5′ and 3′ ends. In the mature mRNA these terminal sequences will not be translated and so are known as untranslated regions (UTRs). The central coding sequence of the mRNA is defined by a translation start site (which is almost always the trinucleotide AUG) and a translation stop site, and is read (translated) to produce a polypeptide. N, N-terminus; C, C-terminus.
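The joining of exon segments described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `splice_transcript`, the toy sequence, and the exon coordinates are all hypothetical, and the exon boundaries are supplied by hand rather than recognized from the sequence.

```python
def splice_transcript(primary_rna, exons):
    """Join the exon segments of a primary transcript end to end.

    primary_rna: RNA sequence of the whole transcript (exons plus introns).
    exons: list of (start, end) pairs, 0-based and end-exclusive, in 5'->3' order.
    Returns (mature_rna, introns), where introns are the excised segments.
    """
    mature = "".join(primary_rna[start:end] for start, end in exons)
    # Each intron is the gap between two consecutive exons.
    introns = [
        primary_rna[prev_end:next_start]
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]
    return mature, introns


# Hypothetical toy transcript: three exons separated by two GU...AG introns.
primary = "AUGGCC" + "GUAAGUCCUAG" + "GAUUCC" + "GUGAGUUUCAG" + "AAGUAA"
exon_coords = [(0, 6), (17, 23), (34, 40)]

mature_rna, excised = splice_transcript(primary, exon_coords)
# mature_rna is "AUGGCCGAUUCCAAGUAA"; excised holds the two intron sequences.
```

The mature product is shorter than the primary transcript by exactly the combined length of the excised introns, which is the point made in the figure.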
RNA splicing requires recognition of the nucleotide sequences at the boundaries of transcribed exons and introns (splice junctions). The dinucleotides at the ends of introns are highly conserved: the vast majority of introns start with a GT (becoming GU in intronic RNA) and end with an AG (the GT–AG rule).
Although the conserved GT and AG dinucleotides are crucial for splicing, they are not sufficient to mark the limits of an intron. The nucleotide sequences that are immediately adjacent to them are also quite highly conserved, constituting splice junction consensus sequences (Figure 2). A third conserved intronic sequence that is also important in splicing is known as the branch site and is typically located no more than 40 nucleotides upstream of the intron’s 3′-terminal AG (see Figure 2). Other exonic and intronic sequences can promote splicing (splice enhancer sequences) or inhibit it (splice silencer sequences), and mutations in these sequences can cause disease.

Fig2. Three consensus DNA sequences in introns of complex eukaryotes. Most introns in eukaryotic genes contain conserved sequences that correspond to three functionally important regions. Two of the regions, the splice donor site and the splice acceptor site, span the 5′ and 3′ boundaries of the intron. The branch site is an additional important region that typically occurs less than 20 nucleotides upstream of the splice acceptor site. The nucleotides shown in red in these three consensus sequences are almost invariant. The other nucleotides detailed in both the intron and the exons are those most commonly found at each position. In some instances, two nucleotides may be equally common, as in the case of C and T near the 3′ end of the intron. Where N appears, any of the four nucleotides may occur.
As illustrated in Figure 3, the essential steps in splicing are as follows:
1. Nucleophilic attack on the intron’s 5′-terminal G nucleotide by the invariant A of the branch-site consensus sequence, to form a lariat-shaped structure;
2. Cleavage of the exon–intron junction at the splice donor site;
3. Nucleophilic attack on the splice acceptor site by the 3′ end of the upstream exon, leading to cleavage and release of the intronic RNA in the form of a lariat, and the splicing together of the two exonic RNA segments.

Fig3. The mechanism of RNA splicing. (A) The unprocessed primary RNA transcript with intronic RNA separating sequences E1 and E2 that correspond to exons in DNA. The splicing mechanism involves a nucleophilic attack on the G of the 5′ GU dinucleotide. This is carried out by the 2′ hydroxyl (OH) group on the conserved A of the branch site and results in (B) formation of a lariat structure, and cleavage of the splice donor site. The 3′ OH at the 3′ end of the E1 sequence carries out a nucleophilic attack on the splice acceptor site causing release of the intronic RNA (as a lariat-shaped structure) and (C) fusion (splicing) of E1 and E2.
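The steps above can be followed on a toy sequence. The sketch below is a bookkeeping model only, not a simulation of the chemistry: the function name `splice_two_step`, the dictionary layout, and the example transcript are hypothetical, and the donor, branch, and acceptor positions are passed in rather than discovered. It treats the branch-site attack and the donor-site cleavage as a single first step.

```python
def splice_two_step(rna, donor, branch, acceptor):
    """Model the two nucleophilic attacks that remove one intron.

    donor:    index of the intron's first nucleotide (the G of GU).
    branch:   index of the branch-site A.
    acceptor: index just past the intron's last nucleotide (the G of AG).
    Returns (spliced_rna, lariat).
    """
    if rna[donor:donor + 2] != "GU" or rna[acceptor - 2:acceptor] != "AG":
        raise ValueError("intron does not follow the GU-AG rule")
    if not donor < branch < acceptor or rna[branch] != "A":
        raise ValueError("branch site must be an A inside the intron")

    # Step 1: the 2' OH of the branch A attacks the intron's 5' G.
    # The donor junction is cleaved, leaving E1 with a free 3' OH.
    e1 = rna[:donor]
    loop = rna[donor:branch + 1]       # 5' G is now bonded to the branch A
    tail = rna[branch + 1:acceptor]    # still attached to E2 at this stage
    e2 = rna[acceptor:]

    # Step 2: the 3' OH of E1 attacks the acceptor junction.
    # The intron is released as a lariat and E1 is joined to E2.
    lariat = {"loop": loop, "tail": tail}
    return e1 + e2, lariat


# Hypothetical transcript: E1 + intron + E2, with the branch A at index 16.
pre_rna = "AUGGCC" + "GUAAGUCCUAACUUUCAG" + "GAUUCC"
spliced, lariat = splice_two_step(pre_rna, donor=6, branch=16, acceptor=24)
# spliced is "AUGGCCGAUUCC"; the lariat loop and tail together make up the intron.
```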
In the case of genes residing in eukaryotic nuclei, RNA splicing is mediated by a large RNA–protein complex called the spliceosome. Each spliceosome contains five types of small nuclear RNA (snRNA) and more than 50 proteins. The snRNA molecules associate with proteins to form small nuclear ribonucleoprotein (snRNP, or “snurp”) particles. The specificity of the splicing reaction is established by RNA–RNA base pairing between the RNA transcript to be spliced and snRNA molecules within the spliceosome. There are two types of spliceosome:
• The major (GU-AG) spliceosome processes transcripts that have classical GU-AG introns. It contains five types of snRNA: U1 and U2 snRNAs recognize and bind the splice donor and branch sites, respectively; U4, U5, and U6 snRNAs subsequently bind to cause looping out of the intronic RNA (Figure 4);
• The minor (AU-AC) spliceosome processes transcripts that have rare AU-AC introns. It also has five snRNAs but uses U11 and U12 snRNA instead of U1 and U2 and has variants of U4 and U6 snRNA.
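The distinction between the two intron classes can be expressed as a simple check on the terminal dinucleotides. This is a deliberately crude sketch: the function name `classify_intron` and the example sequences are hypothetical, and sorting introns by their end dinucleotides alone is a simplifying assumption taken from the naming used in the list above.

```python
def classify_intron(intron_rna):
    """Assign an intron to a spliceosome class from its terminal dinucleotides."""
    ends = (intron_rna[:2], intron_rna[-2:])
    if ends == ("GU", "AG"):
        return "major (GU-AG)"
    if ends == ("AU", "AC"):
        return "minor (AU-AC)"
    return "unclassified"


# Hypothetical intron sequences for illustration.
examples = ["GUAAGUCCUAACUUUCAG", "AUAUCCUUUUCCUUAACAC", "GCAAGUCCUAACUUUCAG"]
classes = [classify_intron(seq) for seq in examples]
# classes is ["major (GU-AG)", "minor (AU-AC)", "unclassified"]
```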

Fig4. Role of small nuclear ribonucleoproteins (snRNPs) in RNA splicing. (A) The unprocessed primary RNA transcript as in Figure 3. (B) Within the spliceosome, part of the U1 snRNA is complementary in sequence to the splice donor-site consensus sequence, and binds to it by RNA–RNA base pairing. The U2 snRNA similarly binds to the branch site. Interaction between the splice donor and splice acceptor sites is stabilized by (C) the binding of a multi-snRNP particle that contains the U4, U5, and U6 snRNAs. The U5 snRNP binds simultaneously to both the splice donor and splice acceptor sites. Their cleavage releases the intronic sequence and allows (D) E1 and E2 to be spliced together.
Once the spliceosome recognizes a splice donor site, it scans the RNA sequence until it meets the next splice acceptor site (signaled as a target by the upstream presence of the branch-site consensus sequence).
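This scanning behavior can be approximated as a search for the first AG that has a branch site a short distance upstream. The sketch below rests on several assumptions that are not in the text: the branch-site pattern used (YNYURAY, a commonly quoted consensus) is only an illustrative stand-in, the function name `find_acceptor` and the toy sequences are hypothetical, and the 40-nucleotide window simply reuses the typical maximum distance quoted earlier for the branch site.

```python
import re

# Assumed branch-site pattern (YNYURAY); the text does not give the consensus.
BRANCH_SITE = re.compile(r"[CU][ACGU][CU]U[AG]A[CU]")


def find_acceptor(rna, donor, max_branch_gap=40):
    """Scan 3' of a donor site for the first AG preceded by a branch site.

    donor: index of the intron's first nucleotide (the G of GU).
    Returns the index just past the chosen AG, or None if no AG qualifies.
    """
    pos = rna.find("AG", donor + 2)
    while pos != -1:
        # Look for a branch site that lies entirely within the intron,
        # no further than max_branch_gap nucleotides upstream of this AG.
        window_start = max(donor + 2, pos - max_branch_gap)
        if BRANCH_SITE.search(rna, window_start, pos):
            return pos + 2
        pos = rna.find("AG", pos + 1)
    return None


# Hypothetical transcript: the AG inside "GUAAGU" has no branch site
# upstream of it, so the scan passes over it and stops at the intron's end.
pre_rna = "AUGGCC" + "GUAAGUCCUAACUUUCAG" + "GAUUCC"
acceptor_end = find_acceptor(pre_rna, donor=6)
# acceptor_end is 24, so the intron spans pre_rna[6:24].
```

In this model an AG without a suitable upstream branch site is skipped, which mirrors the statement that the branch site marks the acceptor as a target.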