Gene expression and transcription

Last updated: August 15, 2022

Summary

The genome contains the hereditary information for the structure and function of a cell or organism. This information is stored as a sequence of bases in DNA. A relatively small percentage of DNA codes for proteins and ribonucleic acids (RNAs), while a large part of the genome is composed of sequences without a clear function. The conversion of the information stored in DNA into functional molecules (RNA and proteins) is termed gene expression. Gene expression occurs in two stages: transcription and translation. During transcription, DNA is copied into RNA. RNA is then used to synthesize proteins during translation.

Key enzymes involved in transcription are DNA-dependent RNA polymerases. These enzymes synthesize RNA from the genes encoded in DNA; each gene contains a start site (promoter) where transcription begins. Transcription factors are required to recognize the promoter. RNA polymerase moves along the template strand of the double-stranded DNA, and the RNA strand is synthesized until the end of the DNA segment (termination site) is reached. In eukaryotes, the newly formed primary transcript is further modified so that it can, for example, be used for protein synthesis.

Gene expression is strongly regulated at all levels. Some genes are expressed in all cells and are required as housekeeping genes for basic cellular functions (i.e., constitutive expression). Other genes are only active in certain cells; their expression is regulated by a variety of mechanisms. Genes can undergo activation or silencing, and transcription depends on the presence of specific DNA-binding proteins. The newly formed RNA may also be degraded after transcription by various mechanisms before use in protein synthesis. There are also regulatory mechanisms at a translational level. Although each cell in an organism contains the same DNA, the regulated expression of certain genes causes the cells to specialize and assume different functions, e.g., muscle cells or hepatocytes.

Overview

In protein synthesis, DNA is initially transcribed into mRNA (transcription) and mRNA is translated into an amino acid chain (translation).
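
The two-step flow described above can be sketched in a few lines of Python. This is a toy illustration, not a bioinformatics tool: the function names and the example sequence are hypothetical, the codon table is deliberately incomplete, and transcription is approximated by rewriting the coding strand with U in place of T (a common shortcut, since the mRNA has the same sequence as the coding strand apart from that substitution).

```python
# Deliberately partial codon table (illustrative assumption, not the full genetic code).
CODON_TABLE = {"AUG": "Met", "GCC": "Ala", "UUU": "Phe", "GGC": "Gly"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def transcribe_coding_strand(coding_dna):
    """Approximate transcription: the mRNA matches the coding strand with U instead of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read codons from the first AUG until a stop codon; return a list of residues."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        # Codons missing from the partial table are reported as unknown ("Xaa").
        peptide.append(CODON_TABLE.get(codon, "Xaa"))
    return peptide


coding_dna = "ATGGCCTTTGGCTAA"  # hypothetical example sequence
mrna = transcribe_coding_strand(coding_dna)
print(mrna)             # AUGGCCUUUGGCUAA
print(translate(mrna))  # ['Met', 'Ala', 'Phe', 'Gly']
```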

Transcription

In transcription, DNA serves as a template to produce a complementary RNA molecule. Only a single strand of the double-stranded DNA (dsDNA) is read.

“Introns are intervening introverts”: Introns are found between (Latin “inter”) protein-coding DNA sequences and stay in the nucleus.

“Exons are expressive extroverts”: Exons contain protein-coding DNA sequences that will be expressed and exit the nucleus.

RNA polymerases and transcription factors

RNA polymerases

Transcription reactions are catalyzed by (DNA-dependent) RNA polymerases. In eukaryotic cells, there are various types of RNA polymerase, which recognize different promoter types and transcribe different types of genes. In prokaryotes, on the other hand, there is only one type of RNA polymerase that transcribes all three types of RNA.

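Overview of RNA polymerases
Type of RNA polymerase | Transcripts | Location
RNA polymerase I | rRNA (most common type) | Nucleolus
RNA polymerase II | mRNA | Nucleoplasm
RNA polymerase III | tRNA | Nucleoplasm
Mitochondrial RNA polymerase | Mitochondrial RNA | Mitochondria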

RNA polymerase II transcribes almost all genes that code for proteins.

The RNA polymerases are numbered in the order in which their products are utilized in the process of protein synthesis! I, II, and III → rRNA, mRNA, and tRNA, respectively.

In prokaryotes, there is only one type of RNA polymerase that transcribes all three types of RNA.

Transcription factors

RNA polymerases require helper proteins for promoter recognition of the genes to be transcribed.

DNA-binding proteins

Proteins that bind to DNA, such as transcription factors, require specific protein domains, also termed structural motifs. These structural motifs usually use either an α-helix or a β-sheet to bind to the major groove of DNA. Transcription factors have DNA-binding domains through which they interact with specific DNA segments to perform their function. Numerous structural motifs of DNA-binding domains have been identified. Important examples are zinc finger domains, leucine zippers, the basic helix-loop-helix, and the homeobox.

  • Zinc finger
  • Leucine zipper
  • Basic helix-loop-helix
  • Homeobox (with helix-turn-helix)
    • Characteristics: a polypeptide chain with three short, successive α-helices, with the third α-helix perpendicular to the first two α-helices through a turn
    • DNA binding: The third, relatively basic α-helix binds as a recognition helix, especially to exposed bases in the major groove of DNA.

An important structural motif of DNA-binding proteins is an α-helix with many basic amino acid residues.

Stages of transcription

Transcription is divided into three phases: initiation, elongation, and termination.

  1. Initiation (transcription): the start of transcription by the formation of the initiation complex and unwinding of DNA
    1. Preinitiation complex (RNA polymerase-promoter closed complex) formation by binding of general transcription factors and RNA polymerase to the promoter region (e.g., TATA box, CAAT box, GC box)
    2. Formation of a transcription bubble by unwinding the DNA double helix to a single strand with a length of 10–12 bases (open complex)
    3. Start of RNA synthesis
  2. Elongation
    • Extension of the RNA strand
    • The 3′-OH group of the growing RNA strand is attached to the α-phosphate group of the next complementary nucleoside triphosphate
  3. Termination: During termination, polyadenylation starts.
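
As a rough illustration of what promoter recognition during initiation looks for, the sketch below scans a DNA string for the three promoter elements named above. The exact motif strings (TATAAA, CCAAT, GGGCGG) are commonly cited consensus sequences and are assumptions here, not taken from this article. Real promoter elements are degenerate and are recognized by transcription factors rather than by exact string matching. The function name and the example sequence are hypothetical.

```python
# Commonly cited consensus motifs (assumption: exact matches only, no degeneracy).
PROMOTER_ELEMENTS = {
    "TATA box": "TATAAA",
    "CAAT box": "CCAAT",
    "GC box": "GGGCGG",
}


def find_promoter_elements(dna):
    """Return the 0-based start positions of each motif in the given DNA string."""
    dna = dna.upper()
    hits = {}
    for name, motif in PROMOTER_ELEMENTS.items():
        positions = []
        start = dna.find(motif)
        while start != -1:
            positions.append(start)
            # Continue one base further so overlapping matches are also found.
            start = dna.find(motif, start + 1)
        hits[name] = positions
    return hits


upstream = "GGGCGGTTACCAATGCGTATAAAGGC"  # hypothetical upstream region
print(find_promoter_elements(upstream))
# {'TATA box': [17], 'CAAT box': [9], 'GC box': [0]}
```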

During transcription, base pairing occurs between DNA and RNA. Uracil (instead of thymine) in RNA pairs with adenine in DNA.

RNA and DNA pair in an antiparallel orientation: the 5′ end of one strand lies opposite the 3′ end of the other strand and vice versa. In both cases, the base sequences are written in the usual 5′ → 3′ direction.
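
The pairing and antiparallel rules above can be made concrete with a short Python sketch. It assumes both sequences are written 5′ → 3′, so the template has to be walked in reverse to produce the RNA in its own 5′ → 3′ order. The function name and the example template are hypothetical.

```python
# DNA template base -> RNA base it pairs with (A pairs with U, not T, in RNA).
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3):
    """Return the RNA (written 5'->3') complementary to a DNA template strand (written 5'->3')."""
    # The strands are antiparallel, so the RNA's 5' end pairs with the template's 3' end:
    # iterate over the template from its last base to its first.
    return "".join(PAIRING[base] for base in reversed(template_5_to_3.upper()))


template = "TTAGGCCAT"  # hypothetical template strand, 5'->3'
print(transcribe(template))  # AUGGCCUAA
```

Reading the result against the template shows both rules at once: every A in the template gives a U in the RNA, and the first RNA base pairs with the last template base.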

Post-transcriptional modification (RNA processing)

In eukaryotes, the end product of transcription is heterogeneous nuclear RNA (hnRNA), which is then transformed into mature mRNA through post-transcriptional modifications in the nucleus. These modifications include capping, polyadenylation, splicing, and RNA editing. The mature mRNA then leaves the nucleus and enters the cytosol.


Capping

  • Definition: addition of a cap of 7-methylguanosine to the 5′ end of hnRNA to form the five-prime cap
  • Process
    1. Cleavage of the terminal (γ) phosphate group from the 5′ triphosphate end of hnRNA by RNA triphosphatase
    2. Addition of a GMP residue (formed from GTP with cleavage of pyrophosphate) to the 5′ diphosphate end of hnRNA by guanylyltransferase
    3. Methylation of one, two, or three ribose residues of hnRNA with S-adenosylmethionine (SAM) as a methyl group donor
  • Function
    • Protects against degradation (through exonucleases)
    • Initiation of translation




Splicing

  • Definition: excision of introns from hnRNA transcripts and direct linkage of exons
  • Function: excision of introns so that the resulting mature mRNA only contains relevant information in the form of exons


  1. Spliceosome formation at the exon-intron border
  2. Opening of the exon-intron border at the 5′ splice site: A temporary lariat structure with a 2′ → 5′ phosphodiester bond is formed, which brings the two exon ends to be joined into proximity (loop formation)
  3. Opening of the exon-intron border at the 3′ splice site
  4. Joining of the exon ends

The exons of a gene are the coding segments; the introns are removed from hnRNA by splicing.
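
The net effect of splicing (introns out, exons joined in order) can be sketched as a simple string operation. This models only the outcome, not the spliceosome or the lariat intermediate. The intron coordinates are supplied by hand as half-open intervals, and the function name and example transcript are hypothetical.

```python
from typing import List, Tuple


def splice(hnrna: str, introns: List[Tuple[int, int]]) -> str:
    """Remove the given [start, end) intron intervals and join the remaining exons in order."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or start >= end or end > len(hnrna):
            raise ValueError(f"invalid or overlapping intron interval: ({start}, {end})")
        exons.append(hnrna[position:start])  # exon segment before this intron
        position = end                        # skip over the intron
    exons.append(hnrna[position:])            # final exon after the last intron
    return "".join(exons)


# Hypothetical transcript: exon 1 + intron + exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "UUUGGCUAA"
hnrna = exon1 + intron + exon2
print(splice(hnrna, [(len(exon1), len(exon1) + len(intron))]))  # AUGGCCUUUGGCUAA
```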

RNA editing

Alternative splicing

  • Definition: removal of introns within hnRNA with differential joining of exons
  • Process: similar to splicing with additional splicing factors that determine the range of splice locations
  • Function
    • Various proteins can be produced from a single hnRNA sequence, which allows for increased information density of DNA
    • The formation of new proteins is facilitated: more rapid adaptation to altered living conditions
  • Examples

The one gene-one enzyme hypothesis does not apply to eukaryotes. A variety of proteins can be formed from one gene by alternative splicing.
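
To show how one transcript can give rise to several mRNAs, the sketch below enumerates the products of one simple form of alternative splicing, in which internal exons may be skipped while exon order is preserved. It assumes the first and last exons are always retained and that every internal exon is optional. In a cell, splicing factors determine which combinations actually occur. The function name and exon sequences are hypothetical.

```python
from itertools import combinations
from typing import List


def exon_skipping_isoforms(exons: List[str]) -> List[str]:
    """List mRNAs formed by keeping the first and last exon plus any subset of internal exons."""
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, *internal, last = exons
    isoforms = []
    # Go from keeping all internal exons down to keeping none.
    for kept_count in range(len(internal), -1, -1):
        # combinations() preserves the original exon order within each subset.
        for kept in combinations(internal, kept_count):
            isoforms.append("".join([first, *kept, last]))
    return isoforms


exons = ["AUG", "GCC", "UUU", "UAA"]  # hypothetical exons of a single gene
for isoform in exon_skipping_isoforms(exons):
    print(isoform)
# AUGGCCUUUUAA
# AUGGCCUAA
# AUGUUUUAA
# AUGUAA
```

With n optional internal exons, this simple model yields 2^n possible mRNAs from a single hnRNA, which illustrates the increased information density mentioned above.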

Quality control of mRNA

Regulation of transcription

Because transcription and protein synthesis require large amounts of energy, gene expression is strongly regulated. While some genes are continuously transcribed, other genes undergo regulation.

Prokaryotic gene regulation (operon model)

Regulation of gene expression was initially analyzed in E. coli. Regulatory sequences in the bacterial genome ensure that the enzyme β-galactosidase is expressed when the sugar lactose is available as an energy source. Other proteins associated with lactose metabolism are synthesized at the same time, so regulation involves the coordinated expression of several genes.

In the lac operon, the repressor binds to the operator and prevents transcription of the operon genes in the absence of lactose.
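
The repressor logic in the statement above reduces to a single Boolean rule, sketched below. This is a minimal model under a strong assumption: lactose is the only input, and any other regulatory influence on the operon is ignored. The function and variable names are hypothetical.

```python
def lac_operon_is_transcribed(lactose_present: bool) -> bool:
    """Toy model: the repressor occupies the operator only when lactose is absent."""
    repressor_bound_to_operator = not lactose_present
    # A bound repressor blocks transcription of the operon genes.
    return not repressor_bound_to_operator


for lactose in (False, True):
    state = "on" if lac_operon_is_transcribed(lactose) else "off"
    print(f"lactose present: {lactose} -> operon {state}")
```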

Eukaryotic gene regulation

Regulation of gene expression is significantly more complex in eukaryotes than in prokaryotes. One reason is the difference in genome size, with eukaryotes having a significantly larger genome. Another reason is that the DNA of the eukaryotic genome is strongly condensed in the nucleus and packaged as chromatin. As a result, it is less accessible than prokaryotic DNA. However, a common feature of eukaryotes and prokaryotes is the importance of activators and repressors, which bind specific DNA sequences and increase or inhibit gene expression.

Transcriptional inhibitors

Transcriptional inhibitors are strong cytotoxins, but some can also be used as antibiotics.
