- Overview: The Two-Component System
- Cas9 Protein Architecture
- The Guide RNA: crRNA and tracrRNA
- PAM Recognition: How Cas9 Finds Its Starting Point
- R-Loop Formation: Checking the Match
- The Cut: HNH and RuvC in Action
- The Seed Region: Why Mismatches Near the PAM Matter Most
- Cas9 Variants: Different Bacteria, Different Rules
- Engineered Cas9: Nickases, dCas9, and Beyond
- Putting It All Together: The Complete Cycle
Section 1 — Overview: The Two-Component System
Before diving into molecular detail, it helps to have the big picture clearly in mind. The CRISPR-Cas9 editing system has exactly two components that you deliver to a cell. Everything else — every conformational change, every nucleotide check, every phosphodiester bond broken — happens automatically as a consequence of the physics and chemistry of these two molecules finding each other and finding their target.
Component one is the Cas9 protein, a large, multi-domain enzyme (~160 kilodaltons and 1,368 amino acids in the most commonly used variant, from Streptococcus pyogenes) that does the actual DNA cutting. On its own, Cas9 is essentially inactive: it binds DNA only weakly and non-specifically, and it cannot cut anything.
Component two is the guide RNA — a short RNA molecule (~100 nucleotides total) that tells Cas9 exactly where to cut. It has two functional regions: a 20-nucleotide spacer sequence that base-pairs with the target DNA, and a structural scaffold region that binds to and activates Cas9. When guide RNA binds Cas9, the complex undergoes a dramatic conformational change that activates the protein and arms it for cutting.
Cas9 protein (SpCas9):
- 1,368 amino acids
- ~160 kDa molecular weight
- Two nuclease domains: HNH + RuvC
- PAM-interacting (PI) domain
- Inactive alone; activated by gRNA

Guide RNA (sgRNA):
- ~100 nucleotides total
- 20 nt spacer: targets the DNA
- ~80 nt scaffold: binds and activates Cas9
- Engineered fusion of crRNA + tracrRNA
- Spacer can be changed to redirect cutting to any target next to a PAM
Section 2 — Cas9 Protein Architecture: A Tour of the Domains
Understanding Cas9’s mechanism requires understanding its physical structure. Cas9 is not a simple, featureless enzyme. It is a multi-domain protein that changes its three-dimensional shape at each stage of the cutting process. The different domains have different jobs, and mutations in any one of them produce a protein with different capabilities — which is exactly how researchers engineer Cas9 variants.
Structurally, Cas9 can be divided into two main lobes that form a bilobed architecture resembling a pair of cupped hands or a crab claw. The recognition lobe (REC lobe) and the nuclease lobe (NUC lobe) are connected by an arginine-rich bridge helix and create a central channel through which the guide RNA-DNA heteroduplex (the R-loop) is threaded during the cutting reaction.
| Domain | Residues | Function |
|---|---|---|
| REC1 | 94–179, 308–713 | Binds the repeat:anti-repeat scaffold of the guide RNA; essential for Cas9-gRNA complex formation |
| REC2 | 180–307 | Role less clear; may stabilise the bilobed architecture. Deletions in this region can reduce protein size without abolishing activity |
| Bridge Helix | 60–93 | Arginine-rich helix that contacts the RNA-DNA heteroduplex at the 3’ end of the spacer (near PAM). Critical for triggering HNH activation |
| HNH | 775–908 | Cuts the complementary (target) strand, which base-pairs with the guide RNA spacer. Contains the catalytic His840 residue |
| RuvC | 1–59, 718–769, 909–1098 | Cuts the non-complementary (displaced, non-target) strand. Split across three segments; catalytic residues include Asp10, Glu762, His983 and Asp986 |
| PAM-Interacting (PI) | 1099–1368 | Recognises the PAM sequence in the major groove of DNA. Determines which PAM sequences Cas9 will accept. Swapping this domain changes PAM specificity |
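Because the table gives exact residue boundaries, it can also serve as a lookup structure. Below is a minimal Python sketch that assumes the boundaries listed above (1-based, inclusive). The names `SPCAS9_DOMAINS` and `domain_of` are hypothetical. Residues the table does not assign (for example 714–717 and 770–774) are reported as unassigned. A lookup like this makes it easy to confirm which domain a mutation such as D10A or H840A falls in.

```python
# SpCas9 domain boundaries copied from the table above (1-based, inclusive).
# Residues outside every listed segment are reported as "unassigned".
SPCAS9_LENGTH = 1368

SPCAS9_DOMAINS: dict[str, list[tuple[int, int]]] = {
    "RuvC": [(1, 59), (718, 769), (909, 1098)],
    "Bridge Helix": [(60, 93)],
    "REC1": [(94, 179), (308, 713)],
    "REC2": [(180, 307)],
    "HNH": [(775, 908)],
    "PI": [(1099, 1368)],
}


def domain_of(residue: int) -> str:
    """Return the name of the SpCas9 domain that contains `residue`."""
    if not 1 <= residue <= SPCAS9_LENGTH:
        raise ValueError(f"SpCas9 has {SPCAS9_LENGTH} residues; got {residue}")
    for name, segments in SPCAS9_DOMAINS.items():
        if any(start <= residue <= end for start, end in segments):
            return name
    return "unassigned"


# Catalytic and PAM-reading residues named in this article.
for res in (10, 840, 983, 1333, 1335):
    print(res, domain_of(res))  # 10 RuvC / 840 HNH / 983 RuvC / 1333 PI / 1335 PI
```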
Section 3 — The Guide RNA: From Two Molecules to One
In the natural CRISPR immune system of Streptococcus pyogenes, Cas9 requires two separate RNA molecules to function. The crRNA (CRISPR RNA) contains the spacer sequence that matches the target DNA. The tracrRNA (trans-activating crRNA) base-pairs with a repeat region in the crRNA to form a duplex structure that binds and activates Cas9. Both are required. Neither works alone.
The key engineering breakthrough from the Doudna and Charpentier labs was showing that the crRNA and tracrRNA could be fused into a single RNA molecule via a short loop sequence. This created what is now called the single guide RNA (sgRNA). The fusion dramatically simplified the system: instead of separately synthesising and delivering two RNA molecules, users needed only one. The sgRNA retains the essential functions of the two-piece system while being much easier to design, produce, and deliver.
- Spacer (5’ end, 20 nt): the 20 Ns you design. It base-pairs with the target DNA. Change these 20 nucleotides to retarget CRISPR to any sequence that lies next to a suitable PAM.
- Scaffold (3’ portion): a fixed sequence that forms stem-loop structures, which bind Cas9 and hold the complex together. Arbitrary changes to this region will generally break the system.
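Designing the spacer is essentially string handling. Below is a minimal Python sketch that assumes the protospacer is written 5’→3’ on the non-target strand with the PAM excluded. The function name is hypothetical. The spacer has the same sequence as the protospacer, with U in place of T, so it base-pairs with the complementary strand. In a real sgRNA the fixed scaffold would follow on the 3’ side; its sequence is not reproduced here.

```python
def spacer_from_protospacer(protospacer_dna: str) -> str:
    """Return the 20-nt RNA spacer for a protospacer given 5'->3' on the non-target strand (PAM excluded)."""
    seq = protospacer_dna.upper()
    if len(seq) != 20:
        raise ValueError(f"expected a 20-nt protospacer, got {len(seq)} nt")
    if set(seq) - set("ACGT"):
        raise ValueError("protospacer must contain only A, C, G and T")
    # Same sequence as the non-target strand, written as RNA (U instead of T),
    # so it is complementary to the target strand it will pair with.
    return seq.replace("T", "U")


# Arbitrary example protospacer (not a validated guide).
print(spacer_from_protospacer("GACGCATAAAGATGAGACGC"))  # GACGCAUAAAGAUGAGACGC
```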
Section 4 — PAM Recognition: How Cas9 Finds Its Starting Point
The genome is enormous. How does Cas9 efficiently search the roughly 3.2 billion base pairs of the human genome for its target without spending an eternity at every position? The answer is the PAM (Protospacer Adjacent Motif). This is a short DNA sequence that Cas9 checks first, before investing energy in the full guide RNA matching process.
For SpCas9 (the most widely used variant), the PAM is 5’-NGG-3’ on the non-target strand, the strand that does not pair with the guide RNA. It sits immediately 3’ (downstream) of the protospacer, which is the genomic DNA sequence that matches the guide RNA spacer. NGG means any nucleotide followed by two guanines. In a random sequence with equal base frequencies, GG occurs at a given position with probability 1/16. An NGG PAM therefore appears about once every 16 bp on each strand, or about once every 8 bp when both strands are counted. That is frequent enough that almost any gene can be targeted, yet rare enough to shrink the search space considerably.
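The frequency estimate is simple probability. Below is a minimal Python sketch that assumes a random sequence with equal, independent base frequencies. Real genomes deviate from this, so actual PAM density varies with GC content. The function names are hypothetical. The sketch computes the expected rate exactly and counts NGG PAMs on both strands of a given sequence, using the fact that an NGG on the bottom strand appears as CCN on the top strand.

```python
from fractions import Fraction


def expected_ngg_rate(strands: int = 2) -> Fraction:
    """Expected NGG PAMs per base pair, assuming equal and independent base frequencies."""
    # N matches any base (probability 1); each G has probability 1/4.
    per_strand = Fraction(1, 4) * Fraction(1, 4)
    return per_strand * strands


def count_ngg(seq: str) -> int:
    """Count NGG PAMs on both strands of a DNA sequence written 5'->3' (top strand)."""
    seq = seq.upper()
    windows = range(len(seq) - 2)  # start index of every 3-nt window
    # Top strand: NGG, i.e. G at the 2nd and 3rd positions of the window.
    top = sum(1 for i in windows if seq[i + 1:i + 3] == "GG")
    # Bottom strand: an NGG there reads as CCN on the top strand.
    bottom = sum(1 for i in windows if seq[i:i + 2] == "CC")
    return top + bottom


print(expected_ngg_rate(1), expected_ngg_rate(2))  # 1/16 1/8
print(count_ngg("AGGTCCA"))  # 2: AGG on the top strand, TGG (from CCA) on the bottom
```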
The Mechanism of PAM Recognition
Cas9 locates its targets mainly by three-dimensional diffusion. It binds DNA transiently, samples the sequence at that spot, and dissociates, rather than sliding long distances along the helix. At each binding event it checks for a PAM. When it encounters an NGG, it stays bound for longer and locally unwinds the DNA next to the PAM so that the guide RNA spacer can attempt to base-pair with the exposed strand. If the PAM is wrong (not NGG), Cas9 dissociates without interrogating the adjacent sequence.
The PAM is recognised by the PAM-interacting (PI) domain of Cas9, which reads the major groove of the DNA. The two guanines of the NGG PAM make specific contacts with Arg1333 and Arg1335 residues in the PI domain. These arginine-guanine interactions are what make NGG the required PAM for SpCas9. Mutating Arg1333 or Arg1335 destroys PAM recognition and abolishes Cas9 activity.
Figure: Layout of an SpCas9 target site. The 20 nt protospacer matches your guide RNA spacer, and the NGG PAM lies immediately 3’ of it on the non-target strand. Cas9 cuts between positions 17 and 18 of the protospacer, numbered 5’ to 3’ (that is, 3 bp upstream of the PAM), leaving a blunt-ended double-strand break.
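The two-step check described above can be sketched as a toy search loop. This minimal Python sketch is not a physical model. It visits positions in order for convenience, whereas the real search samples sites by diffusion. It scans only the top strand and treats the spacer comparison as an exact string match. The function name is hypothetical. What the sketch does capture is the ordering: the protospacer is compared with the spacer only where an NGG PAM is present.

```python
def scan_for_target(genome: str, protospacer: str) -> list[int]:
    """Toy two-checkpoint search on the top strand of `genome`.

    Returns start indices where an NGG PAM is present AND the adjacent
    sequence matches `protospacer` exactly.
    """
    genome, protospacer = genome.upper(), protospacer.upper()
    k = len(protospacer)
    hits = []
    for start in range(len(genome) - k - 2):
        pam = genome[start + k:start + k + 3]
        # Checkpoint 1: the PI domain reads the GG at PAM positions 2 and 3.
        if pam[1:] != "GG":
            continue  # no PAM: move on without comparing the protospacer
        # Checkpoint 2: only now is the adjacent sequence compared with the spacer.
        if genome[start:start + k] == protospacer:
            hits.append(start)
    return hits


proto = "GACGCATAAAGATGAGACGC"
print(scan_for_target("TT" + proto + "TGG" + "AA", proto))  # [2]
print(scan_for_target("TT" + proto + "TCA" + "AA", proto))  # []: matching sequence, but no PAM
```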
Section 5 — R-Loop Formation: Checking the Match
Once Cas9 finds a PAM and opens the DNA locally, the guide RNA spacer must check whether the adjacent 20-nucleotide sequence actually matches the intended target. This matching process is called R-loop formation (the R stands for RNA), and it is the central specificity-checking step of the entire mechanism.
The R-loop forms as the guide RNA spacer invades the DNA double helix and displaces the non-complementary DNA strand. The 20 nucleotides of the spacer base-pair with the complementary DNA strand one by one, starting from the PAM-proximal end (nucleotides 1–3) and zipping toward the PAM-distal end (nucleotides 18–20). The displaced non-complementary DNA strand forms the “loop” of the R-loop structure.
If the spacer sequence perfectly matches the DNA, the R-loop forms stably and extends through all 20 nucleotides. This full R-loop formation is the trigger that activates Cas9 for cutting. If there are mismatches — base pairs that cannot form — the R-loop is destabilised and may fail to extend fully, leaving Cas9 inactive at that position. The stringency of this check determines Cas9 specificity.
The Directionality of R-Loop Formation
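The directionality can be illustrated with a toy model in which the R-loop zips one base pair at a time from the PAM-proximal end and stops at the first mismatch. This minimal Python sketch relies on that simplifying assumption; real R-loops can sometimes extend past PAM-distal mismatches, as the seed-region discussion explains. The function name is hypothetical. Both sequences are written 5’→3’, so the PAM-proximal end is the 3’ end. A spacer base pairs with the target strand exactly when it matches the protospacer base at the same position, with U read as T.

```python
def rloop_extent(spacer_rna: str, protospacer_dna: str) -> int:
    """Base pairs formed before the first mismatch, zipping from the PAM-proximal end."""
    spacer = spacer_rna.upper().replace("U", "T")
    protospacer = protospacer_dna.upper()
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must be the same length")
    extent = 0
    # reversed() walks from the 3' (PAM-proximal) end: position 1, 2, 3, ...
    for s, p in zip(reversed(spacer), reversed(protospacer)):
        if s != p:
            break  # toy rule: zipping stalls at the first mismatch
        extent += 1
    return extent


spacer = "GACGCAUAAAGAUGAGACGC"
print(rloop_extent(spacer, "GACGCATAAAGATGAGACGC"))  # 20: full R-loop
print(rloop_extent(spacer, "GACGCATAAAGATGAGACAC"))  # 1: mismatch at position 2 (seed)
print(rloop_extent(spacer, "TACGCATAAAGATGAGACGC"))  # 19: mismatch at position 20
```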
Section 6 — The Cut: HNH and RuvC in Action
The actual DNA cleavage event is carried out by Cas9’s two nuclease domains working in concert. Each domain cuts one strand of the DNA double helix, together producing a double-strand break (DSB) — a complete severance of the DNA molecule at a single defined position.
The HNH domain cuts the complementary strand, which is the strand that has base-paired with the guide RNA. Its catalytic residue is Histidine 840 (H840). HNH is a magnesium-dependent endonuclease that uses a single Mg²⁺ ion. H840 acts as the general base that activates a water molecule for nucleophilic attack on the DNA phosphodiester backbone, while the metal ion stabilises the transition state. The cut leaves a 3’-hydroxyl and a 5’-phosphate group.
The RuvC domain cuts the non-complementary (displaced) strand — the strand that was pushed out during R-loop formation. RuvC is also Mg²⁺ dependent and uses a two-metal-ion mechanism similar to many other nucleases. Its catalytic residues include Aspartate 10 (D10) and Histidine 983 (H983). The cut is made at the same position as the HNH cut: 3 base pairs upstream of the PAM sequence.
Both cuts occur between positions 17 and 18 of the protospacer when it is numbered 5’ to 3’. Equivalently, they fall between positions 3 and 4 counting from the PAM-proximal end, the numbering used for the seed region below. The result is a blunt-ended double-strand break: both cuts are at the same position, with no overhang. This is the canonical SpCas9 cut. Some other Cas9 orthologs generate staggered cuts with short overhangs.
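The cut-site arithmetic fits in a few lines. Below is a minimal Python sketch that assumes the 20-nt protospacer starts at a known index on the top strand and is followed directly by an NGG PAM. The function name is hypothetical. Because both strands are cut at the same position, splitting the top-strand string at one index is enough to represent the blunt break.

```python
PROTOSPACER_LENGTH = 20


def spcas9_cut(seq: str, protospacer_start: int) -> tuple[str, str]:
    """Split a top-strand sequence at the SpCas9 cut site.

    Assumes a 20-nt protospacer begins at index `protospacer_start` and is
    followed directly by an NGG PAM on the same strand.
    """
    seq = seq.upper()
    if protospacer_start < 0:
        raise ValueError("protospacer_start must be non-negative")
    pam_start = protospacer_start + PROTOSPACER_LENGTH
    if seq[pam_start + 1:pam_start + 3] != "GG":
        raise ValueError("no NGG PAM immediately 3' of the protospacer")
    # Both strands are cut 3 bp upstream of the PAM, between protospacer
    # positions 17 and 18 (numbered 5'->3'), so one index describes the blunt break.
    cut = pam_start - 3
    return seq[:cut], seq[cut:]


target = "TT" + "GACGCATAAAGATGAGACGC" + "TGG" + "AA"
left, right = spcas9_cut(target, 2)
print(left, right)  # TTGACGCATAAAGATGAGA CGCTGGAA
```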
Section 7 — The Seed Region: Why Location of Mismatches Matters
Not all positions in the 20-nucleotide spacer are equal. Mismatches between the guide RNA and target DNA are much better tolerated at some positions than others. The most important concept here is the seed region: the ~12 nucleotides immediately adjacent to the PAM (positions 1–12, counting from the PAM-proximal end).
Mismatches within the seed region almost always abolish or severely reduce Cas9 cutting. This is because R-loop formation initiates at the PAM-proximal end and zips toward the PAM-distal end: if early base pairs cannot form (seed region mismatches), the R-loop stalls and cannot complete, so Cas9 never reaches the active conformation. The seed region is therefore the primary specificity checkpoint.
Mismatches in the PAM-distal region (positions 13–20) are much better tolerated. The R-loop can sometimes extend over one or even two mismatches in this region, allowing Cas9 to cut even at imperfectly matched sites. This is a major source of CRISPR off-target effects: sites with perfect seed region matching but one or two PAM-distal mismatches can still be cut at significant frequency.
Practical implication: when evaluating potential off-target sites for a guide RNA, pay special attention to the seed region. Consider a site that matches perfectly in nt 1–12 and carries its mismatches in nt 13–20. It is more dangerous (higher off-target risk) than a site with the same number of mismatches falling within the seed. Off-target prediction algorithms weight seed region matches heavily for exactly this reason.
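The position dependence can be made concrete with a toy scoring function. This minimal Python sketch is not any published off-target scoring scheme. The per-mismatch factors (`SEED_FACTOR`, `DISTAL_FACTOR`) and the 12-nt seed cut-off are hypothetical values chosen only to illustrate the weighting, and the function names are also hypothetical. Positions are numbered from the PAM-proximal end, as in this section. Both example sites have two mismatches. The site whose mismatches lie outside the seed scores far higher, meaning it is predicted to be cut more, than the site whose mismatches fall inside the seed.

```python
SEED_LENGTH = 12      # positions 1-12, counted from the PAM-proximal end
SEED_FACTOR = 0.1     # hypothetical: activity retained per seed mismatch
DISTAL_FACTOR = 0.6   # hypothetical: activity retained per PAM-distal mismatch


def mismatch_positions(spacer_rna: str, site_dna: str) -> list[int]:
    """Mismatch positions, numbered 1-20 from the PAM-proximal (3') end."""
    spacer = spacer_rna.upper().replace("U", "T")
    site = site_dna.upper()
    if len(spacer) != len(site):
        raise ValueError("spacer and site must be the same length")
    pairs = zip(reversed(spacer), reversed(site))
    return [pos for pos, (s, t) in enumerate(pairs, start=1) if s != t]


def toy_offtarget_score(spacer_rna: str, site_dna: str) -> float:
    """Relative cutting estimate: 1.0 for a perfect match, lower with each mismatch."""
    score = 1.0
    for pos in mismatch_positions(spacer_rna, site_dna):
        score *= SEED_FACTOR if pos <= SEED_LENGTH else DISTAL_FACTOR
    return score


spacer = "GACGCAUAAAGAUGAGACGC"
distal_site = "TAAGCATAAAGATGAGACGC"  # mismatches at positions 18 and 20
seed_site = "GACGCATAAAGATGACACAC"    # mismatches at positions 2 and 5
print(mismatch_positions(spacer, distal_site), toy_offtarget_score(spacer, distal_site))
print(mismatch_positions(spacer, seed_site), toy_offtarget_score(spacer, seed_site))
```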
Section 8 — Cas9 Variants: Different Bacteria, Different Rules
SpCas9 from Streptococcus pyogenes was the first and is still the most widely used Cas9. But it is not the only Cas9 in nature. Dozens of bacterial species have their own CRISPR-Cas9 systems, each with different protein sizes, PAM requirements, and cutting characteristics. Understanding the alternatives matters for two reasons: different applications call for different tools, and the limitations of SpCas9 (particularly its large size and NGG-only PAM) have driven the development of alternatives.
| Protein | Source Organism | Size | PAM | Key Advantage |
|---|---|---|---|---|
| SpCas9 | S. pyogenes | 1,368 aa | NGG | Most studied, highest activity; the default choice |
| SaCas9 | S. aureus | 1,053 aa | NNGRRT | About 23% smaller than SpCas9; fits in a single AAV vector together with its gRNA |
| CjCas9 | C. jejuni | 984 aa | NNNNRYAC | Among the smallest characterised Cas9 orthologs; useful for size-limited delivery |
| SpRY | Engineered from SpCas9 | 1,368 aa | NRN (preferred) / NYN | Near-PAMless; can target almost any sequence, greatly expanding the sites available for base editing |
| Cas12a (Cpf1) | Acidaminococcus sp. | 1,307 aa | TTTV (5’ of the protospacer) | A distinct Cas nuclease rather than a Cas9; T-rich PAM; staggered cut; processes its own crRNA array |
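The PAMs in the table use IUPAC ambiguity codes: R = A/G, Y = C/T, V = A/C/G, N = any base. Below is a minimal Python sketch that translates them into regular expressions to test whether a candidate PAM is accepted. It checks sequence only. It does not model where the PAM sits (3’ of the protospacer for the Cas9 proteins, 5’ for Cas12a) or how efficiently each PAM is used. The `IUPAC` and `PAMS` tables and the function names are hypothetical helpers; only `re.compile` and `fullmatch` come from the Python standard library.

```python
import re

# IUPAC nucleotide codes appearing in the PAMs above, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "V": "[ACG]", "N": "[ACGT]",
}

# PAMs from the table, written 5'->3'. Cas9 PAMs sit 3' of the protospacer;
# the Cas12a PAM sits 5' of it (position is not modelled here).
PAMS = {
    "SpCas9": "NGG",
    "SaCas9": "NNGRRT",
    "CjCas9": "NNNNRYAC",
    "SpRY": "NRN",  # preferred form; NYN is also accepted, less efficiently
    "Cas12a": "TTTV",
}


def pam_pattern(pam: str) -> re.Pattern:
    """Translate an IUPAC PAM string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in pam))


def matches_pam(protein: str, candidate: str) -> bool:
    """True if `candidate` (exactly the PAM's length, 5'->3') fits the protein's PAM."""
    return pam_pattern(PAMS[protein]).fullmatch(candidate.upper()) is not None


print(matches_pam("SpCas9", "TGG"), matches_pam("SpCas9", "TGA"))     # True False
print(matches_pam("SaCas9", "TTGAGT"), matches_pam("Cas12a", "TTTT"))  # True False
```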
Section 9 — Engineered Cas9: Nickases, dCas9, and What They Enable
Understanding Cas9 domain structure has enabled researchers to engineer variants with fundamentally different capabilities. Mutating the catalytic residues in one or both nuclease domains yields Cas9 proteins that nick (cut only one strand) or bind without cutting. Fusing these variants to other enzymes adds entirely new chemistry. These variants have become the basis for the next generation of gene editing tools.
Cas9 Nickase (nCas9): Cutting Only One Strand
Mutating either catalytic residue (H840A to inactivate HNH, or D10A to inactivate RuvC) creates a Cas9 nickase, a protein that cuts only one strand of the DNA. The cell repairs a nick (single-strand break) with high fidelity, generally without introducing indels. By pairing a nickase with two guide RNAs that target opposite strands a short distance apart, you can generate a double-strand break only at the intended site. This dramatically reduces off-target effects, because two nicks arising close together at the same off-target locus are statistically very unlikely.
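Nickases are also the foundation of base editors and prime editors, which use nCas9 as their DNA-binding and nicking component. Base editors typically use the D10A nickase (RuvC-dead, HNH-active), which nicks the unedited strand. The nick directs the cell’s repair machinery to resynthesise that strand using the edited strand as the template, so the edit is copied into both strands. Prime editors instead use the H840A nickase (HNH-dead, RuvC-active), which nicks the PAM-containing strand. The freed 3’ end then primes reverse transcription of the edit template carried on the prime editing guide RNA.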
Dead Cas9 (dCas9): Binding Without Cutting
Mutating both catalytic residues (D10A and H840A) creates dCas9 (dead Cas9). This protein still finds and binds genomic target sequences using the same guide RNA and PAM rules as wild-type Cas9, but it cannot cut anything. dCas9 has become one of the most versatile tools in molecular biology because it functions as a programmable DNA-binding domain that can be fused to almost any effector protein.
- CRISPR interference (CRISPRi): dCas9 alone sterically blocks RNA polymerase at the target gene, and fusing it to a transcriptional repressor domain (KRAB) strengthens silencing. Gene expression is silenced without any DNA change, and the effect is reversible.
- CRISPR activation (CRISPRa): dCas9 fused to, or recruiting, transcriptional activators (VP64, VPR, the SAM system) brings the transcription machinery to a gene promoter. Gene expression is increased, often dramatically. No DNA editing is required.
- Epigenome editing: dCas9 fused to a DNA methyltransferase or histone modifier writes epigenetic marks at specific genomic locations. It silences or activates genes through chromatin modification rather than sequence change.
- Imaging: dCas9 fused to a fluorescent protein (GFP, mCherry) labels specific genomic loci for live-cell imaging. This is used to track chromosome dynamics, visualise enhancer-promoter contacts, and map nuclear organisation.
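The mutation rules in this section reduce to which nuclease domain each substitution disables. Below is a minimal Python sketch that models only the two canonical substitutions (D10A in RuvC, H840A in HNH) and treats each domain as fully on or fully off. The names are hypothetical. The sketch reproduces the wild-type, two nickase, and dCas9 cases described above.

```python
# Catalytic substitutions modelled here; other mutations are out of scope.
KNOWN_MUTATIONS = {"D10A", "H840A"}


def strands_cut(mutations: set[str]) -> set[str]:
    """Strands an SpCas9 variant can still cleave, given its catalytic mutations."""
    unknown = mutations - KNOWN_MUTATIONS
    if unknown:
        raise ValueError(f"only D10A and H840A are modelled, got {sorted(unknown)}")
    cut = set()
    if "H840A" not in mutations:
        cut.add("target (HNH)")       # HNH cleaves the strand paired with the guide RNA
    if "D10A" not in mutations:
        cut.add("non-target (RuvC)")  # RuvC cleaves the displaced strand
    return cut


VARIANTS = {
    "wild-type Cas9": set(),
    "nCas9 (D10A)": {"D10A"},
    "nCas9 (H840A)": {"H840A"},
    "dCas9": {"D10A", "H840A"},
}
for name, muts in VARIANTS.items():
    print(f"{name}: {sorted(strands_cut(muts)) or 'none (binds only)'}")
```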
Section 10 — Putting It All Together: The Complete Cas9 Cycle
Let’s now walk through the complete CRISPR-Cas9 mechanism from start to finish as a unified sequence of events. Every step described above fits into this cycle:
- Cas9 is inactive until it binds its guide RNA. The gRNA induces the conformational change that creates the DNA-binding channel and arms the nuclease domains. This gating prevents random cutting.
- PAM recognition is the first specificity checkpoint. Cas9 reads NGG in the major groove before investing energy in guide RNA matching. Without a PAM, Cas9 moves on without interrogating the sequence.
- R-loop formation is the second, more stringent checkpoint. The guide RNA spacer base-pairs with the target strand from PAM-proximal to PAM-distal. Full R-loop formation triggers HNH rotation and activates cutting.
- HNH cuts the complementary strand; RuvC cuts the non-complementary strand. Both cuts occur 3 bp upstream of the PAM, producing a blunt-ended DSB. Both require Mg²⁺ ions and specific catalytic residues.
- The seed region (nt 1–12 from PAM) is critical for specificity. Mismatches here strongly inhibit cutting. PAM-distal mismatches (nt 13–20) are better tolerated and are the main source of off-target activity.
- Mutating catalytic residues creates powerful new tools. D10A: RuvC-dead nickase. H840A: HNH-dead nickase. D10A+H840A: dCas9 (binds but cannot cut, enabling CRISPRi, CRISPRa, epigenome editing, and imaging).
- Different Cas nucleases have different PAMs and sizes. SaCas9 fits in AAV. The engineered SpRY variant targets almost any PAM. Cas12a, a distinct Cas enzyme rather than a Cas9, generates staggered cuts. Matching the Cas protein to the application is part of experimental design.
