How Cas9 Cuts DNA: The Complete Molecular Mechanism

📋 In This Article

Overview: The Two-Component System
Cas9 Protein Architecture
The Guide RNA: crRNA and tracrRNA
PAM Recognition: How Cas9 Finds Its Starting Point
R-Loop Formation: Checking the Match
The Cut: HNH and RuvC in Action
The Seed Region: Why Mismatches Near the PAM Matter Most
Cas9 Variants: Different Bacteria, Different Rules
Engineered Cas9: Nickases, dCas9, and Beyond
Putting It All Together: The Complete Cycle

Section 1 — Overview: The Two-Component System

Before diving into molecular detail, it helps to have the big picture clearly in mind. The CRISPR-Cas9 editing system has exactly two components that you deliver to a cell. Everything else — every conformational change, every nucleotide check, every phosphodiester bond broken — happens automatically as a consequence of the physics and chemistry of these two molecules finding each other and finding their target.

Component one is the Cas9 protein — a large, multi-domain enzyme (~160 kilodaltons, about 1,368 amino acids in the most commonly used variant from Streptococcus pyogenes) that does the actual DNA cutting. On its own, Cas9 is essentially inactive. It binds DNA non-specifically and weakly. It cannot cut anything.

Component two is the guide RNA — a short RNA molecule (~100 nucleotides total) that tells Cas9 exactly where to cut. It has two functional regions: a 20-nucleotide spacer sequence that base-pairs with the target DNA, and a structural scaffold region that binds to and activates Cas9. When guide RNA binds Cas9, the complex undergoes a dramatic conformational change that activates the protein and arms it for cutting.

⚡ The CRISPR-Cas9 System at a Glance

✂ Cas9 Protein

1,368 amino acids (SpCas9)
~160 kDa molecular weight
Two nuclease domains: HNH + RuvC
PAM-interacting (PI) domain
Inactive alone; activated by gRNA

🧬 Single Guide RNA (sgRNA)

~100 nucleotides total
20 nt spacer: targets the DNA
80 nt scaffold: binds and activates Cas9
Engineered fusion of crRNA + tracrRNA
Spacer can be changed to redirect cutting to a new target

💡 Analogy: Cas9 as a Guided Missile

Cas9 alone is like a missile with no guidance system — it has tremendous destructive potential but no way to find its target. The guide RNA is the guidance system: it locks onto the target coordinates (the 20-nucleotide sequence in the genome) and directs the warhead (the HNH and RuvC nuclease domains) precisely to the right location. Change the guidance system, and the missile hits a completely different target. This programmability is why CRISPR is so powerful: the protein stays the same while the guide RNA is swapped to retarget.

Section 2 — Cas9 Protein Architecture: A Tour of the Domains

Understanding Cas9’s mechanism requires understanding its physical structure. Cas9 is not a simple, featureless enzyme. It is a multi-domain protein that changes its three-dimensional shape at each stage of the cutting process. The different domains have different jobs, and mutations in any one of them produce a protein with different capabilities — which is exactly how researchers engineer Cas9 variants.

Structurally, Cas9 can be divided into two main lobes that form a bilobed architecture resembling a pair of cupped hands or a crab claw. The recognition lobe (REC lobe) and the nuclease lobe (NUC lobe) are connected by an arginine-rich bridge helix and create a central channel through which the guide RNA-DNA heteroduplex (the R-loop) is threaded during the cutting reaction.

Cas9 Domain Map (SpCas9)

Domain	Residues	Function
REC1	94–179, 308–713	Binds the repeat:anti-repeat scaffold of the guide RNA; essential for Cas9-gRNA complex formation
REC2	180–307	Role less clear; may stabilise the bilobed architecture. Deletions in this region can reduce protein size without abolishing activity
Bridge Helix	60–93	Arginine-rich helix that contacts the RNA-DNA heteroduplex at the 3’ end of the spacer (near PAM). Critical for triggering HNH activation
HNH	775–908	Cuts the complementary (target) strand, the strand that base-pairs with the guide RNA spacer. Contains the catalytic His840 residue
RuvC	1–59, 718–769, 909–1098	Cuts the non-complementary strand (displaced strand). Split across three segments; contains catalytic Asp10 and His983
PAM-Interacting (PI)	1099–1368	Recognises the PAM sequence in the major groove of DNA. Determines which PAM sequences Cas9 will accept. Swapping this domain changes PAM specificity
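The domain map lends itself to a simple lookup. The sketch below is illustrative only: the residue ranges are copied from the table above, while the names `SPCAS9_DOMAINS` and `domain_of` are made up for this example. It shows why the catalytic residues discussed later (D10, H840, H983) and the PAM-contacting arginines land in the domains they do.

```python
# Residue ranges (1-based, inclusive) copied from the SpCas9 domain map above.
# The dictionary and function names are hypothetical, chosen for this sketch.
SPCAS9_DOMAINS = {
    "RuvC": [(1, 59), (718, 769), (909, 1098)],
    "Bridge Helix": [(60, 93)],
    "REC1": [(94, 179), (308, 713)],
    "REC2": [(180, 307)],
    "HNH": [(775, 908)],
    "PI": [(1099, 1368)],
}


def domain_of(residue: int) -> str:
    """Return the domain containing a residue, or 'linker/unassigned'."""
    if not 1 <= residue <= 1368:
        raise ValueError("SpCas9 residues are numbered 1-1368")
    for name, segments in SPCAS9_DOMAINS.items():
        if any(start <= residue <= end for start, end in segments):
            return name
    # Residues between the listed segments (for example 714-717) fall here.
    return "linker/unassigned"


for label, position in [("D10", 10), ("H840", 840), ("H983", 983), ("R1333", 1333)]:
    print(label, "->", domain_of(position))
```

Because RuvC is split across three segments, a flat start/end pair per domain would not work; a list of segments per domain keeps the lookup faithful to the table.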

🧬 Key Concept: The Conformational Change That Arms Cas9

When Cas9 first binds its guide RNA, it is in an open, inactive conformation. The HNH domain is positioned away from the DNA-binding channel. As the guide RNA base-pairs with target DNA and R-loop formation is completed, a dramatic conformational change occurs: HNH rotates roughly 180 degrees to position its catalytic residue directly adjacent to the scissile phosphate on the complementary DNA strand. This gating mechanism ensures Cas9 only cuts DNA when full target recognition has occurred — it is a built-in specificity checkpoint.

Section 3 — The Guide RNA: From Two Molecules to One

In the natural CRISPR immune system of Streptococcus pyogenes, Cas9 requires two separate RNA molecules to function. The crRNA (CRISPR RNA) contains the spacer sequence that matches the target DNA. The tracrRNA (trans-activating crRNA) base-pairs with a repeat region in the crRNA to form a duplex structure that binds and activates Cas9. Both are required. Neither works alone.

The key engineering breakthrough from the Doudna and Charpentier labs was showing that the crRNA and tracrRNA could be fused into a single RNA molecule via a short loop sequence, creating what they called the single guide RNA (sgRNA). This fusion dramatically simplified the system: instead of separately synthesising and delivering two RNA molecules, users needed only one. The sgRNA retains all the functionality of the two-piece system while being much easier to design, produce, and deliver.

🧬 sgRNA Structure: The 100-Nucleotide GPS

5’-NNNNNNNNNNNNNNNNNNNN-GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUU-3’

Purple region (nt 1–20): Spacer

The 20 Ns you design. Base-pairs with target DNA. Change these 20 nucleotides to retarget CRISPR to any sequence in any genome.

Gray region (nt 21–100): Scaffold

Fixed sequence. Forms stem-loop structures that bind Cas9 and hold the complex together. Do not change this region — it will break the system.

⚠ Common Confusion

A common mistake when designing CRISPR experiments: accidentally modifying the scaffold region rather than just the 20-nucleotide spacer. Always double-check your sgRNA design: only the first 20 nucleotides at the 5’ end should change between different targets. Everything after the spacer is conserved scaffold sequence. Most guide RNA design tools (CRISPOR, Benchling) output only the 20-nucleotide spacer and automatically attach the correct scaffold.
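The "only the spacer changes" rule can be checked mechanically. The following is a minimal sketch, not a design tool: `build_sgrna` and `scaffold_unchanged` are hypothetical helper names, the spacers are invented, and `PLACEHOLDER_SCAFFOLD` is deliberately just the first 20 nucleotides of the scaffold so that nobody mistakes it for a usable sequence. Substitute the full scaffold from your own vector.

```python
import re

SPACER_LENGTH = 20

# PLACEHOLDER ONLY: a truncated stand-in for the scaffold, used to keep the
# example short. Replace it with the full scaffold from your own construct.
PLACEHOLDER_SCAFFOLD = "GUUUUAGAGCUAGAAAUAGC"


def build_sgrna(spacer: str, scaffold: str) -> str:
    """Join a 20-nt spacer (5' end) to an unmodified scaffold (3' end)."""
    spacer = spacer.upper().replace("T", "U")  # accept DNA-style input
    if not re.fullmatch(r"[ACGU]{%d}" % SPACER_LENGTH, spacer):
        raise ValueError("spacer must be exactly 20 nt of A/C/G/U (or T)")
    return spacer + scaffold


def scaffold_unchanged(sgrna_a: str, sgrna_b: str) -> bool:
    """True if two sgRNAs differ only within their first 20 nucleotides."""
    return sgrna_a[SPACER_LENGTH:] == sgrna_b[SPACER_LENGTH:]


# Two hypothetical spacers retargeting the same scaffold.
guide_1 = build_sgrna("GACGTTACCGGATCAAGCTA", PLACEHOLDER_SCAFFOLD)
guide_2 = build_sgrna("TTGACCATGGCATTCAGGAC", PLACEHOLDER_SCAFFOLD)
print(scaffold_unchanged(guide_1, guide_2))
```

Validating the spacer length up front catches the most common slip (a 19- or 21-nt spacer shifting the scaffold boundary) before anything is ordered or cloned.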

Section 4 — PAM Recognition: How Cas9 Finds Its Starting Point

The genome is enormous. How does Cas9 efficiently search the roughly 3.2 billion base pairs of the human genome for its target without spending an eternity at every position? The answer is the PAM sequence (Protospacer Adjacent Motif), a short DNA sequence that Cas9 checks first before investing energy in the full guide RNA matching process.

For SpCas9 (the most widely used variant), the PAM is 5’-NGG-3’ on the non-complementary (non-target) strand, located immediately 3’ (downstream) of the protospacer (the genomic DNA sequence that matches the guide RNA spacer). NGG means any nucleotide followed by two guanines. In random sequence with equal base frequencies, GG follows a given position with probability 1/16, so an NGG occurs about once every 16 base pairs on one strand, or about once every 8 base pairs when both strands are counted. That is frequent enough that almost any gene can be targeted, but infrequent enough to reduce the search space significantly.
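The "once every 8 base pairs" figure is easy to sanity-check numerically. This sketch assumes a uniformly random sequence (real genomes are not uniform, so treat the result as a rough guide); the function names are invented for the example. An NGG on the opposite strand shows up as CCN on the strand being read, which is why both patterns are counted.

```python
import random


def count_ngg_both_strands(seq: str) -> int:
    """Count NGG PAMs on the given strand plus the opposite strand.

    An NGG on the opposite strand reads as CCN on the given strand.
    """
    seq = seq.upper()
    windows = range(len(seq) - 2)  # every 3-nt window
    forward = sum(1 for i in windows if seq[i + 1:i + 3] == "GG")
    reverse = sum(1 for i in windows if seq[i:i + 2] == "CC")
    return forward + reverse


def mean_pam_spacing(length: int = 100_000, seed: int = 0) -> float:
    """Average distance between PAMs in a uniformly random sequence."""
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(length))
    return length / count_ngg_both_strands(seq)


# Should land close to 8 for a uniform random sequence.
print(round(mean_pam_spacing(), 2))
```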

The Mechanism of PAM Recognition

Cas9 searches DNA mainly through three-dimensional diffusion, colliding with the DNA at random and sometimes sliding a short distance along it. At each position, it briefly interrogates the DNA sequence. When it encounters an NGG, it pauses and opens the DNA duplex locally to allow the guide RNA spacer to attempt base-pairing with the exposed strand. If the PAM is wrong (not NGG), Cas9 moves on without interrogating further.

The PAM is recognised by the PAM-interacting (PI) domain of Cas9, which reads the major groove of the DNA. The two guanines of the NGG PAM make specific contacts with Arg1333 and Arg1335 residues in the PI domain. These arginine-guanine interactions are what make NGG the required PAM for SpCas9. Mutating Arg1333 or Arg1335 destroys PAM recognition and abolishes Cas9 activity.

🎯 PAM Position: Where to Find It

5’- NNNNNNNNNNNNNNNNNNNN NGG -3’  (non-target / non-complementary strand)
3’- NNNNNNNNNNNNNNNNNNNN NCC -5’  (target / complementary strand)
    ←─── protospacer ──→ PAM

The 20 nt protospacer (purple) matches your guide RNA spacer. The NGG PAM (yellow) is immediately 3’ of it on the non-complementary strand. Cas9 cuts between positions 17 and 18 of the protospacer, counting from its 5’ (PAM-distal) end, which is 3 bp upstream of the PAM. This leaves a blunt-ended double-strand break.
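The layout above (20 nt protospacer, then NGG, with the cut 3 bp upstream of the PAM) translates directly into a pattern search. This is a minimal sketch that scans one strand only; `find_spcas9_sites` is a hypothetical name and the example sequence is invented. Real design tools also scan the reverse strand and score off-targets.

```python
import re


def find_spcas9_sites(dna: str):
    """Yield (protospacer, pam, cut_index) for each 20 nt + NGG on this strand.

    cut_index is the 0-based index of the first base to the right of the cut,
    so dna[:cut_index] and dna[cut_index:] are the two cleavage products.
    """
    dna = dna.upper()
    # A lookahead reports overlapping candidate sites as well.
    for match in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", dna):
        protospacer, pam = match.group(1), match.group(2)
        # 17 protospacer bases lie left of the cut, 3 between the cut and the PAM.
        cut_index = match.start() + 17
        yield protospacer, pam, cut_index


# Hypothetical 27-nt sequence containing a single site.
example = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
for protospacer, pam, cut_index in find_spcas9_sites(example):
    print(protospacer, pam, cut_index)
```

Note that the protospacer returned here is written as DNA on the PAM-containing strand; the guide RNA spacer has the same sequence with U in place of T.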

🧬 Key Concept: Why PAM Exists (And Why It Matters for Off-Target Effects)

The PAM serves a critical biological purpose in bacteria: it prevents Cas9 from cutting the CRISPR array in its own genome. The spacer sequences stored in the CRISPR array are flanked by repeat sequences, not NGG, so Cas9 ignores them. This self-vs-non-self discrimination is elegant and essential. For gene editors, the PAM creates two practical consequences. First, not every desired target position has an NGG nearby, limiting what you can target. Second, off-target sites must also have NGG nearby, which somewhat limits (but does not eliminate) off-target risk.

Section 5 — R-Loop Formation: Checking the Match

Once Cas9 finds a PAM and opens the DNA locally, the guide RNA spacer must check whether the adjacent 20-nucleotide sequence actually matches the intended target. This matching process is called R-loop formation (the R stands for RNA), and it is the central specificity-checking step of the entire mechanism.

The R-loop forms as the guide RNA spacer invades the DNA double helix and displaces the non-complementary DNA strand. The 20 nucleotides of the spacer base-pair with the complementary DNA strand one by one, starting from the PAM-proximal end (positions 1–3 when counting from the PAM, which is the 3’ end of the spacer) and zipping toward the PAM-distal end (positions 18–20). The displaced non-complementary DNA strand forms the “loop” of the R-loop structure.

If the spacer sequence perfectly matches the DNA, the R-loop forms stably and extends through all 20 nucleotides. This full R-loop formation is the trigger that activates Cas9 for cutting. If there are mismatches — base pairs that cannot form — the R-loop is destabilised and may fail to extend fully, leaving Cas9 inactive at that position. The stringency of this check determines Cas9 specificity.

The Directionality of R-Loop Formation

🧬 R-Loop Formation: Step by Step

Step 1

PAM binding: Cas9 binds NGG in the major groove. This initiates local DNA melting immediately upstream of the PAM, exposing about 3 base pairs of single-stranded DNA.

Step 2

Seed region check (nt 1–12 from PAM): The guide RNA spacer begins base-pairing from the PAM-proximal end. These 12 nucleotides are the “seed region” — the most critical for specificity. Mismatches here strongly inhibit R-loop progression.

Step 3

R-loop extension (nt 13–20): If the seed region matches, the R-loop zips open toward the PAM-distal end. Mismatches in this region are better tolerated — the R-loop can sometimes extend over them.

Step 4

Full R-loop triggers conformational change: When the R-loop extends through all 20 nucleotides, the bridge helix of Cas9 contacts the RNA-DNA heteroduplex and signals the HNH domain to rotate into cutting position. The nuclease domains activate. Cutting begins.
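The four steps above can be caricatured as a directional zipper. The sketch below is a deliberately crude toy model under stated assumptions: any seed mismatch stalls the R-loop outright, and up to `distal_tolerance` PAM-distal mismatches are skipped. Real R-loop formation is probabilistic and sequence-dependent, and the function names and thresholds here are illustrative, not measured values.

```python
SEED_LENGTH = 12  # PAM-proximal positions 1-12, as defined in the steps above


def rloop_extent(spacer: str, protospacer: str, distal_tolerance: int = 2) -> int:
    """Return how many positions (counted from the PAM) the toy R-loop reaches.

    Both sequences are written 5'->3' in DNA letters, so PAM-proximal
    position 1 is the LAST character of each string.
    """
    if len(spacer) != 20 or len(protospacer) != 20:
        raise ValueError("both sequences must be 20 nt long")
    distal_mismatches = 0
    for position in range(1, 21):  # zip from the PAM outward
        index = 20 - position
        if spacer[index] == protospacer[index]:
            continue
        if position <= SEED_LENGTH:
            return position - 1  # toy rule: a seed mismatch stalls the R-loop
        distal_mismatches += 1
        if distal_mismatches > distal_tolerance:
            return position - 1  # too many distal mismatches to zip past
    return 20


def is_cleavage_competent(spacer: str, protospacer: str) -> bool:
    """In this toy model, only a full 20-position R-loop arms the nucleases."""
    return rloop_extent(spacer, protospacer) == 20


guide = "ACGTACGTACGTACGTACGT"  # hypothetical spacer
print(rloop_extent(guide, guide))
```

The value of the toy is the asymmetry it makes visible: the same single mismatch gives a very different outcome depending on whether it sits near the PAM or far from it.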

Section 6 — The Cut: HNH and RuvC in Action

The actual DNA cleavage event is carried out by Cas9’s two nuclease domains working in concert. Each domain cuts one strand of the DNA double helix, together producing a double-strand break (DSB) — a complete severance of the DNA molecule at a single defined position.

The HNH domain cuts the complementary strand — the strand that has base-paired with the guide RNA. The catalytic residue is Histidine 840 (H840). HNH is a magnesium-dependent endonuclease: it requires a Mg²⁺ ion in its active site to activate a water molecule for nucleophilic attack on the DNA phosphodiester backbone. The cut leaves a 3’-hydroxyl and a 5’-phosphate group.

The RuvC domain cuts the non-complementary (displaced) strand — the strand that was pushed out during R-loop formation. RuvC is also Mg²⁺ dependent and uses a two-metal-ion mechanism similar to many other nucleases. Its catalytic residues include Aspartate 10 (D10) and Histidine 983 (H983). The cut is made at the same position as the HNH cut: 3 base pairs upstream of the PAM sequence.

✂ The Double-Strand Break: Exactly Where It Happens

5’–N N N N N N N N N N N N N N N N N↓N N N N G G–3’   (non-complementary strand: RuvC cuts at ↓)
3’–N N N N N N N N N N N N N N N N N↑N N N N C C–5’   (complementary strand: HNH cuts at ↑)
                                     3 bp  PAM

Both cuts occur between positions 17 and 18 of the protospacer when counting from its PAM-distal 5’ end, which is the same as between positions 3 and 4 when counting from the PAM. The result is a blunt-ended double-strand break: both cuts at the same position, no overhang. This is the canonical SpCas9 cut. Some other Cas9 orthologs generate staggered cuts with short overhangs.
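To make "blunt" concrete, the sketch below cuts a duplex at the same position on both strands and returns the two products. It is an illustration under simple assumptions: the input is the PAM-containing strand written 5'->3', the helper names are invented, and the idealised cut ignores any heterogeneity in where RuvC cleaves.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def cleave_blunt(top: str, pam_start: int):
    """Cut a duplex 3 bp upstream of the PAM and return both fragments.

    top       : PAM-containing (non-complementary) strand, 5'->3'
    pam_start : 0-based index of the first PAM base (the N of NGG)
    Each fragment is a (top_strand, bottom_strand) pair, both 5'->3'.
    """
    top = top.upper()
    cut = pam_start - 3
    if cut <= 0 or top[pam_start + 1:pam_start + 3] != "GG":
        raise ValueError("pam_start must point at an NGG with room upstream")
    left, right = top[:cut], top[cut:]
    # Same cut position on both strands, so neither fragment has an overhang.
    return (left, reverse_complement(left)), (right, reverse_complement(right))


left_fragment, right_fragment = cleave_blunt("ACGTACGTACGTACGTACGTTGG", 20)
print(left_fragment)
print(right_fragment)
```

A staggered cutter would need two different cut indices, one per strand; here a single index serves both, which is exactly what blunt-ended means.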

🧬 Key Concept: Why a Blunt Cut Matters for Editing Outcomes

SpCas9 generates blunt-ended DSBs — cuts that leave no single-stranded overhang. The repair pathway after a blunt DSB is predominantly NHEJ, which produces small insertions or deletions (indels). Some Cas proteins (like Cas12a/Cpf1) generate staggered cuts with 5-nucleotide overhangs, favouring different repair outcomes. The choice of Cas protein partly determines what kind of genetic change results from the editing event.

Section 7 — The Seed Region: Why Location of Mismatches Matters

Not all positions in the 20-nucleotide spacer are equal. Mismatches between the guide RNA and target DNA are much better tolerated at some positions than others. The most important concept here is the seed region: the ~12 nucleotides immediately adjacent to the PAM (positions 1–12, counting from the PAM-proximal end).

Mismatches within the seed region almost always abolish or severely reduce Cas9 cutting. This is because R-loop formation initiates at the PAM-proximal end and zips toward the PAM-distal end: if early base pairs cannot form (seed region mismatches), the R-loop stalls and cannot complete, so Cas9 never reaches the active conformation. The seed region is therefore the primary specificity checkpoint.

Mismatches in the PAM-distal region (positions 13–20) are much better tolerated. The R-loop can sometimes extend over one or even two mismatches in this region, allowing Cas9 to cut even at imperfectly matched sites. This is the primary source of CRISPR off-target effects: sites with perfect seed region matching but one or two PAM-distal mismatches can still be cut at significant frequency.

⚠ Mismatch Tolerance Map Across the 20-Nucleotide Spacer

Red (nt 1–12): Seed region, where mismatches strongly inhibit cutting
Green (nt 13–20): PAM-distal, where single mismatches are often tolerated

Practical implication: when screening guide RNAs for off-target risk, pay special attention to the seed region. An off-target site that matches nt 1–12 perfectly and carries its mismatches in nt 13–20 is more dangerous (higher off-target risk) than a site with the same number of mismatches spread evenly across the spacer. Off-target prediction algorithms weight seed region matches heavily for exactly this reason.
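A rough version of this weighting can be written in a few lines. The sketch below is not how CRISPOR or any published scoring scheme works; it simply splits mismatches into seed and PAM-distal counts using the nt 1–12 / nt 13–20 boundary from this section and sorts candidate sites so that seed-perfect ones come first. All names and sequences are hypothetical.

```python
SEED_LENGTH = 12  # nt 1-12 counted from the PAM, per the map above


def mismatch_profile(guide: str, site: str):
    """Return (seed_mismatches, distal_mismatches) for a candidate site.

    Sequences are 20-nt DNA strings written 5'->3', so the PAM-proximal
    position 1 is the last character.
    """
    if len(guide) != 20 or len(site) != 20:
        raise ValueError("guide and site must both be 20 nt long")
    seed = distal = 0
    for position in range(1, 21):
        index = 20 - position
        if guide[index] != site[index]:
            if position <= SEED_LENGTH:
                seed += 1
            else:
                distal += 1
    return seed, distal


def rank_off_targets(guide: str, candidate_sites):
    """Sort candidates so the most worrying come first.

    Fewest seed mismatches first, then fewest PAM-distal mismatches.
    """
    return sorted(candidate_sites, key=lambda site: mismatch_profile(guide, site))


guide = "ACGTACGTACGTACGTACGT"
seed_hit = "ACGTACGTACGTACGTACGA"    # mismatch at position 1 (next to the PAM)
distal_hit = "TCGTACGTACGTACGTACGT"  # mismatch at position 20 (far from the PAM)
print(rank_off_targets(guide, [seed_hit, distal_hit]))
```

Both candidates carry a single mismatch, yet the ranking puts the PAM-distal one first because it leaves the seed intact.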

Section 8 — Cas9 Variants: Different Bacteria, Different Rules

SpCas9 from Streptococcus pyogenes was the first and is still the most widely used Cas9. But it is not the only Cas9 in nature. Dozens of bacterial species have their own CRISPR-Cas9 systems, each with different protein sizes, PAM requirements, and cutting characteristics. Understanding the alternatives matters for two reasons: different applications call for different tools, and the limitations of SpCas9 (particularly its large size and NGG-only PAM) have driven the development of alternatives.

Cas9 Ortholog Comparison

Protein	Source Organism	Size	PAM	Key Advantage
SpCas9	S. pyogenes	1,368 aa	NGG	Most studied, highest activity; the default choice
SaCas9	S. aureus	1,053 aa	NNGRRT	About 23% smaller than SpCas9; fits in a single AAV vector together with its gRNA
CjCas9	C. jejuni	984 aa	NNNNRYAC	One of the smallest natural Cas9 orthologs in use; suited to size-limited delivery
SpRY	SpCas9 engineered	1,368 aa	NRN / NYN	Near-PAMless — can target almost any sequence; enables base editing anywhere
Cas12a (Cpf1)	Acidaminococcus	1,307 aa	TTTV	Not a Cas9 (a type V nuclease, listed for comparison); T-rich PAM; staggered cut; processes its own crRNA array
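The PAM column uses IUPAC degenerate codes (N = any base, R = A or G, Y = C or T, V = A, C or G). A small sketch of how those codes expand into a matcher follows; the PAM strings come from the table, while the function names are invented. SpRY is left out because its NRN/NYN preference accepts essentially any trinucleotide, and note that the Cas12a PAM sits 5' of its protospacer rather than 3'.

```python
import re

# IUPAC degenerate nucleotide codes needed for the PAMs in the table above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "V": "[ACG]",
}

# PAM strings copied from the comparison table.
PAMS = {
    "SpCas9": "NGG",
    "SaCas9": "NNGRRT",
    "CjCas9": "NNNNRYAC",
    "Cas12a": "TTTV",  # located 5' of the protospacer, unlike the Cas9 PAMs
}


def pam_regex(pattern: str):
    """Compile an IUPAC PAM pattern into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in pattern))


def nucleases_accepting(pam_sequence: str):
    """List the nucleases whose PAM pattern exactly matches the sequence."""
    pam_sequence = pam_sequence.upper()
    return [
        name for name, pattern in PAMS.items()
        if pam_regex(pattern).fullmatch(pam_sequence)
    ]


for candidate in ["AGG", "TTGAGT", "ACGTACAC", "TTTA", "TTTT"]:
    print(candidate, nucleases_accepting(candidate))
```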

Section 9 — Engineered Cas9: Nickases, dCas9, and What They Enable

Understanding Cas9 domain structure has enabled researchers to engineer variants with fundamentally different capabilities. By mutating the catalytic residues in one or both nuclease domains, you can create Cas9 proteins that nick (cut only one strand), bind without cutting, or perform entirely new chemistry. These variants have become the basis for the next generation of gene editing tools.

Cas9 Nickase (nCas9): Cutting Only One Strand

Mutating either catalytic residue (H840A to inactivate HNH, or D10A to inactivate RuvC) creates a Cas9 nickase, a protein that cuts only one strand of the DNA. A nick (single-strand break) is repaired with high fidelity by the cell, generally without introducing indels. By pairing a nickase with two guide RNAs that target opposite strands a short distance apart, you can generate a double-strand break only at the intended site. This greatly reduces off-target effects, because two nearby nicks at the same off-target site are very unlikely.

Nickases are also the foundation of base editors and prime editors, which use nCas9 as their DNA-binding and nicking component. Base editors typically use the D10A nickase (HNH-active, RuvC-dead), which nicks the unedited complementary strand so that the cell’s repair machinery is biased toward copying the deaminated base. Prime editors typically use the H840A nickase (RuvC-active, HNH-dead), which nicks the non-complementary strand to expose a 3’ end that primes reverse transcription of the edit template.

Dead Cas9 (dCas9): Binding Without Cutting

Mutating both catalytic residues (D10A and H840A) creates dCas9 (dead Cas9) — a protein that can find and bind any genomic target sequence with the same precision as wild-type Cas9, but cannot cut anything. dCas9 has become one of the most versatile tools in molecular biology because it functions as a programmable DNA-binding domain that can be fused to any effector protein you choose.
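The relationship between the two point mutations and the resulting variant is a small truth table. The sketch below encodes only what this section states (H840A disables HNH, D10A disables RuvC); the function names are made up for illustration.

```python
def strands_cut(mutations) -> set:
    """Return which DNA strands a SpCas9 variant can still cut."""
    mutations = set(mutations)
    cut = set()
    if "H840A" not in mutations:
        cut.add("complementary")      # HNH still active
    if "D10A" not in mutations:
        cut.add("non-complementary")  # RuvC still active
    return cut


def variant_name(mutations) -> str:
    """Classify a variant by how many strands it cuts."""
    names = {2: "wild-type nuclease", 1: "nickase", 0: "dCas9"}
    return names[len(strands_cut(mutations))]


for mutation_set in [(), ("D10A",), ("H840A",), ("D10A", "H840A")]:
    print(mutation_set, variant_name(mutation_set), sorted(strands_cut(mutation_set)))
```

Reading the output row by row shows why D10A and H840A nickases are not interchangeable: each leaves a different strand intact.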

🚫 CRISPRi (interference)

dCas9 alone, or fused to a transcriptional repressor (KRAB domain), blocks RNA polymerase access to the gene promoter. Gene expression is silenced without any DNA change. Completely reversible.

📢 CRISPRa (activation)

dCas9 fused to a transcriptional activator (VP64, VPR, SAM complex) recruits the transcription machinery to a gene promoter. Gene expression is amplified, often dramatically. No DNA editing required.

🎨 CRISPR Epigenome Editing

dCas9 fused to a DNA methyltransferase or histone modifier writes epigenetic marks at specific genomic locations. Silences or activates genes through chromatin modification rather than sequence change.

🔬 CRISPR Imaging

dCas9 fused to a fluorescent protein (GFP, mCherry) labels specific genomic loci for live-cell imaging. Tracks chromosome dynamics, visualises enhancer-promoter contacts, maps nuclear organisation.

Section 10 — Putting It All Together: The Complete Cas9 Cycle

Let’s now walk through the complete CRISPR-Cas9 mechanism from start to finish as a unified sequence of events. Every step described above fits into this cycle:

🔄 The Complete Cas9 Mechanism: 9 Steps

Assembly: Cas9 protein is expressed in the cell and binds its guide RNA (sgRNA). This induces a conformational change: Cas9 opens like a clam shell, creating the central channel that will accommodate the DNA. The complex is now in the active search conformation.

Genome scanning: The Cas9-gRNA complex diffuses through the nucleus, encountering DNA through random three-dimensional collisions with occasional short-range sliding. At each position, it briefly interrogates the DNA for the NGG PAM sequence by reading the major groove via the PI domain. This initial PAM check is transient, so sites without a PAM are abandoned quickly.

PAM binding and local melting: When NGG is found, Cas9 slows down and induces local melting of about 3 bp of DNA immediately upstream of the PAM. This exposes single-stranded DNA for the guide RNA spacer to interrogate.

Seed region base-pairing: The guide RNA spacer begins base-pairing with the exposed DNA strand from the PAM-proximal end. The first 12 nucleotides (seed region) must match nearly perfectly. Mismatches here abort the process and Cas9 moves on.

Full R-loop formation: If seed region matches, base-pairing extends through all 20 nucleotides, forming the complete R-loop. The displaced non-complementary strand forms the loop structure. The RNA-DNA heteroduplex now threads through the central channel of Cas9.

Conformational change activates nucleases: Full R-loop formation is detected by the bridge helix of Cas9, which contacts the RNA-DNA heteroduplex. This triggers HNH to rotate approximately 180 degrees into its cutting conformation, positioning His840 adjacent to the scissile phosphate on the complementary strand. RuvC is simultaneously repositioned.

Dual strand cleavage: HNH cuts the complementary strand between protospacer positions 17 and 18 (counted from the PAM-distal 5’ end). RuvC cuts the non-complementary strand at the same position. Both cuts require Mg²⁺ ions. The result is a blunt-ended DSB three base pairs upstream of the PAM.

Cas9 release: After cutting, Cas9 does not let go immediately. It can remain bound to the cleaved DNA for a prolonged period (slow product release is well documented in vitro) before it dissociates or is displaced. Cellular DNA damage response proteins (MRN complex, ATM kinase) are then recruited to the break site.

DNA repair determines the edit: The cell attempts to repair the DSB. NHEJ rejoins the ends rapidly but imprecisely (producing indels). If a donor template was supplied with homology arms, HDR uses it to incorporate the desired sequence. The edit is now permanent and will be propagated to all daughter cells.

References & Further Reading

Jinek et al. (2012) — A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science 337:816. — The original biochemical characterisation of Cas9 cutting.
Nishimasu et al. (2014) — Crystal Structure of Cas9 in Complex with Guide RNA and Target DNA. Cell 156:935. — First high-resolution crystal structure of the complete Cas9-gRNA-DNA ternary complex.
Jiang et al. (2015) — A Cas9-guide RNA complex preorganized for target DNA recognition. Science 348:1477. — Molecular basis of PAM recognition and R-loop initiation.
Sternberg et al. (2015) — Conformational control of DNA target cleavage by CRISPR-Cas9. Nature 527:110. — Single-molecule studies showing how full R-loop formation triggers HNH activation.
Qi et al. (2013) — Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression (CRISPRi). Cell 152:1173. — The original dCas9/CRISPRi paper.
Ran et al. (2013) — Genome engineering using the CRISPR-Cas9 system. Nature Protocols 8:2281. — The landmark practical protocol paper; highly cited reference for experimental Cas9 use.
Addgene CRISPR Guide — addgene.org/guides/crispr/ — Excellent practical resource for researchers, including Cas9 variant selection, sgRNA cloning, and troubleshooting.

📋 Key Takeaways — Cluster 3

Cas9 is inactive until it binds its guide RNA. The gRNA induces the conformational change that creates the DNA-binding channel and arms the nuclease domains. This gating prevents random cutting.
PAM recognition is the first specificity checkpoint. Cas9 reads NGG in the major groove before investing energy in guide RNA matching. Without a PAM, Cas9 moves on without interrogating the sequence.
R-loop formation is the second, more stringent checkpoint. The guide RNA spacer base-pairs with the target strand from PAM-proximal to PAM-distal. Full R-loop formation triggers HNH rotation and activates cutting.
HNH cuts the complementary strand; RuvC cuts the non-complementary strand. Both cuts occur 3 bp upstream of the PAM, producing a blunt-ended DSB. Both require Mg2+ ions and specific catalytic residues.
The seed region (nt 1–12 from PAM) is critical for specificity. Mismatches here strongly inhibit cutting. PAM-distal mismatches (nt 13–20) are better tolerated and are the main source of off-target activity.
Mutating catalytic residues creates powerful new tools. D10A: RuvC-dead nickase. H840A: HNH-dead nickase. D10A+H840A: dCas9 (binds but cannot cut, enabling CRISPRi, CRISPRa, epigenome editing, and imaging).
Different Cas9 orthologs have different PAMs and sizes. SaCas9 fits in AAV. SpRY targets almost any PAM. Cas12a generates staggered cuts. Matching the Cas protein to the application is part of experimental design.
