GT-Scan

Overview

GT-Scan is a tool for identifying candidate target sites in a given genomic region and ranking them by their "uniqueness" within the reference genome of a specified organism.

Different mismatches have different consequences (e.g. a PAM-altering mismatch is more likely to break binding than a PAM-distal mismatch), so GT-Scan is designed to be flexible. It achieves this with three user-specifiable parameters: the rule, the filter and the max-mismatch-count.

The following section describes how to design a job around these parameters.

Designing a query

Rule

First we will consider the rule. Using the following rule, GT-Scan finds candidate targets in our genomic region. With this rule, a candidate target is any 23nt sequence that ends in GG. For this stage, case doesn't matter and [X, x, N] are equivalent, but bear in mind that these properties will become important later.

xxxxxxxxxxXXXXXXXXXXNGG
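The candidate-finding step can be sketched as a regular-expression scan. This is a minimal illustration, not GT-Scan's implementation. `rule_to_regex` and `find_candidates` are hypothetical names, only the forward strand is scanned, and the sketch assumes that, at this stage, case is ignored and X, x and N match any nucleotide.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Compile a rule into a regex for finding candidate targets.

    Assumption: case is ignored here; X, x and N match any nucleotide and
    A, C, G, T match themselves.
    """
    parts = []
    for ch in rule.upper():
        if ch in "XN":
            parts.append("[ACGT]")
        elif ch in "ACGT":
            parts.append(ch)
        else:
            raise ValueError(f"unsupported rule character: {ch!r}")
    # A zero-width lookahead lets overlapping candidate targets be reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_candidates(region: str, rule: str) -> list[tuple[int, str]]:
    """Return (0-based start, sequence) for every forward-strand match."""
    pattern = rule_to_regex(rule)
    return [(m.start(), m.group(1)) for m in pattern.finditer(region.upper())]


rule = "xxxxxxxxxxXXXXXXXXXXNGG"
region = "C" + "A" * 21 + "GG" + "T"
print(find_candidates(region, rule))
# [(1, 'AAAAAAAAAAAAAAAAAAAAAGG')]
```

The lookahead `(?=(...))` makes each match zero-width, so overlapping windows, which are common in repetitive sequence, are all reported.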

A candidate target

The following is one of potentially many candidate targets found in our genomic region. We will use it for the following examples.

AAAAAAAAAAAAAAAAAAAAAGG

Potential off-targets (exact match)

Green positions (defined by an N in the rule) are no-affinity positions. A different nucleotide here does not constitute a mismatch, so each of these potential off-targets is considered an exact match to the candidate target.

AAAAAAAAAAAAAAAAAAAAAGG
AAAAAAAAAAAAAAAAAAAATGG
AAAAAAAAAAAAAAAAAAAACGG
AAAAAAAAAAAAAAAAAAAAGGG

Potential off-targets (1 mismatch)

Blue and orange are low- and high-affinity positions respectively, defined by lower-case and upper-case letters in the rule. A different nucleotide at one of these positions does count as a mismatch.

AAAAAAAAAAAAAATAAAAAAGG
AAAATAAAAAAAAAAAAAAATGG
AAAAAAAAAAAAAAAAAAAACTG
AAAAAAAAAAAAAAAAAAAAGGT
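The position classes above can be expressed as a small mismatch counter. This is a hedged sketch, assuming the rule, the candidate target and the potential off-target are already aligned and of equal length. `count_mismatches` is a hypothetical helper, and it only handles the rule letters A, C, G, T and X (either case) and N.

```python
HIGH = set("ACGTX")  # upper-case rule letters: high-affinity positions
LOW = set("acgtx")   # lower-case rule letters: low-affinity positions


def count_mismatches(candidate: str, off_target: str, rule: str) -> tuple[int, int, int]:
    """Return (total, high, low) mismatches between a candidate and an off-target.

    Each position is classified by the rule letter at the same index:
    N is no-affinity and never counts, upper-case letters are high-affinity
    and lower-case letters are low-affinity.
    """
    if not len(candidate) == len(off_target) == len(rule):
        raise ValueError("candidate, off-target and rule must have equal length")
    high = low = 0
    for r, c, o in zip(rule, candidate.upper(), off_target.upper()):
        if r not in HIGH and r not in LOW and r != "N":
            raise ValueError(f"rule letter {r!r} is not handled by this sketch")
        if r == "N" or c == o:
            continue  # no-affinity position, or the nucleotides agree
        if r in HIGH:
            high += 1
        else:
            low += 1
    return high + low, high, low


rule = "xxxxxxxxxxXXXXXXXXXXNGG"
candidate = "A" * 21 + "GG"
print(count_mismatches(candidate, "A" * 20 + "TGG", rule))                 # (0, 0, 0)
print(count_mismatches(candidate, "A" * 14 + "T" + "A" * 6 + "GG", rule))  # (1, 1, 0)
print(count_mismatches(candidate, "AAAAT" + "A" * 15 + "TGG", rule))       # (1, 0, 1)
```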

Potential off-targets (2 mismatches)

Same as above, except that each potential off-target has two mismatches. Remember, different nucleotides at green (N) positions don't count as mismatches.

AAAAAAAAAAAAAAAAAAATAGT
TAAAAAAAAATAAAAAAAAATGG
AAAAAAAAATAAAAAAAAAACTG
AAAAAAAAAAAAAAAATAAAGGT

Potential off-targets (3 mismatches)

AATAAATAATAAAAAAAAAAAGG
AAAAAAAAAATATAAAAAATTGG
ATAATAAAAAAAAAAATAAACGG
AAAAAAAAAAAATAAAATAAGGT

Filter

For this candidate target, we now have 16 potential off-targets. With a single candidate target, it is easy to weed out potential off-targets that we may not care about (such as those without an NGG PAM); however, this becomes much harder with tens or hundreds of candidate targets.

Instead, we can set a "filter" to do this for us.

The filter is a string of IUPAC DNA characters, and potential off-targets that do not match it are dropped. In this case, potential off-targets that don't end in NGG or NAG (the alternate PAM) are dropped.

NNNNNNNNNNNNNNNNNNNNNRG
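A minimal sketch of how an IUPAC filter could be applied, assuming the filter is compared position by position against a potential off-target of the same length. `passes_filter` is a hypothetical helper, not part of GT-Scan; the code table follows the standard IUPAC DNA codes listed under Advanced parameters.

```python
# Standard IUPAC DNA codes and the nucleotides each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG",
    "Y": "CT", "K": "GT", "V": "ACG", "H": "ACT",
    "D": "AGT", "B": "CGT", "N": "ACGT",
}


def passes_filter(off_target: str, iupac_filter: str) -> bool:
    """Return True if each nucleotide is allowed by the IUPAC code at that position.

    Assumption: the filter and the off-target have the same length and are
    compared position by position; anything else is treated as no match.
    """
    if len(off_target) != len(iupac_filter):
        return False
    return all(
        base in IUPAC[code]
        for base, code in zip(off_target.upper(), iupac_filter.upper())
    )


nrg_filter = "N" * 21 + "RG"
print(passes_filter("A" * 20 + "TGG", nrg_filter))  # True  (NGG)
print(passes_filter("A" * 20 + "TAG", nrg_filter))  # True  (NAG)
print(passes_filter("A" * 20 + "CTG", nrg_filter))  # False (T is not A or G)
print(passes_filter("A" * 20 + "GGT", nrg_filter))  # False (T is not G)
```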

Filtered potential off-targets

AAAAAAAAAAAAAAAAAAAAAGG
AAAAAAAAAAAAAAAAAAAATGG
AAAAAAAAAAAAAAAAAAAACGG
AAAAAAAAAAAAAAAAAAAAGGG
AAAAAAAAAAAAAATAAAAAAGG
AAAATAAAAAAAAAAAAAAATGG
AAAAAAAAAAAAAAAAAAAACTG  (dropped by filter)
AAAAAAAAAAAAAAAAAAAAGGT  (dropped by filter)
AAAAAAAAAAAAAAAAAAATAGT  (dropped by filter)
TAAAAAAAAATAAAAAAAAATGG
AAAAAAAAATAAAAAAAAAACTG  (dropped by filter)
AAAAAAAAAAAAAAAATAAAGGT  (dropped by filter)
AATAAATAATAAAAAAAAAAAGG
AAAAAAAAAATATAAAAAATTGG
ATAATAAAAAAAAAAATAAACGG
AAAAAAAAAAAATAAAATAAGGT  (dropped by filter)

High-affinity mismatches

Next, we can drop potential off-targets with x or more mismatches at high-affinity (upper-case) positions, which in this rule lie in the region nearest the PAM, if we consider that many mismatches there to break CRISPR/Cas binding.

A value of three will cause GT-Scan to drop potential off-targets with three or more mismatches in this region.
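The high-affinity limit can be sketched as a simple predicate. This assumes, as described above, that a limit of three drops potential off-targets with three or more high-affinity mismatches; `high_affinity_mismatches` and `apply_high_affinity_limit` are hypothetical names. The three example sequences are taken from the lists above.

```python
def high_affinity_mismatches(candidate: str, off_target: str, rule: str) -> int:
    """Count mismatches at high-affinity positions (upper-case rule letters except N)."""
    return sum(
        1
        for r, c, o in zip(rule, candidate.upper(), off_target.upper())
        if r.isupper() and r != "N" and c != o
    )


def apply_high_affinity_limit(candidate, off_targets, rule, limit=3):
    """Drop potential off-targets with `limit` or more high-affinity mismatches."""
    return [
        s for s in off_targets
        if high_affinity_mismatches(candidate, s, rule) < limit
    ]


rule = "xxxxxxxxxxXXXXXXXXXXNGG"
candidate = "A" * 21 + "GG"
kept = apply_high_affinity_limit(
    candidate,
    ["A" * 14 + "T" + "A" * 6 + "GG",        # 1 high-affinity mismatch
     "AATAAATAAT" + "A" * 11 + "GG",         # 3 mismatches, all low-affinity
     "A" * 10 + "TAT" + "A" * 6 + "TTGG"],   # 3 high-affinity mismatches
    rule,
)
print(len(kept))  # 2
```

Note that the second sequence survives even though it has three mismatches, because none of them fall at high-affinity positions.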

AAAAAAAAAAAAAAAAAAAAAGG
AAAAAAAAAAAAAAAAAAAATGG
AAAAAAAAAAAAAAAAAAAACGG
AAAAAAAAAAAAAAAAAAAAGGG
AAAAAAAAAAAAAATAAAAAAGG
AAAATAAAAAAAAAAAAAAATGG
TAAAAAAAAATAAAAAAAAATGG
AATAAATAATAAAAAAAAAAAGG
AAAAAAAAAATATAAAAAATTGG  (dropped: 3 high-affinity mismatches)
ATAATAAAAAAAAAAATAAACGG

Sorting

Finally, candidate targets are ranked according to the number, and similarity, of potential off-targets.

Our lone candidate target has the following potential off-targets:

  • 4 with 0 mismatches (exact match)
  • 2 with 1 mismatch
  • 1 with 2 mismatches
  • 2 with 3 mismatches

AAAAAAAAAAAAAAAAAAAAAGG
AAAAAAAAAAAAAAAAAAAATGG
AAAAAAAAAAAAAAAAAAAACGG
AAAAAAAAAAAAAAAAAAAAGGG
AAAAAAAAAAAAAATAAAAAAGG
AAAATAAAAAAAAAAAAAAATGG
TAAAAAAAAATAAAAAAAAATGG
AATAAATAATAAAAAAAAAAAGG
ATAATAAAAAAAAAAATAAACGG
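The tally above can be reproduced with a short sketch that counts mismatches (ignoring N positions) for each remaining potential off-target. `mismatch_profile` is a hypothetical helper; its output is the per-mismatch-count row that this candidate target would contribute to the ranking.

```python
from collections import Counter


def total_mismatches(candidate: str, off_target: str, rule: str) -> int:
    """Mismatches at every rule position except N (no-affinity) positions."""
    return sum(
        1
        for r, c, o in zip(rule, candidate.upper(), off_target.upper())
        if r != "N" and c != o
    )


def mismatch_profile(candidate, off_targets, rule, max_mismatches=3):
    """Return [n0, n1, ..., n_max]: how many off-targets have k mismatches."""
    counts = Counter(total_mismatches(candidate, s, rule) for s in off_targets)
    return [counts[k] for k in range(max_mismatches + 1)]


rule = "xxxxxxxxxxXXXXXXXXXXNGG"
candidate = "A" * 21 + "GG"
survivors = [
    "A" * 21 + "GG", "A" * 20 + "TGG", "A" * 20 + "CGG", "A" * 20 + "GGG",
    "A" * 14 + "T" + "A" * 6 + "GG", "AAAAT" + "A" * 15 + "TGG",
    "T" + "A" * 9 + "T" + "A" * 9 + "TGG",
    "AATAAATAAT" + "A" * 11 + "GG", "ATAAT" + "A" * 11 + "TAAACGG",
]
print(mismatch_profile(candidate, survivors, rule))  # [4, 2, 1, 2]
```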

Considering that this candidate target has four exact matches elsewhere in the genome, we should only consider it for targeting as a last resort!

Submission form

The input consists of basic and advanced parameters. To see a completed example form, click the Load Example button at the bottom of the page.

Basic parameters

Parameters

First up, select a genome. GT-Scan will predict off-targets from the reference genome you select here.
Please contact us if you wish to see any reference genomes added.

The genomic sequence should be an unspliced region of DNA.

Enter a rule in the next field. With the default rule, GT-Scan will find and rank targets (by their potential off-target count) for the CRISPR/Cas9 system.

GT-Scan initially finds targets in the genomic sequence that match the rule. A match to the default rule is any 23nt sequence ending in 'GG'. XxN characters represent any nucleotide. These matches are considered candidate targets.

For each candidate target, GT-Scan identifies potential off-targets in the reference genome using the rule. Using the default rule, a perfect match to a candidate target is any 23nt sequence that has the same nucleotides as the candidate target at every non-N (Aa, Tt, Cc, Gg, Xx) position. A match is not required at N positions.

GT-Scan also handles rules for other CRISPR/Cas9 systems, such as S. aureus Cas9 (SaCas9), whose PAM (NNGRRT) includes wildcards other than N. In this case, R can be used to find candidate targets with A or G at the specified position.

Allowed letters are: AaCcGgTtXxNR
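As an illustration of how the allowed letters might be validated and turned into a search pattern, here is a hedged sketch. It assumes that, when finding candidate targets, case is ignored, X/x/N match any nucleotide and R matches A or G. `compile_rule` is a hypothetical name, and the short example rule is made up only to exercise the NNGRRT PAM; it is not a recommended SaCas9 rule.

```python
import re

ALLOWED_RULE_LETTERS = set("AaCcGgTtXxNR")


def compile_rule(rule: str) -> re.Pattern:
    """Validate a rule and compile it into a regex that finds candidate targets.

    Assumptions: case is ignored when finding candidates, X/x/N match any
    nucleotide and R matches A or G. The region searched should be upper-case.
    """
    bad = sorted(set(rule) - ALLOWED_RULE_LETTERS)
    if bad:
        raise ValueError(f"disallowed rule letters: {', '.join(bad)}")
    wildcards = {"X": "[ACGT]", "N": "[ACGT]", "R": "[AG]"}
    body = "".join(wildcards.get(ch, ch) for ch in rule.upper())
    # Lookahead so overlapping candidate targets are all reported.
    return re.compile(f"(?=({body}))")


# Hypothetical short rule ending in the SaCas9 PAM, for illustration only.
pattern = compile_rule("xxxxNNGRRT")
region = "ACGTACGAATCC"
print([(m.start(), m.group(1)) for m in pattern.finditer(region)])
# [(0, 'ACGTACGAAT')]
```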

Advanced parameters

The off-target filter lets you specify a second pattern that prevents potential off-targets found with the rule from being displayed in the results. For example, the default filter removes any sequence that doesn't end in NGG or NAG (the canonical and alternate PAMs of the CRISPR/Cas9 system). The characters are the IUPAC codes for DNA nucleotides:

Code  Meaning    Code  Meaning    Code  Meaning    Code  Meaning
A     Adenine    T     Thymine    C     Cytosine   G     Guanine
M     A|C        R     A|G        W     A|T        S     C|G
Y     C|T        K     G|T        V     A|C|G      H     A|C|T
D     A|G|T      B     C|G|T      N     A|C|G|T

The high-specificity mismatch limit is a limit to the number of mismatches allowed in 'high-affinity' positions. These are represented in the rule as the upper-case characters ACGTX.

Output

The results page for each job consists of a summary card displaying job information, and two tables. The first table contains candidate targets and the second contains potential off-targets.
The following example output is available here.

Example job information

The top card lists information about the job you are currently viewing.

Rule shows the rule as you entered it on the submission page, with high-affinity positions (ATCGX) in orange, low-affinity positions (atcgx) in blue and no-affinity positions (N) in green.

Off-target filter displays the filter rule specified. Potential off-targets not matching this string of IUPAC characters are removed from the results.

Maximum allowed number of high-specificity mismatches displays the maximum number of mismatches allowed at high-specificity positions. At most three mismatches are still allowed across the entire sequence, and mismatches at no-specificity positions don't count towards this total.

Genome displays the reference genome you selected on the submission page.

Location displays the locus where Bowtie aligned the DNA sequence from the FASTA file. If it displays an incorrect result or no result at all, the sequence you entered may not be an unspliced genomic region. Alternatively, there may be a bug in GT-Scan, in which case feel free to contact us!

Candidate Region Length displays the number of nucleotides in the input DNA sequence.

Candidate target table

An example candidate target table

The first table contains a row for each target from the genomic sequence. The targets are ranked in ascending order by the number of potential off-targets they have, where the 0 mismatch (exact match) column is the primary sorting column, followed by the 1, 2 and 3 mismatch columns.
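Ranking by the 0, 1, 2 and 3 mismatch columns in that order is a lexicographic sort, which can be sketched by sorting on a tuple of counts. The `CandidateTarget` type, its field names and the example counts below are hypothetical.

```python
from typing import NamedTuple


class CandidateTarget(NamedTuple):
    position: int
    strand: str
    sequence: str
    # Potential off-target counts with 0, 1, 2 and 3 mismatches.
    counts: tuple[int, int, int, int]


def rank_candidates(candidates: list[CandidateTarget]) -> list[CandidateTarget]:
    """Sort ascending by the (0, 1, 2, 3 mismatch) counts.

    Python compares tuples lexicographically, so the exact-match column is
    the primary key and the 1-, 2- and 3-mismatch columns break ties in turn.
    """
    return sorted(candidates, key=lambda c: c.counts)


# Hypothetical candidate targets with made-up counts, for illustration only.
table = [
    CandidateTarget(10, "+", "seq1", (4, 2, 1, 2)),
    CandidateTarget(57, "-", "seq2", (0, 3, 10, 40)),
    CandidateTarget(98, "+", "seq3", (0, 1, 12, 50)),
]
print([c.position for c in rank_candidates(table)])  # [98, 57, 10]
```

A candidate with no exact matches always outranks one with any exact match, however many 1- to 3-mismatch potential off-targets it has.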

The first two columns display the position of each candidate target in the genomic sequence, and the strand relative to the genomic sequence, respectively. The third column displays the nucleotide sequence of the target, in the same colour scheme as the rule.

The four remaining columns display the number of potential off-targets, based on the number of mismatches to the candidate target. The targets are ranked by the values in these four columns. The highest-ranked targets, at the top of the table, have fewer similar (or identical) genomic sequences compared to the lowest-ranked targets.

Select a candidate target to display its potential off-targets.

Potential off-target table

An example potential off-target table

The second table displays the potential off-targets for the selected candidate target. The following example table shows the potential off-targets for the second (highlighted) target in the table above. Its 5 entries are the 4 potential off-targets with three mismatches and the 1 potential off-target with two mismatches. Differences at no-specificity positions (green) don't count as mismatches and therefore don't contribute to the number of mismatches.

The 'Potential Off-target Sequence' column displays the nucleotide sequence for each potential off-target, where mismatches between each potential off-target and its associated candidate target are bold (except for mismatches at no-specificity positions).

The 'Number of Mismatches' columns display the total number of mismatches and the number of mismatches at high-specificity and low-specificity (orange and blue, respectively) positions between potential off-targets and the selected candidate target.

The 'Chromosome', 'Position', and 'Strand' columns display the locus of the potential off-target in the reference genome.

The final column contains a link to a genome browser for each potential off-target. For potential off-targets with few mismatches, such as the first one in the example potential off-target table, it may be useful to check, for example, whether the potential off-target is exonic.

Note: If more than 100 potential off-targets share the same number of mismatches, those potential off-targets are excluded from the potential off-target table. However, the potential off-target count columns in the first table still report the total number of potential off-targets.
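A sketch of the behaviour described in this note, under the reading that a whole mismatch group is left out of the table (rather than truncated) once it holds more than 100 entries, while the counts still report every potential off-target. `build_tables` and its (sequence, mismatches) input shape are hypothetical.

```python
from collections import defaultdict

TABLE_LIMIT = 100  # largest group shown in the potential off-target table


def build_tables(off_targets):
    """Split potential off-targets into summary counts and a displayed table.

    off_targets: iterable of (sequence, mismatches) pairs (hypothetical shape).
    counts[k] always holds the full number of potential off-targets with k
    mismatches; a group with more than TABLE_LIMIT entries is left out of
    the table entirely.
    """
    groups = defaultdict(list)
    for seq, mismatches in off_targets:
        groups[mismatches].append(seq)
    counts = {k: len(v) for k, v in groups.items()}
    table = [
        (seq, k)
        for k in sorted(groups)
        if len(groups[k]) <= TABLE_LIMIT
        for seq in groups[k]
    ]
    return counts, table


# Hypothetical input: 5 two-mismatch and 150 three-mismatch hits.
hits = [("two", 2)] * 5 + [("three", 3)] * 150
counts, table = build_tables(hits)
print(counts)      # {2: 5, 3: 150}
print(len(table))  # 5
```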