Manual

Overview

GT-Scan is a tool for predicting optimal target sites in a DNA sequence, in relation to the reference genome of a specified organism.

Submission form

The input consists of basic and advanced parameters. On the job submission page, you can view more information by hovering your mouse over the ? icons.

Basic parameters

The candidate genomic region should be an unspliced region of DNA, optionally preceded by a FASTA description line. You can enter a sequence in the text field, or click 'Browse' to select a text file from your local machine. Click 'Clear' to delete the contents of this text box. An example human DNA sequence, which was used to generate the images in the Output section of this manual, is available under the ? icon.

Select a reference genome in the next field. GT-Scan will predict off-targets from the reference genome you select at this step. Please contact us if you would like a reference genome added for an organism that is not listed. If you are hosting your own instance, the genomes available in this list depend on the server's configuration file (see Installation).

Enter a rule in the next field. You can also select from a range of rules under the ? icon. With the default rule, GT-Scan will find and rank targets for the CRISPR/Cas9 system.
GT-Scan first finds targets in the DNA sequence that match the rule. A match to the default rule is any 20nt sequence ending in 'GG'; the characters X, x and N represent any nucleotide.
GT-Scan then searches the reference genome for potential off-targets of each target that also match the rule. For the default rule, a potential off-target is any sequence that has the same nucleotides as the target at the X and x positions, subject to the mismatch limits described under Advanced parameters. A match is not required at N positions.

GT-Scan also handles rules for other Cas9 systems, such as Staphylococcus aureus Cas9 (SaCas9). The SaCas9 PAM (NNGRRT) includes wildcards other than N; here, R can be used to find candidate targets with A or G at the specified position.
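
As a rough illustration of the target-finding step, the Python sketch below scans both strands of a candidate region for windows that match a rule. It assumes that X, x and N match any nucleotide, that A, C, G and T (in either case) must match exactly, and that R matches A or G. The names `find_targets`, `matches_rule` and `BASES_FOR` and the short rule in the usage example are hypothetical; this is not GT-Scan's own code or default rule.

```python
# Hypothetical sketch of rule-based target finding; not GT-Scan's own code.
# Assumed semantics: X/x/N match any base, A/C/G/T must match, R = A or G.
BASES_FOR = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",
    "X": "ACGT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def matches_rule(window: str, rule: str) -> bool:
    """True if every base in `window` is allowed at the same rule position."""
    return len(window) == len(rule) and all(
        base in BASES_FOR[code.upper()] for base, code in zip(window.upper(), rule)
    )


def find_targets(region: str, rule: str) -> list[tuple[int, str, str]]:
    """Return (position, strand, sequence) for every window matching `rule`.

    Positions are 1-based on the forward strand; for '-' hits they give the
    leftmost forward-strand base covered by the window.
    """
    region = region.upper()
    n, k = len(region), len(rule)
    reverse = region.translate(COMPLEMENT)[::-1]  # reverse complement
    hits = []
    for strand, seq in (("+", region), ("-", reverse)):
        for i in range(n - k + 1):
            window = seq[i:i + k]
            if matches_rule(window, rule):
                pos = i + 1 if strand == "+" else n - i - k + 1
                hits.append((pos, strand, window))
    return hits
```

For example, `find_targets("AACTGGTT", "xxNGG")` reports the forward-strand window `ACTGG` at position 2, and `matches_rule("AAGAGT", "NNGRRT")` accepts an SaCas9-style PAM because R allows the A and G at positions 4 and 5.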

The allowed letters are: AaCcGgTtXxNR
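
A minimal sketch of checking a rule against these letters and labelling each position follows, assuming the classes described elsewhere in this manual: upper-case ACGTX are high-specificity, lower-case acgtx are low-specificity and N is no-specificity. This manual does not state how R is classed, so the sketch labels it "other". The function name `classify_rule` is hypothetical.

```python
# Hypothetical sketch: validate a GT-Scan-style rule and label its positions.
ALLOWED_LETTERS = set("AaCcGgTtXxNR")


def classify_rule(rule: str) -> list[str]:
    """Return a specificity label for each position of the rule.

    Raises ValueError if the rule contains a letter outside AaCcGgTtXxNR.
    """
    bad = sorted(set(rule) - ALLOWED_LETTERS)
    if bad:
        raise ValueError(f"invalid rule letters: {''.join(bad)}")
    labels = []
    for ch in rule:
        if ch in "ACGTX":
            labels.append("high")   # upper case: high-specificity
        elif ch in "acgtx":
            labels.append("low")    # lower case: low-specificity
        elif ch == "N":
            labels.append("none")   # no-specificity: mismatches not counted
        else:
            labels.append("other")  # R: class not stated in this manual
    return labels
```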

Advanced parameters

The off-target filter lets you specify a rule that excludes potential off-targets (found using the rule above) from the results. For example, the default filter removes any sequence that does not end in -NGG or -NAG (PAM sequences recognised by the CRISPR/Cas9 system). The filter is written using the IUPAC codes for DNA nucleotides:

A  Adenine    T  Thymine    C  Cytosine   G  Guanine
M  A|C        R  A|G        W  A|T        S  C|G
Y  C|T        K  G|T        V  A|C|G      H  A|C|T
D  A|G|T      B  C|G|T      N  A|C|G|T
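
The sketch below shows how a filter written in these IUPAC codes could be applied, assuming the filter and each potential off-target have the same length and are compared position by position. The names `passes_filter` and `apply_filter` and the filter string `NNNRG` in the example are illustrative only, not GT-Scan's default filter. Because R stands for A|G, a filter ending in NRG keeps exactly the sequences ending in -NGG or -NAG.

```python
# Hypothetical sketch of an IUPAC off-target filter; not GT-Scan's own code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG",
    "Y": "CT", "K": "GT", "V": "ACG", "H": "ACT",
    "D": "AGT", "B": "CGT", "N": "ACGT",
}


def passes_filter(sequence: str, iupac_filter: str) -> bool:
    """True if each base is allowed by the IUPAC code at the same position."""
    if len(sequence) != len(iupac_filter):
        return False
    return all(base in IUPAC[code]
               for base, code in zip(sequence.upper(), iupac_filter.upper()))


def apply_filter(off_targets: list[str], iupac_filter: str) -> list[str]:
    """Keep only the potential off-targets that match the filter."""
    return [s for s in off_targets if passes_filter(s, iupac_filter)]
```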

The high-specificity mismatch limit sets the maximum number of mismatches allowed at high-specificity (high-affinity) positions. These positions are represented in the rule by the upper-case characters ACGTX.
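
A minimal sketch of how mismatches might be counted and checked against these limits follows. It assumes the rule, the target and the genomic site are aligned position by position, that upper-case ACGTX positions are high-specificity and lower-case acgtx positions are low-specificity, and that mismatches at any other position (N, and also R in this sketch) are not counted. The overall cap of three mismatches comes from the Results side-bar description; the function names are hypothetical.

```python
# Hypothetical sketch of mismatch counting; not GT-Scan's own code.
def count_mismatches(target: str, site: str, rule: str) -> tuple[int, int]:
    """Return (high, low) mismatch counts between a target and a genomic site."""
    high = low = 0
    for t, s, code in zip(target.upper(), site.upper(), rule):
        if t == s:
            continue
        if code in "ACGTX":
            high += 1  # upper case: high-specificity position
        elif code in "acgtx":
            low += 1   # lower case: low-specificity position
        # Any other code (N, and R in this sketch) is not counted.
    return high, low


def is_potential_off_target(target: str, site: str, rule: str,
                            max_high: int, max_total: int = 3) -> bool:
    """Apply the high-specificity limit and the overall three-mismatch cap."""
    high, low = count_mismatches(target, site, rule)
    return high <= max_high and high + low <= max_total
```

For instance, with the illustrative rule `xxXXNGG`, the site `TCGACGG` differs from the target `ACGTAGG` at one low-specificity, one high-specificity and one N position, giving counts of (1, 1).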

Job information

These fields are for your own convenience and, as stated on the form, are both optional. The description is combined with the current time to form the job name. If you enter an email address, you will be emailed a summary of the job submission with a link to the results page.

If you don't enter an email address, keep a note of the job ID if you wish to return to the results page; you can enter this ID on the front page. Alternatively, any jobs you submit are added to the 'Recent jobs' list for the duration of your browser session (usually until you close your browser).

Output

The results page for each job consists of a side-bar displaying job information and two tables. The first table lists the candidate targets, and the second displays the potential off-targets for the selected candidate target.
The example output described in this section can be viewed on the GT-Scan website.

Results side-bar

The side-bar lists information about the job you are currently viewing.

Job Name combines the submission time with the description you may have entered on the submission page.

Rule is the rule you entered on the submission page, with high-affinity positions (ATCGX) in orange, low-affinity positions (atcgx) in blue and no-affinity positions (N) in green.

The length of the rule is shown in parentheses.

Off-target filter displays the filter rule specified. Potential off-targets not matching this string of IUPAC characters are removed from the results.

Maximum allowed number of high-specificity mismatches displays the maximum number of mismatches allowed at high-specificity positions. Regardless of this setting, at most three mismatches are allowed across the entire sequence; mismatches at no-specificity positions do not count towards this total.

Genome displays the reference genome you selected on the submission page.

Candidate Region ID displays the description line from the input DNA sequence, if present.

Candidate Region displays the locus where Bowtie aligned the input DNA sequence. If it shows an incorrect locus or none at all, the sequence you entered may not be an unspliced genomic region. It is also possible that there is a bug in GT-Scan, in which case, feel free to contact us!

Candidate Region Length displays the number of nucleotides in the input DNA sequence.

Job execution time displays the time from the start of execution until the output files were generated, in the format 'hh:mm:ss'.

Candidate target table

The first table, the candidate target table, contains a row for each target in the candidate region. Targets are ranked in ascending order of their number of potential off-targets: the 'Exact Match' column is the primary sort key, followed by the 1, 2 and 3 Mismatches columns.
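
This ranking is a lexicographic sort, which the Python sketch below expresses by sorting on a tuple of the four counts: tuples compare element by element, so the exact-match count decides first and later columns only break ties. The `CandidateTarget` class and its field names are hypothetical stand-ins for a table row.

```python
# Hypothetical sketch of the candidate target ranking; not GT-Scan's own code.
from dataclasses import dataclass


@dataclass
class CandidateTarget:
    """One row of the candidate target table (field names are hypothetical)."""
    position: int
    strand: str
    sequence: str
    # Potential off-target counts: (exact match, 1, 2, 3 mismatches)
    counts: tuple[int, int, int, int]


def rank_targets(targets: list[CandidateTarget]) -> list[CandidateTarget]:
    """Sort ascending by counts; tuples compare column by column."""
    return sorted(targets, key=lambda t: t.counts)
```

Python's `sorted` is stable, so targets with identical counts keep their input order; this manual does not say how GT-Scan breaks such ties.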

Hover the mouse over the column headings to view information about each column. Selecting a candidate target loads its potential off-targets into the potential off-target table.

The first two columns display the position of each candidate target in the candidate genomic region and the strand of the candidate genomic region on which it lies. The third column displays the nucleotide sequence of the target, in the same colour scheme as the rule.

The four remaining columns display the number of potential off-targets with each number of mismatches to the candidate target. The targets are ranked by the values in these four columns: the highest-ranked targets, at the top of the table, have fewer similar (or identical) genomic sequences than the lowest-ranked targets.


Potential off-target table

The second table displays the potential off-targets for the target selected in the candidate target table. The example table in this manual shows the potential off-targets for the target in the first row of the example candidate target table. Its 25 entries are the 24 potential off-targets with three mismatches and the one potential off-target with two mismatches. Mismatches at no-specificity positions are not counted.

The 'Potential Off-target Sequence' column displays the nucleotide sequence of each potential off-target, with mismatches to its associated candidate target shown in bold (except for mismatches at no-specificity positions).
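
A small sketch of this highlighting is shown below, with square brackets standing in for bold text. It assumes the target, the potential off-target and the rule are aligned position by position and that only N marks a no-specificity position; the function names are hypothetical.

```python
# Hypothetical sketch: mark mismatches the way the table bolds them.
def mismatch_positions(target: str, site: str, rule: str) -> list[int]:
    """0-based positions where site differs from target, skipping N positions."""
    return [i for i, (t, s, code) in enumerate(zip(target.upper(), site.upper(), rule))
            if t != s and code != "N"]


def highlight(site: str, positions: list[int]) -> str:
    """Wrap each mismatched base in brackets, standing in for bold text."""
    marked = set(positions)
    return "".join(f"[{b}]" if i in marked else b for i, b in enumerate(site))
```

With the illustrative rule `xxXXNGG`, the site `TCGACGG` against the target `ACGTAGG` is rendered as `[T]CG[A]CGG`: the difference at the N position is left unmarked.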

The 'Number of Mismatches' columns display the total number of mismatches and the number of mismatches at high-specificity and low-specificity (orange and blue, respectively) positions between potential off-targets and the selected candidate target.

The 'Chromosome', 'Position', and 'Strand' columns display the locus of the potential off-target in the reference genome.

The final column contains a link to a genome browser for each potential off-target. For potential off-targets with few mismatches, such as the first one in the example potential off-target table, it may be useful to check, for example, whether the site is exonic.

Note: If more than 100 potential off-targets have a given number of mismatches, those potential off-targets are excluded from the potential off-target table. However, the potential off-target counts in the candidate target table still include them.
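
The behaviour in this note can be sketched as follows: off-targets are grouped by mismatch count, any group larger than the limit is left out of the displayed rows, and every group is still counted. The function name, the (sequence, mismatches) input format and the ordering of rows by mismatch count are assumptions made for illustration.

```python
# Hypothetical sketch of the 100-entry display limit; not GT-Scan's own code.
from collections import defaultdict


def build_off_target_table(off_targets: list[tuple[str, int]], limit: int = 100):
    """Return (rows, counts) for a list of (sequence, mismatches) pairs.

    Groups with more than `limit` entries are left out of rows but are
    still included in counts.
    """
    groups: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for seq, mismatches in off_targets:
        groups[mismatches].append((seq, mismatches))
    counts = {mm: len(group) for mm, group in groups.items()}
    rows = [row
            for mm in sorted(groups)
            if len(groups[mm]) <= limit
            for row in groups[mm]]
    return rows, counts
```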