Accepted Article Preview: Published ahead of advance online publication

Nuclease Target Site Selection for Maximizing On-target Activity and Minimizing Off-target Effects in Genome Editing

Ciaran M. Lee, Thomas J. Cradick, Eli J Fine, Gang Bao

Cite this article as: Ciaran M. Lee, Thomas J. Cradick, Eli J Fine, Gang Bao, Nuclease Target Site Selection for Maximizing On-target Activity and Minimizing Off-target Effects in Genome Editing, Molecular Therapy accepted article preview online 11 January 2016; doi:10.1038/mt.2016.1

This is a PDF file of an unedited peer-reviewed manuscript that has been accepted for publication. NPG is providing this early version of the manuscript as a service to our customers. The manuscript will undergo copyediting, typesetting and a proof review before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers apply.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/


Received 23 July 2015; accepted 17 December 2015; Accepted article preview online 11 January 2016


Ciaran M. Lee1, Thomas J. Cradick2, Eli J Fine3, Gang Bao1*

1Department of Bioengineering, Rice University, Houston, TX 77005
2CRISPR Therapeutics, Cambridge, MA 02142
3Coyne Scientific, Atlanta, GA 30339

* To whom correspondence should be addressed. Tel: +1 713 348 2764; Fax: +1 713 348 5877; Email: gang.bao@rice.edu

Short title: nuclease target site selection in genome editing

ABSTRACT

The rapid advancement in targeted genome editing using engineered nucleases such as ZFNs, TALENs and CRISPR/Cas9 systems has resulted in a suite of powerful methods that allows researchers to target any genomic locus of interest. A complementary set of design tools has been developed to aid researchers with nuclease design, target site selection, and experimental validation. Here we review the various tools available for target selection in designing engineered nucleases, and for quantifying nuclease activity and specificity, including web-based search tools and experimental methods. We also elucidate challenges in target selection, especially in predicting off-target effects, and discuss future directions in precision genome editing and its applications.

Keywords: engineered nuclease, target selection, specificity, activity, genome editing

INTRODUCTION

Targeted genome editing technology continues to create intense excitement with each new technological advance.1-3 The development of tools to generate DNA breaks, activate,4 repress or label genomic loci,5, 6 and remodel chromatin7 in a controlled, targeted manner will greatly aid the study of a wide range of biological issues, including gene and genomic functions. The ability to specifically modify the genome also holds great promise for targeted gene therapies. Early work with meganucleases and zinc finger nucleases (ZFNs) showed that targeted site-specific DNA breaks could greatly increase the rate of homology directed repair (HDR) at the specified locus. More recent developments include TAL Effector Nucleases (TALENs)10, 11 and CRISPR/Cas9 systems (Figure 1).12-14 ZFNs consist of zinc finger motifs, which bind to DNA triplets, and the FokI nuclease domain, which cleaves DNA upon dimerization.15, 16 TALENs are composed of TAL effectors fused to the FokI nuclease domain and recognize DNA bases via conserved repeats that differ by two residues known as the repeat variable diresidue (RVD), which confers specificity to individual bases. Unlike ZFNs and TALENs, which use protein domains to recognize target DNA sequences, the widely used CRISPR/Cas9 system adapted from Streptococcus pyogenes (Spy) uses both RNA and protein-based DNA recognition. These RNA guided nucleases (RGENs) use a short guide RNA strand (gRNA), which targets a 20-nucleotide sequence, and the CRISPR associated (Cas) endonuclease Cas9, which binds to the fixed protospacer adjacent motif (PAM) NGG.12, 13 Although there is a strict adherence to PAM recognition, due to the short length of the PAM the specificity of RGENs is largely controlled by gRNA-DNA interaction. With these engineered nucleases, we now have efficient molecular scissors that can cut genomic DNA in cells at pre-selected locations and introduce mutagenic errors via the non-homologous end joining (NHEJ) DNA repair pathway for targeted gene knockout or targeted deletion of large chromosomal segments. Alternatively, if an exogenous DNA donor template is introduced in concert with the nuclease, DNA cleavage (DNA double strand breaks or nicks) may trigger endogenous HDR with the supplied DNA donor template, resulting in precise DNA modifications (Figure 1). These abilities have led to the emerging field of genome editing, a new field in engineering and life sciences focusing on precisely modifying genomes using engineered nucleases.

With the rapid advancement of genome editing research, a suite of nuclease design and validation tools has been developed, significantly facilitating nuclease target site selection and experimental validation in terms of on-target and off-target activities. For most of the biological and medical applications of genome editing, high efficiency and high specificity of engineered nucleases are among the most important functional requirements; both are closely related to target site selection. For each endogenous genomic locus, the efficiency of DNA cleavage, both on-target and off-target, depends not only on the intrinsic nuclease activity (such as that of FokI domains and Cas9 protein) but also on target site accessibility and the affinity of the DNA binding domain(s) (such as TAL effector domains and gRNA) for the target sequence. The specificity of engineered nucleases is significantly affected by the affinity of nuclease-DNA binding, such as zinc finger-DNA binding (ZFNs), TAL effector-DNA binding (TALENs) and gRNA-DNA hybridization (CRISPR), although the dimerization of FokI domains (ZFNs and TALENs) and the Cas9-PAM interactions may also play important roles. The behavior and functions of engineered nucleases in living cells remain poorly understood, especially the dynamics of their interactions with DNA and their cell cycle-dependent cleavage activity. Because knowledge of the structure and dynamics of the cell nucleus, especially chromatin structure, is limited, prediction of nuclease target accessibility and cleavage rates in living cells remains difficult. Further, the efficiency of homology directed repair also depends on the design, accessibility, and binding affinity of the donor templates. Therefore, experimental validation of target site selection is necessary. Herein, we use 'true off-target sites' to indicate the off-target sites that are experimentally confirmed using PCR, sequencing or other methods.

In this article we review some of the web-based tools available for target selection in designing engineered nucleases, and selected experimental methods for quantifying nuclease activity and specificity. Due to space limitations and the rapid development of the genome editing field, only a subset of available tools is discussed rather than a comprehensive survey. Challenges in target selection, especially in predicting off-target effects, and future directions in precision genome editing are also discussed.

WEB-BASED DESIGN TOOLS FOR NUCLEASE TARGET SELECTION

A range of bioinformatics and experimental based nuclease design tools have been developed that aid the target site selection of engineered nucleases. These tools fall into three main categories: (1) choice of target sites / design of nucleases, (2) genomic searches for possible off-target sites and (3) determining the level of on- and off-target cleavage rates. A list of the available design tools is given in Table 1, together with a brief description of the functionality for each tool. Most of the tools listed in Table 1 are for the design of CRISPR/Cas9 systems, with a few for ZFNs and TALENs.

ZFN Design Tools

Zinc Finger Proteins (ZFPs) can be designed to target many novel sequences based on the 3 bp specificity of individual fingers.19, 20 Phage display based selections and rational design techniques have been used by certain companies and research labs to generate high-affinity ZFPs and ZFNs.21-26 However, zinc finger (ZF) design remains difficult due to positional effects and a lack of straightforward ZFP design principles: a number of amino acid sequences in a given finger can specify a given triplet, but the activity of any given zinc finger is strongly dependent on its position in the ZFP and the nature of the neighboring zinc fingers. Tools such as ZiFit were developed to address this issue by taking the context dependence into account. However, designing a highly active and specific ZFN pair remains challenging. Alternatively, a bacterial two-hybrid screening platform is also available for custom ZFP production. However, the substantial amount of work required has limited its use outside of a small number of dedicated labs.

TALEN Design Tools

For designing TALENs, the DNA targeting specificity of TAL effector RVDs is more straightforward than that of ZFs, allowing easier design of TALENs. There are four main RVDs, one for each DNA base. Based on this simple one-to-one recognition code and the requirement for a flanking 5' thymine base, first generation design programs output many potential target sites.10, 11, 33-36 Despite the ability of well-designed nucleases to target defined loci with high efficiency, the widespread use of TALENs has been hampered by the poor performance of some designed TALEN pairs, thereby necessitating the screening of a large number of candidates to find a validated TALEN pair with a high level of activity. For example, a high-throughput study that looked at the activity of 96 TALEN pairs determined that 12 pairs had no activity and 43 pairs had activities below 20% in a model cell line.37 Some TALEN design tools incorporate ranking of TALEN pairs. The E-TALEN webtool incorporates a scoring algorithm for ranking potential TALENs, but this scoring system was not experimentally validated. The second generation TALEN design tool SAPTA (Scoring Algorithm for Predicting TALEN Activity) uses improved guidelines for TALEN design based on rules derived from experimentally testing 205 individual TALEN monomers.39 The SAPTA algorithm was designed to identify target sites for highly active TALENs34 that use the NK (Asparagine-Lysine) RVD, which displays higher specificity for guanines compared to the standard NN (Asparagine-Asparagine) RVD.40 It was clear in constructing SAPTA that affinity plays an important role in achieving high cleavage activity, especially when the G-C content of the target site is high. However, the current version of SAPTA is based on experimental results from TALENs with the NK RVD, and issues with target accessibility may render the predicted activity inaccurate. Therefore, further improvements of SAPTA are being conducted to make it a more useful design tool.
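The one-to-one recognition code described above can be illustrated with a short Python sketch that converts a candidate binding site into an RVD array. The assignments used here (NI for A, HD for C, NG for T, and NN or NK for G) are the commonly cited code and are an assumption of this sketch rather than something tabulated in this review; the function name is hypothetical, and no SAPTA scoring rule is reproduced.

```python
# Commonly cited RVD-to-base code (an assumption of this sketch).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "T": "NG", "G": "NN"}


def rvds_for_target(site, guanine_rvd="NN"):
    """Return the list of RVDs for a TALEN monomer binding `site`.

    `site` is written 5'->3' and must begin with the flanking thymine.
    That leading T is required but is not assigned an RVD in this sketch.
    """
    site = site.upper()
    if not site or site[0] != "T":
        raise ValueError("TALEN binding sites require a flanking 5' T")
    if guanine_rvd not in ("NN", "NK"):
        raise ValueError("guanine_rvd must be 'NN' or 'NK'")
    rvds = []
    for base in site[1:]:
        if base not in RVD_FOR_BASE:
            raise ValueError(f"unexpected base {base!r}")
        # Guanine can be targeted with NN (standard) or NK (more specific).
        rvds.append(guanine_rvd if base == "G" else RVD_FOR_BASE[base])
    return rvds
```

Calling `rvds_for_target("TACGT", guanine_rvd="NK")` returns the repeat order for the four bases that follow the 5' T. Ranking candidate sites by predicted activity would require an empirically derived scoring scheme, which this sketch does not attempt.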

RGEN Design Tools

The ability of the Spy CRISPR/Cas9 system to target any 20 nucleotide sequence that is adjacent to an NGG PAM simplifies the design of gRNAs, since it is easy to locate PAM sequences in a gene or region of interest using a bioinformatics tool (Table 1). Although in general the CRISPR/Cas9 systems may have a much higher DNA cleavage rate when compared to ZFNs and TALENs, it is still desirable to identify optimal target sites in silico. Efforts have been made recently to develop web-based tools to predict high nuclease activity sites in a genomic region of interest. For example, sgRNA Designer41 (Table 1) attempts to predict the optimal sequence composition for high CRISPR/Cas9 activity. However, although the algorithm was validated with a previous CRISPR knockout library screen in human and mouse cells, it was not tested for designing a gRNA for a given input sequence. Similarly, sgRNA Scorer42 (Table 1) ranks gRNAs for high activity based on an algorithm generated using data from gRNAs tested in HEK293T cells. This study noted some correlation between site accessibility and gRNA activity, but it is unknown whether the predicted scores are valid for other cell types. The rankings from sgRNA Designer and sgRNA Scorer were shown to correlate only weakly. The web-based tool CRISPR Scan43 (Table 1) more accurately predicts gRNA activity in zebrafish than sgRNA Designer. Although this is likely due to a more accurate algorithm in CRISPR Scan, it could also be a consequence of using data from zebrafish in constructing the algorithm, resulting in a somewhat biased comparison. However, unlike sgRNA Designer and sgRNA Scorer (both with algorithms based on library screens), to date CRISPR Scan is the only tool with demonstrated ability to correlate gRNA activities with predicted scores. Although CRISPR Scan could also identify highly active gRNAs in Xenopus tropicalis, it remains to be seen if it can predict sgRNA activity in human cells.
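Locating candidate Spy Cas9 target sites, that is, 20-nucleotide protospacers immediately 5' of an NGG PAM, can be sketched in a few lines of Python. This is a minimal illustration and not the method of any tool in Table 1; the function names and the tuple layout of the output are hypothetical, and no activity scoring is attempted.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spy_cas9_sites(seq, guide_len=20):
    """List (strand, start, protospacer, pam) for every NGG-adjacent site.

    `start` is the 0-based index of the first protospacer base on the
    strand that was scanned; for strand "-" it indexes the reverse
    complement of `seq`, not `seq` itself.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping sites be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

Every site returned this way is only a candidate: as discussed above, predicting which of these gRNAs will be highly active requires an empirically trained model, and such predictions may not transfer between cell types or organisms.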

WEB-BASED TOOLS FOR NUCLEASE OFF-TARGET SITE PREDICTION

The advancement of ZFN and TALEN technology sparked a growing concern for potential off-target cleavage that may occur throughout the genome. Nuclease specificity was often measured indirectly by cellular toxicity levels.44, 45 More sophisticated techniques aim to directly measure nuclease activity at predefined genomic loci or screen libraries of sequences to identify potential off-target sites.46 Large genome size and the large number of potential nuclease cleavage sites have made determining the most likely off-target sites very difficult, especially as genomic context can greatly influence the cleavage of identical sites at different loci. A number of tools have been developed that search genomes for possible off-target sites for engineered nucleases, including scripts that systematically scan genomes and web-based bioinformatics tools that aid in the determination of potential off-target sites. Some of these tools are well validated using other existing approaches and/or experimental methods, including next-generation sequencing (NGS) of targeted amplicons. One example of a bioinformatics tool established using true off-target sites of a well characterized ZFN pair is PROGNOS (Predicted Report Of Genome-wide Nuclease Off-target Sites), which was validated using results from different methods; comparisons of the level of overlap and the number of sites identified by each method are shown in Figure 2a.46, 47, 49-51 Interestingly, PROGNOS, an exhaustive search tool, identified a true off-target site that was not found with experimental based methods.46 However, highly active off-target sites may not be ranked highly by PROGNOS, suggesting that there are unknown factors that influence ZFN and TALEN off-target activity but are not yet accounted for in PROGNOS. Therefore, further improvements of PROGNOS are needed based on unbiased genome-wide analysis of off-target activity of ZFNs and TALENs.

Compared with ZFNs and TALENs, the CRISPR/Cas9 systems are easier to use, more efficient, and can readily target multiple genes. The potential drawback of using CRISPR/Cas9 systems to target genomic loci is possible off-target effects: because their target specificity relies on Watson-Crick base pairing, a gRNA can hybridize to sequences containing base mismatches, resulting in off-target cleavage.52-54 Although many web-based tools have been developed to identify off-target sites (Table 1), none can predict off-target sites with high accuracy, as discussed below. For example, a recent comparison of off-target predictions by the MIT CRISPR Design and E-CRISP tools for 9 different gRNA designs demonstrated that these tools performed poorly in predicting CRISPR/Cas9 off-target sites, indicating that off-target activity cannot be accurately identified when predictions are solely based on sequence homology (Figure 2b, 2c).55 Further, it was revealed that CRISPR/Cas9 systems could tolerate DNA bulges and RNA bulges at the cleavage site, in addition to base mismatches.56 Consequently, a more sophisticated program, COSMID (CRISPR Off-target Sites with Mismatches, Insertions and Deletions), was developed that ranks potential off-target sites by considering base mismatches, insertions and deletions between gRNA and DNA sequences,48 and some other search tools have since incorporated insertions and deletions as an additional search option.
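The difference between a mismatch-only search and a search that also allows bulges can be made concrete with a small Python sketch. It is a deliberately naive illustration, not the algorithm used by COSMID or any other tool named here: it scans only the forward strand of a short sequence, requires an NGG PAM, and models a bulge as a single skipped base. All function names and default thresholds are hypothetical.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def find_mismatch_sites(genome, guide, max_mismatches=3):
    """Return (position, site, mismatches) for NGG-adjacent near-matches.

    Only the forward strand is scanned; a fuller search would repeat the
    scan on the reverse complement.
    """
    genome, guide = genome.upper(), guide.upper()
    n = len(guide)
    hits = []
    for i in range(len(genome) - n - 3 + 1):
        pam = genome[i + n:i + n + 3]
        if pam[1:] != "GG":
            continue
        site = genome[i:i + n]
        mismatches = count_mismatches(guide, site)
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits


def mismatches_with_one_bulge(guide, site):
    """Fewest mismatches when one base of the longer sequence is skipped.

    A site one base longer than the guide models a DNA bulge; a site one
    base shorter models an RNA bulge.
    """
    guide, site = guide.upper(), site.upper()
    longer, shorter = (site, guide) if len(site) > len(guide) else (guide, site)
    if len(longer) - len(shorter) != 1:
        raise ValueError("lengths must differ by exactly one base")
    return min(
        count_mismatches(longer[:k] + longer[k + 1:], shorter)
        for k in range(len(longer))
    )
```

A site that needs one bulge and no mismatches can look like a site with many mismatches to a mismatch-only search, which is one way such sites can be missed when only sequence homology without gaps is considered.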

A comparison of existing web-based tools for predicting CRISPR/Cas9 off-target sites revealed a wide range of agreements and discrepancies (Table 2). The inability of some tools to identify off-target sites containing only mismatches suggests that these tools use a repeat masker (Table 2). With DNA or RNA bulges, tools with the ability to search for bulge-containing sites perform better than those without, although some tools without this ability can identify bulge-containing sites that can be modeled as base mismatches (Table 2). However, these tools failed to identify true off-target sites with bulges that cannot be modeled by base mismatches alone (Table 2). Since there is still a lack of understanding about target site accessibility and RGEN binding to DNA in living cells, the existing CRISPR design tools may not predict off-target effects (sites and cleavage rates) with high accuracy. Readers are therefore advised to consider using several tools (Table 1) to compare outputs for the initial design of gRNAs and to perform experimental validation to determine true off-target sites.

The CROP-IT web tool integrates whole genome information from existing Cas9 off-target binding and cutting data sets in an effort to improve off-target identification and prediction. Even though this tool makes use of experimental data and outperforms some other search algorithms, it still performed poorly when compared to the results obtained using the GUIDE-seq method, since only ~60% of the true off-target sites were identified for 3 gRNAs even when the top 500 predicted sites were considered. The missed sites, together with the high level of false positive hits among the top-ranked predictions, demonstrate a major drawback of current in silico algorithms for RGEN off-target identification.
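The comparison described above amounts to asking what fraction of experimentally confirmed sites appears among the top N predictions, and what fraction of those N predictions is confirmed. A minimal Python sketch of both quantities follows; the function names and the use of plain site identifiers are assumptions made for illustration, and no data from the cited studies are reproduced.

```python
def recall_at_k(ranked_predictions, true_sites, k):
    """Fraction of true off-target sites found in the top-k predictions."""
    true_sites = set(true_sites)
    if not true_sites:
        raise ValueError("at least one true site is required")
    top = set(ranked_predictions[:k])
    return len(top & true_sites) / len(true_sites)


def precision_at_k(ranked_predictions, true_sites, k):
    """Fraction of the top-k predictions that are true off-target sites."""
    true_sites = set(true_sites)
    top = ranked_predictions[:k]
    if not top:
        raise ValueError("no predictions to evaluate")
    return sum(1 for site in top if site in true_sites) / len(top)
```

Under these definitions, a tool that recovers about 60% of the confirmed sites within its top 500 predictions has a recall of roughly 0.6 at k = 500, and its precision at that cutoff is low whenever the number of confirmed sites is much smaller than 500.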

The tools for ZFN, TALEN and RGEN off-target predictions differ in their input parameters, search features, degree of exhaustive search, accuracy and the amount of information in output. In some cases, a number of sequence-validated off-target sites could be identified only by a single tool;46 in some other cases, predictions from several tools overlap. As shown in Table 3, although not perfect, in silico off-target search tools can be very helpful in quickly establishing a nuclease design, synthesis and testing workflow. For example, the current web-based tools are useful in screening potential gRNA designs for identifying closely matched sites, and tools that do not contain a repeat masker can help identify gRNAs that have perfectly matched off-target sites or that target repetitive elements.

Unlike PROGNOS, which has algorithms built upon molecular information of protein-DNA interactions for both zinc finger motifs and TAL effector RVDs, existing web-based tools for the prediction of RGEN off-target sites31, 35, 48, 52 rely heavily on sequence homology between the gRNA and potential cleavage sites. This often renders the prediction and ranking of potential off-target sites inaccurate. There is an unmet need to establish broadly applicable in silico rules for searching and ranking RGEN off-target sites. This need persists because of fundamental challenges, including the lack of detailed molecular information on Cas9, gRNA and DNA interactions, of quantitative measurements of the affinity between gRNA and DNA target, and of data on target accessibility. To improve the first-generation search algorithms, a better understanding of gRNA-DNA interaction, nuclease-DNA binding and cleavage dynamics, as well as target accessibility is required. With newer genome-wide methods for determining nuclease off-target cleavage,55, 59-61 it is likely that more true off-target sites for engineered nucleases (especially CRISPR/Cas9 systems) will be confirmed and a better understanding of nuclease off-target effects will emerge, which will help improve the bioinformatics based off-target search and prediction tools.

METHODS FOR EXPERIMENTAL EVALUATION OF TARGET SITE SELECTION

Many experimental methods have been developed to quantify the activity of engineered nucleases, including enzyme-based assays62, 63 and sequencing based assays64-66. Most of these methods detect small insertions and/or deletions (indels) that arise from imperfect NHEJ-mediated repair of DNA double strand breaks (DSBs). The most widely used enzyme-based methods rely on mismatch-sensitive enzymes such as CEL-I nuclease and T7 endonuclease I (T7EI).62, 63, 67 They work by detecting heteroduplexes formed by hybridizing wild-type and mutant DNA sequences or hybridizing two different mutant sequences together, and the relative intensity of cleavage products resolved by agarose gel electrophoresis provides a measure of mutation frequency in a population of cells. Alternatively, if the nuclease cut site is within a unique restriction enzyme motif, a restriction fragment length polymorphism (RFLP) based assay can be used in place of CEL-I or T7EI. In this assay, nuclease induced indels destroy the restriction site. When these cleavage products are resolved on a gel, the band corresponding to the uncut DNA represents the mutant population.68 Although these enzyme-based assays are quick and cost effective, they have a detection limit of 1-5% and are sensitive to endogenous mismatches (such as heterozygous SNPs) leading to potential false positive results.
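For the mismatch-sensitive enzyme assays, band intensities are commonly converted to an estimated indel frequency with the relation indel (%) = 100 x (1 - sqrt(1 - f)), where f is the fraction of total signal in the cleaved bands. This relation is not stated in the text above and is given here as an assumption of the sketch; it also assumes random re-annealing of wild-type and mutant strands and complete cleavage of heteroduplexes. The function name is hypothetical.

```python
import math


def indel_percent_from_gel(uncut_intensity, cleaved_intensities):
    """Estimate indel frequency (%) from mismatch-cleavage band intensities.

    uncut_intensity: signal of the full-length (uncleaved) band.
    cleaved_intensities: iterable with the signal of each cleavage product.
    """
    cleaved = sum(cleaved_intensities)
    total = uncut_intensity + cleaved
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cleaved = cleaved / total
    # The square root corrects for mutant strands that re-anneal with
    # other mutant strands and therefore are not all cleaved.
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))
```

For example, if a quarter of the total signal lies in the cleaved bands, the estimate is about 13.4% rather than 25%. Estimates near the 1-5% detection limit mentioned above should be treated with caution.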

Sanger sequencing of DNA from individual clones has been the gold standard for confirming nuclease induced indels, but this method is time consuming and not cost-effective due to the high number of samples that need to be analyzed.69 Alternatively, Sanger sequencing of a bulk population can be used in conjunction with the recently developed web tool TIDE (Tracking of Indels by Decomposition).64 TIDE deconvolutes the mixed chromatogram signals from nuclease-treated cells to accurately determine the mutation frequency in the population. The TIDE tool also outputs the frequency of each deletion and insertion size in the population and is insensitive to endogenous SNPs. However, as with the enzyme-based methods, TIDE analysis has a lower limit of detection of 1-5%. For detecting rare cleavage events, high-throughput sequencing approaches enable accurate measurement of mutation rates as low as 0.1%, although care should be taken to discard false positives due to PCR or sequencing error.
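One simple way to guard against PCR or sequencing artifacts when reporting low indel rates is to sequence a mock-treated control amplicon alongside the nuclease-treated sample and subtract its apparent indel rate. The Python sketch below illustrates that idea only; the fold-over-background cutoff is a hypothetical choice, not a published standard, and a formal statistical test would be preferable in practice.

```python
def indel_rate(indel_reads, total_reads):
    """Fraction of reads that carry an indel at the cut site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return indel_reads / total_reads


def corrected_indel_rate(treated, control, min_fold=2.0):
    """Background-subtracted indel rate for a nuclease-treated sample.

    treated, control: (indel_reads, total_reads) tuples.
    Returns 0.0 when the treated rate does not exceed `min_fold` times
    the control rate (hypothetical cutoff), i.e. when the signal cannot
    be distinguished from background in this simple scheme.
    """
    treated_rate = indel_rate(*treated)
    control_rate = indel_rate(*control)
    if treated_rate <= min_fold * control_rate:
        return 0.0
    return treated_rate - control_rate
```

With 50 indel reads in 10,000 treated reads and 5 in 10,000 control reads, the corrected rate is 0.45%; with only 8 indel reads in the treated sample the result is reported as zero because it is within twofold of background.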

Single Molecule Real-time (SMRT) sequencing is an alternative platform that has been demonstrated to perform as well as Sanger sequencing of single cell clones but with higher throughput, and it is possible to use SMRT sequencing to measure HDR and NHEJ events simultaneously due to the longer read length.65 Other less common protocols available for detecting nuclease induced indel rates include fluorescent PCR,71 DNA melting analysis,72 and CRISPR/Cas9 RFLP73 that can distinguish between mono and biallelic mutagenesis in clones. These methods indirectly measure nuclease activity as they depend on the mutagenic susceptibility of the endogenous repair machinery in the cell type employed. One method that directly measures the levels of DNA DSBs is BLESS (direct in situ breaks labeling, enrichment on streptavidin and next-generation sequencing).74 Because it detects free DNA DSB ends, it cannot detect any alleles that have undergone NHEJ repair. As the price of next-generation sequencing (NGS) has dropped markedly, it is now possible to very precisely measure the percentage of alleles that are wild-type, mis-repaired or have correctly undergone HDR.65 Many labs use internal pipelines for the analysis of sequencing results, though several web-based tools have been recently developed, including CRISPR-GA75 and CRISPResso (Table 1).

METHODS FOR DETERMINING OFF-TARGET EFFECTS

Although engineered nucleases are designed to cleave at a predefined genomic locus, off-target effects at closely matched sequences have been observed.45, 53, 76 ZFNs and TALENs display promiscuity due to the ability of ZFPs and TAL effectors to bind to sites in the genome that have high degrees of homology to on-target sites. RGEN induced off-target DSBs can be caused by binding promiscuity of both the gRNA and the Cas9 endonuclease. The optimal PAM for Spy Cas9 is NGG, although active off-target sites with NAG, NGA, NCG, NGC, NGT, NTG, and NAA PAM sequences have been identified.52, 55 Mismatches as well as base insertions or deletions that form bulges between the gRNA and the target DNA strand may also be tolerated.52, 53, 56 The functional consequence of the off-target activity of engineered nucleases is still largely unclear and the off-target effects (both sites and cleavage rates) are likely to vary within the major classes of nucleases due to the requirement for homology with the on-target site, and between the major classes of nucleases due to the nature of nuclease-DNA binding. However, any active off-target site in an exonic or regulatory sequence in a genome would likely have detrimental effects on gene expression and could possibly lead to aberrant cellular function. In addition to nuclease-induced small indels, there is the possibility of a chromosomal deletion,53, 77 inversion,78 or translocation between the on-target and off-target sites (Figure 3). Indeed, the potential for chromosomal translocations is a real concern in the use of multiplex gene targeting for therapeutic purposes, although it presents a novel system for modeling oncogenic translocations in vivo.

Given the potentially dire consequences of nuclease off-target activity, it is pertinent to identify and characterize potential off-target effects when using genome editing for therapeutic applications. Experimental determination of active off-target sites is a laborious task due to the size of the genome and the large number of potential off-target sites. Early studies of nuclease specificity focused on experimental methods, such as in vitro SELEX,49, 81, 82 IDLV (integrase-defective lentiviral vector) capture,50 in vitro cleavage,47 and bacterial one-hybrid screening,83 to determine potential off-target sites and provide a shortlist of candidate sites for testing. All of these methods are laborious, costly and require highly specialized protocols, which has prevented their widespread use. It is therefore very beneficial to use bioinformatics-based tools to identify potential nuclease off-target sites, as discussed above. The fact that PROGNOS has identified bona fide off-target sites for more ZFNs and TALENs than available experimental based methods such as SELEX and IDLV capture is a clear demonstration of the power of in silico prediction methods.46

As for the CRISPR/Cas9 systems, issues with target sequence accessibility and the tolerance of base mismatches and DNA/RNA bulges make accurate prediction of true off-target sites difficult. For example, existing web-based tools for RGEN off-target prediction may identify hundreds or even thousands of potential off-target sites,52 but the scoring/ranking of these sites is usually inaccurate or even misleading, since typically few of the top-ranked sites are true off-target sites as revealed by experimental evaluation. The most widely used algorithm for scoring potential off-target sites predominantly relies on data from four gRNAs targeting a single gene and determines the likelihood of cleavage at a given site based on the total number of mismatches (up to four), mismatch position and distance between mismatches.52 However, given the high number of false positive hits and the failure of many tools to identify true off-target sites, it is likely that there are other factors apart from sequence homology that influence off-target cleavage. Neither experimentally testing all the potential off-target sites nor relying on rudimentary ranking of these sites is ideal for confirming the true off-target sites.
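To show how the three sequence features named above (number of mismatches, mismatch position, and distance between mismatches) can be combined into a single score, the following Python sketch defines a toy scoring function. Every weight and functional form in it is hypothetical and chosen only for illustration; it does not reproduce the published algorithm or its empirically derived weights. The sketch assumes the guide is written 5'->3' with the PAM at the 3' end and penalizes PAM-proximal mismatches more heavily.

```python
from itertools import combinations


def toy_offtarget_score(guide, site, max_mismatches=4):
    """Illustrative score in [0, 1]; higher means cleavage is deemed more likely.

    All weights below are hypothetical and are NOT published values.
    """
    guide, site = guide.upper(), site.upper()
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    n = len(guide)
    positions = [i for i in range(n) if guide[i] != site[i]]
    if not positions:
        return 1.0
    if len(positions) > max_mismatches:
        return 0.0

    # Position term: the penalty grows linearly toward the PAM-proximal end.
    position_term = 1.0
    for p in positions:
        position_term *= 1.0 - 0.8 * (p + 1) / n

    # Spacing term: widely separated mismatches are penalized less.
    if len(positions) > 1:
        pairs = list(combinations(positions, 2))
        mean_distance = sum(b - a for a, b in pairs) / len(pairs)
        spacing_term = 1.0 / (1.0 + 4.0 * (n - 1 - mean_distance) / (n - 1))
    else:
        spacing_term = 1.0

    # Count term: each additional mismatch lowers the score sharply.
    count_term = 1.0 / len(positions) ** 2
    return position_term * spacing_term * count_term
```

Because a score of this kind sees only sequence, it cannot account for factors such as target accessibility, which is consistent with the observation above that homology-based rankings often place true off-target sites poorly.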

Recently, several new experimental methods have been described that attempt to capture the genome-wide activity of RGENs in an "unbiased" manner (Figure 4). These methods use different strategies to detect DNA DSBs with the ultimate goal of identifying RGEN induced DSBs. Cas9 ChIP assays use a catalytically dead version of Cas9 (dCas9) to determine the genome-wide binding profile of dCas9 when combined with a specific gRNA. For all gRNAs tested, ChIP-seq identified the on-target site and hundreds of genome-wide Cas9 binding sites.61, 84, 85 However, Wu et al.84 reported that only 1 out of 295 ChIP-seq identified sites had off-target activity as confirmed by deep sequencing, whereas Kuscu et al. reported Cas9 cleavage activity at 7 ChIP-seq predicted sites for a single gRNA.61 Independent reanalysis of these 7 sites found no evidence for RGEN activity and suggested that the indels observed were due to Illumina sequencing errors in processing homopolymer stretches close to the expected cut sites.55 The lack of overlap between dCas9 binding and Cas9 cleavage activities from these ChIP-seq studies demonstrates that Cas9 binding does not necessarily serve as a marker for RGEN activity. In the absence of gRNA molecules, dCas9 favored DNA regions with open chromatin, raising the possibility that RGEN activity or site preference could be influenced by site accessibility. An alternative approach, Digenome-seq, has been developed in which potential off-target sites are identified via in vitro digestion of intact genomic DNA by RGEN complexes coupled with whole genome sequencing.59 This method identifies RGEN off-target sites based on the ability of the nuclease to recognize and cleave genomic off-target sites in vitro.59 When gRNAs targeting HBB and VEGFA were tested using this method, only 4 out of 37 and 8 out of 34 off-target loci identified respectively for the two genes were found to have detectable levels of activity when interrogated by deep sequencing. It is possible that the rest of the sites were false positives or had activity levels below the limit of detection. This discrepancy suggests that cellular or genomic context plays an important role in off-target cleavage.

Genome-wide RGEN off-target sites can be determined by break capture methods, including IDLV capture,60 translocation capture HTGTS (high throughput, genome-wide translocation sequencing),86 and dsODN capture.55 These methods use different strategies, making it difficult to directly compare them. However, there are some striking differences in the results. In a study using IDLV capture,60 6 true off-target sites were not found. Each of these sites had activity <1% when assayed by deep sequencing, suggesting that this may be the detection limit of IDLV capture. HTGTS identifies off-target DSBs that have translocated to the on-target site.86 In using HTGTS for identifying the off-target activity of different gRNAs, it was demonstrated that some gRNAs are more specific than others; however, the translocation loci were not analyzed by deep sequencing to determine the activity at identified off-target sites.86 This method is limited by the requirement for DSBs at the on- and off-target sites to occur within the same cell simultaneously. Both breaks must also escape local NHEJ repair, which may affect the sensitivity of the assay. The GUIDE-seq method uses a short double stranded oligonucleotide (dsODN) instead of a lentiviral construct to tag DSBs.55 This study found a large number of previously unknown off-target sites for 3 gRNAs and identified off-target sites for 10 additional gRNAs. The GUIDE-seq method is a powerful tool to identify true genome-wide RGEN off-target sites without the restrictions of in silico prediction algorithms. This method makes the assumption that all the sites with RGEN-induced DSBs should take up the blunt ended dsODNs by an NHEJ-dependent pathway. Although this scenario is possible, repair by NHEJ without dsODN insertion is more likely, and sequence homology may influence the integration of dsODNs into certain DSBs. It would be interesting to see if genome-wide GUIDE-seq profiles are consistent using dsODNs of varying sequence. Further, the ability to integrate dsODNs into DSBs by NHEJ may be dependent on the cell type and the nature of the DSB, for example 5' overhangs induced by FokI cleavage and 5' or 3' overhangs induced by Cas9 nickase pairs. The initial study used two cell lines and it remains to be seen if this method can be successfully applied to other cell lines and adapted for use in clinically relevant cell types such as hematopoietic stem cells (HSCs). The only method to directly detect DNA DSBs is BLESS.


The drawback of directly detecting DSBs is that alleles that have undergone NHEJ repair cannot be detected, which makes the assay time sensitive. However, this time sensitivity could allow genome-wide mapping of the chronological order of the activity of engineered nucleases at on- and off-target sites. The BLESS method also outperformed both ChIP-seq and in silico prediction when directly

compared with results using two gRNAs with two different Cas9 orthologs (4 scenarios).87

A comprehensive comparison of the methods for genome-wide RGEN off-target detection is difficult since there is little overlap in the gRNAs used in these studies. However, the small amount of data that permits direct comparisons shows that GUIDE-seq identifies more off-target sites than any other method, although differences in cell types used in different studies should be taken into account. The establishment of a unified database of all true off-target sites of RGENs would facilitate the design of improved algorithms for in silico prediction of potential off-target sites, which would provide a quick, cost effective means to pre-screen candidate gRNAs and greatly enhance the analysis of RGEN genotoxicity.
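In its simplest form, the in silico pre-screening mentioned above scans a sequence for sites that resemble the gRNA target and are followed by a PAM. The Python sketch below illustrates the idea under strong simplifying assumptions: an NGG PAM (as for Streptococcus pyogenes Cas9), the forward strand only, and mismatches but no insertions or deletions. It is not the algorithm of any tool cited in this review, and all names and sequences are hypothetical.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(sequence: str, guide: str, max_mismatches: int = 3):
    """Return (start, site, mismatches) for forward-strand sites followed by NGG."""
    sequence = sequence.upper()
    guide = guide.upper()
    n = len(guide)
    hits = []
    # Stop early enough that a 3-nt PAM still fits after the protospacer.
    for start in range(len(sequence) - n - 2):
        site = sequence[start:start + n]
        pam = sequence[start + n:start + n + 3]
        if pam[1:] != "GG":  # assumed NGG PAM
            continue
        mismatches = count_mismatches(site, guide)
        if mismatches <= max_mismatches:
            hits.append((start, site, mismatches))
    return hits


# Toy sequence: one perfect site and one site with two mismatches (hypothetical data).
guide = "GACGTTAGCCATGGATCCAA"
near_match = "GACGTTAGCCTTGGATCGAA"
toy_sequence = "TT" + guide + "TGG" + "ACAC" + near_match + "AGG" + "GT"

for start, site, mismatches in find_candidate_sites(toy_sequence, guide):
    print(start, site, mismatches)
```

A practical search would also scan the reverse strand, allow alternative PAMs and bulges, and rank sites by mismatch position, which is where experimentally validated off-target data would inform the scoring.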

METHODS FOR MINIMIZING OFF-TARGET EFFECTS

Several approaches have been developed to reduce off-target activity of engineered nucleases (Figure 5). Early attempts to block off-target activity of ZFNs used mutagenesis of the FokI domain to create heterodimeric versions to reduce homodimerization of ZFNs.44, 88-90 These modifications are also applicable to other engineered nucleases, such as TALENs and RGENs. However, these heterodimeric modifications can also reduce on-target activity of nucleases, presumably by reducing the binding energy of FokI dimerization. FokI mutagenesis has also been used to generate FokI nickases.91-93 For example, ZFNickases induce HDR at a lower rate than ZFNs, but have a higher HDR to NHEJ ratio. FokI nickases have also been successfully used with TAL effectors.94, 95 The Cas9 endonuclease generates DSBs by cleaving DNA strands via conserved RuvC and HNH nuclease domains. The RuvC domain cleaves the non-target DNA strand and the HNH domain cleaves the target DNA strand. Inactivation of one domain results in a partially inactivated Cas9 that can generate DNA single strand breaks.96, 97 It has been demonstrated that the Cas9 nickases may have reduced off-target activity while retaining on-target activity. They can also be used to generate a staggered DSB at the on-target sites.86, 98 However, if two adjacent 20-base off-target sites with appropriate spacing have sufficient sequence homology to the intended on-target sequence, the Cas9 nickases can bind and become active, resulting in off-target cleavage.56 This suggests that the Cas9 nickase system reduces off-target activity largely by increasing the overall target length from 20 to 40 bases. Further, Cas9 nickases may not be fully inactivated and can still induce DSBs even with a single gRNA.99 The specificity of the CRISPR/Cas9 system could be further increased if both of the Cas9 nuclease domains in a Cas9 nickase pair are mutated to create catalytically inactive or dead Cas9s (dCas9s), which are then fused to FokI nuclease domains, forming a dCas9-FokI pair. In this case, the targeting of DNA sequence is achieved by two gRNAs and dCas9s, and the DNA cleavage is generated by the dimerized FokI domains. Although off-target activity is reduced to a greater degree compared to Cas9 nickases, lower on-target activity is also observed.99-101 dCas9-FokI pairs also have a stricter spacer length requirement due to the need for FokI dimerization, which limits the number of potential targets in a genome.
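A back-of-the-envelope calculation illustrates why extending the recognized sequence from 20 to 40 bases matters. The Python sketch below estimates the expected number of perfect matches to an n-base sequence on both strands of a genome, assuming a random sequence with equal base frequencies. The genome size of 3.1e9 bases is an assumed round figure, and the model ignores mismatch tolerance, PAM requirements and spacing constraints, so it only indicates orders of magnitude.

```python
def expected_random_matches(target_length: int, genome_size: float = 3.1e9) -> float:
    """Expected perfect matches on both strands of a uniform random genome."""
    if target_length < 1:
        raise ValueError("target_length must be at least 1")
    return 2 * genome_size / 4 ** target_length


for length in (20, 40):
    print(f"{length} bases: {expected_random_matches(length):.2e} expected chance matches")
```

Under these assumptions, doubling the target length lowers the expected number of chance matches by a factor of 4^20, although real off-target sites tolerate mismatches and are therefore far more frequent than this idealized estimate suggests.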

RGEN off-target effects can also be mitigated by modifying the gRNA, although there is conflicting evidence as to how best to achieve reduced mutagenic potential. Both gRNA truncation and gRNA elongation59 have been shown to reduce the off-target activity of certain gRNAs and result in better on- to off-target ratios. More widespread use of these strategies could reveal if they are broadly applicable to all gRNAs, or to which gRNAs they are best suited. Cas9 orthologs with different PAM requirements have been adopted recently for genome editing in mammalian cells.87, 103-105 Two Cas9 orthologs with longer PAM sequences, Staphylococcus aureus Cas987 and Neisseria meningitidis Cas9 (Lee et al. submitted), have reduced off-target activity. Orthologs with longer PAM sequences are expected to have fewer potential off-target sites genome-wide, although the probability of finding a PAM sequence in a gene of interest is also reduced. These orthogonal systems could also be altered to form nickases and dCas9-FokI fusions to further increase the specificity of RGENs.
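The trade-off between PAM length and target availability can be estimated with a simple probability calculation. The Python sketch below assumes the commonly reported PAMs NGG (S. pyogenes Cas9), NNGRRT (S. aureus Cas9) and NNNNGATT (N. meningitidis Cas9), which are not stated in this section, and assumes equal base frequencies. It gives the chance that a given position on one strand carries each PAM.

```python
# IUPAC nucleotide codes used by the PAMs below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def pam_probability(pam: str) -> float:
    """Probability that a random position matches the PAM, assuming uniform bases."""
    probability = 1.0
    for symbol in pam.upper():
        probability *= len(IUPAC[symbol]) / 4
    return probability


# Assumed PAM sequences for three Cas9 orthologs.
pams = {"SpCas9": "NGG", "SaCas9": "NNGRRT", "NmCas9": "NNNNGATT"}

for name, pam in pams.items():
    p = pam_probability(pam)
    print(f"{name} ({pam}): about one PAM per {1 / p:.0f} positions on each strand")
```

Under these assumptions a longer PAM occurs less often by chance, which reduces both the number of potential off-target sites and the number of targetable positions in a gene of interest.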

CHALLENGES AND PATH FORWARD

Over the last few years a new field of precision genome editing has emerged, thanks to the recent advent of engineered nucleases, especially TALENs and CRISPR/Cas9 systems. Although precision genome editing has the potential to revolutionize biology and medicine, and holds great promise for many applications, including disease modeling, molecular pathway dissection, synthetic biology and therapeutics, many challenges remain. For example, engineered nucleases often generate off-target cleavage, causing mutations, insertions, deletions, inversions, or translocations in the genetic sequence, which may result in aberrant gene expression, cell death, or oncogenesis. Therefore, it is often necessary to maximize the cleavage efficiency of engineered nucleases and minimize genomic risk by reducing or eliminating off-target effects; both are closely related to target site selection. Further, in repairing nuclease-induced DSBs, cells typically favor error-prone pathways such as NHEJ and microhomology-mediated end joining. For therapeutic applications of genome editing where HDR is required, significantly increasing the HDR rate in both dividing and non-dividing cells is a major challenge. Another important challenge in further advancing genome editing is efficient delivery of engineered nucleases, activators, repressors and donor molecules into clinically relevant cell types in vitro and in vivo, and developing methods for in vivo tissue-specific delivery.

Although many design tools have been developed for engineered nucleases (Table 1), better tools for target selection are still needed. Since each target locus in a genome requires a pair of TALENs to be constructed and tested, it becomes quite laborious to screen for highly active TALEN pairs. Further, despite the ease of designing and testing CRISPR/Cas9 systems, there is large variability in their cleavage activity. Although attempts have been made to determine if rational design of highly active gRNAs is possible,41-43 when the outputs of these tools are compared, there is only a modest or no correlation between them, indicating that the broad applicability of the scoring algorithms depends on the experimental results employed in constructing the scoring functions. It remains to be seen if these tools are fully predictive or if overtraining on the data or selection bias may have skewed the parameters.
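One way to quantify agreement between two gRNA scoring tools is a rank correlation of the scores they assign to the same set of gRNAs. The Python sketch below computes Spearman's rank correlation using only the standard library. The scores are hypothetical placeholders and are not the output of any tool cited here.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical activity scores for the same six gRNAs from two scoring tools.
tool_a = [0.81, 0.45, 0.62, 0.30, 0.77, 0.55]
tool_b = [0.40, 0.52, 0.71, 0.35, 0.48, 0.66]
print(f"Spearman correlation between tools: {spearman(tool_a, tool_b):.2f}")
```

A coefficient near 1 would indicate that the tools rank gRNAs similarly, whereas values near 0 correspond to the modest or absent correlation described above.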

Off-target activity of engineered nucleases remains a major concern, especially in therapeutic applications. Off-target DSBs may induce indels that activate oncogenes, and chromosomal rearrangements resulting from on- and off-target DSBs may lead to a cancerous phenotype in nuclease-treated cells. Although great advances have been made in recent years in developing methods for identifying off-target sites, none of the in silico off-target search tools can accurately predict all possible off-target sites, and a better understanding of nuclease-DNA interaction dynamics and target accessibility is required in order to significantly improve these tools. Also, despite the ability of NGS platforms to identify off-target sites with activity as low as 0.1%, there may be other off-target sites below this limit that go undetected. Another major concern is the variability in sequencing data analysis pipelines implemented by different labs when analyzing NGS data, which makes comparisons between data sets very difficult. It is certainly desirable to have a small number (e.g., 1-3) of 'standardized' pipelines that are available to, and accepted by, laboratories working in genome editing. Further, the long-term effects of off-target activity are largely unknown. It is estimated that, on average, each cell has a steady state of 50,000 endogenous DNA lesions,106 while whole genome sequencing of 12 individuals revealed over 500,000 indels in each individual, with 230-390 occurring in exonic regions.107 Other studies estimate that radiotherapy induces around 20-40 DSBs/cell/Gy and up to 1000 single strand breaks/cell/Gy.108 Although it is likely that the number of DSBs induced by engineered nucleases is relatively small compared with the endogenous levels of DSB formation and the accumulation of exonic indels in the cell population, significant efforts need to be made to analyze genome-wide off-target effects, develop a database for off-target activities in different cell types, establish consensus guidelines for selecting optimal target sites, and define benchmark assays, best practices and unified standards for determining genotoxicity due to engineered nucleases.
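The detection limit mentioned above depends in part on sequencing depth. As a rough illustration, the Python sketch below uses a simple binomial sampling model to estimate the probability of observing at least a minimum number of indel-containing reads at a site with a given true indel frequency. The model ignores sequencing errors, PCR bias and alignment artifacts, and the read depths and the threshold of 5 reads are hypothetical choices.

```python
from math import comb


def prob_at_least(min_reads: int, depth: int, indel_freq: float) -> float:
    """P(X >= min_reads) for X ~ Binomial(depth, indel_freq)."""
    if not 0.0 <= indel_freq <= 1.0:
        raise ValueError("indel_freq must be between 0 and 1")
    below = sum(
        comb(depth, i) * indel_freq ** i * (1 - indel_freq) ** (depth - i)
        for i in range(min_reads)
    )
    return 1.0 - below


# Hypothetical scenario: a site with a true indel frequency of 0.1%.
for depth in (1_000, 10_000, 100_000):
    p = prob_at_least(min_reads=5, depth=depth, indel_freq=0.001)
    print(f"depth {depth}: P(at least 5 indel reads) = {p:.3f}")
```

In practice the background error rate of the sequencing platform sets a floor that deeper sequencing alone cannot overcome, which is one reason sites with very low activity may go undetected.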

REFERENCES

1. Segal, D.J. & Meckler, J.F. Genome engineering at the dawn of the golden age. Annual review of genomics and human genetics 14, 135-158 (2013).

2. Cai, M. & Yang, Y. Targeted genome editing tools for disease modeling and gene therapy. Current gene therapy 14, 2-9 (2014).

3. Cox, D.B., Platt, R.J. & Zhang, F. Therapeutic genome editing: prospects and challenges. Nature medicine 21, 121-131 (2015).

4. Wilber, A. et al. A zinc-finger transcriptional activator designed to interact with the gamma-globin gene promoters enhances fetal hemoglobin production in primary human adult erythroblasts. Blood 115, 3033-3041 (2010).

5. Cong, L., Zhou, R., Kuo, Y.C., Cunniff, M. & Zhang, F. Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains. Nature communications 3, 968 (2012).

6. Ma, H. et al. Multicolor CRISPR labeling of chromosomal loci in human cells. Proceedings of the National Academy of Sciences of the United States of America 112, 3002-3007 (2015).

7. Hilton, I.B. et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nature biotechnology 33, 510-517 (2015).

8. Rouet, P., Smih, F. & Jasin, M. Expression of a site-specific endonuclease stimulates homologous recombination in mammalian cells. Proceedings of the National Academy of Sciences of the United States of America 91, 6064-6068 (1994).

9. Bibikova, M. et al. Stimulation of homologous recombination through targeted cleavage by chimeric nucleases. Molecular and cellular biology 21, 289-297 (2001).

10. Cermak, T. et al. Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res 39, e82 (2011).

11. Miller, J.C. et al. A TALE nuclease architecture for efficient genome editing. Nature biotechnology 29, 143-148 (2011).

12. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821 (2012).

13. Ran, F.A. et al. Genome engineering using the CRISPR-Cas9 system. Nature protocols 8, 2281-2308 (2013).

14. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823 (2013).

15. Kim, Y.G., Cha, J. & Chandrasegaran, S. Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proceedings of the National Academy of Sciences of the United States of America 93, 1156-1160 (1996).

16. Smith, J. et al. Requirements for double-strand cleavage by chimeric restriction enzymes with zinc finger DNA-recognition domains. Nucleic acids research 28, 3361-3369 (2000).

17. Boch, J. et al. Breaking the code of DNA binding specificity of TAL-type III effectors. Science 326, 1509-1512 (2009).

18. Bogdanove, A.J. & Voytas, D.F. TAL effectors: customizable proteins for DNA targeting. Science 333, 1843-1846 (2011).

19. Pavletich, N.P. & Pabo, C.O. Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 Å. Science 252, 809-817 (1991).

20. Pabo, C.O., Peisach, E. & Grant, R.A. Design and selection of novel Cys2His2 zinc finger proteins. Annual review of biochemistry 70, 313-340 (2001).

21. Rebar, E.J. & Pabo, C.O. Zinc finger phage: affinity selection of fingers with new DNA-binding specificities. Science 263, 671-673 (1994).

22. Jamieson, A.C., Wang, H. & Kim, S.H. A zinc finger directory for high-affinity DNA recognition. Proceedings of the National Academy of Sciences of the United States of America 93, 12834-12839 (1996).

23. Segal, D.J., Dreier, B., Beerli, R.R. & Barbas, C.F., 3rd Toward controlling gene expression at will: selection and design of zinc finger domains recognizing each of the 5'-GNN-3' DNA target sequences. Proceedings of the National Academy of Sciences of the United States of America 96, 2758-2763 (1999).

24. Choo, Y. & Klug, A. Toward a code for the interactions of zinc fingers with DNA: selection of randomized fingers displayed on phage. Proceedings of the National Academy of Sciences of the United States of America 91, 11163-11167 (1994).

25. Porteus, M.H. & Baltimore, D. Chimeric nucleases stimulate gene targeting in human cells. Science 300, 763 (2003).

26. Urnov, F.D. et al. Highly efficient endogenous human gene correction using designed zinc-finger nucleases. Nature 435, 646-651 (2005).

27. Isalan, M., Choo, Y. & Klug, A. Synergy between adjacent zinc fingers in sequence-specific DNA recognition. Proceedings of the National Academy of Sciences of the United States of America 94, 5617-5621 (1997).

28. Ramirez, C.L. et al. Unexpected failure rates for modular assembly of engineered zinc fingers. Nature methods 5, 374-375 (2008).

29. Sander, J.D. et al. Selection-free zinc-finger-nuclease engineering by context-dependent assembly (CoDA). Nature methods 8, 67-69 (2011).

30. Mandell, J.G. & Barbas, C.F., 3rd Zinc Finger Tools: custom DNA-binding domains for transcription factors and nucleases. Nucleic acids research 34, W516-523 (2006).

31. Sander, J.D., Zaback, P., Joung, J.K., Voytas, D.F. & Dobbs, D. Zinc Finger Targeter (ZiFiT): an engineered zinc finger/target site design tool. Nucleic acids research 35, W599-605 (2007).

32. Maeder, M.L. et al. Rapid "open-source" engineering of customized zinc-finger nucleases for highly efficient gene modification. Molecular cell 31, 294-301 (2008).

33. Doyle, E.L. et al. TAL Effector-Nucleotide Targeter (TALE-NT) 2.0: tools for TAL effector design and target prediction. Nucleic acids research 40, W117-122 (2012).

34. Lin, Y., Cradick, T.J. & Bao, G. Designing and testing the activities of TAL effector nucleases. Methods in molecular biology 1114, 203-219 (2014).

35. Montague, T.G., Cruz, J.M., Gagnon, J.A., Church, G.M. & Valen, E. CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing. Nucleic acids research 42, W401-407 (2014).

36. Neff, K.L. et al. Mojo Hand, a TALEN design tool for genome editing applications. BMC bioinformatics 14, 1 (2013).

37. Reyon, D. et al. FLASH assembly of TALENs for high-throughput genome editing. Nature biotechnology 30, 460-465 (2012).

38. Heigwer, F. et al. E-TALEN: a web tool to design TALENs for genome engineering. Nucleic acids research 41, e190 (2013).

39. Lin, Y. et al. SAPTA: a new design tool for improving TALE nuclease activity. Nucleic acids research 42, e47 (2014).

40. Christian, M.L. et al. Targeting G with TAL Effectors: A Comparison of Activities of TALENs Constructed with NN and NK Repeat Variable Di-Residues. PloS one 7, e45383 (2012).

41. Doench, J.G. et al. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat Biotechnol 32, 1262-1267 (2014).

42. Chari, R., Mali, P., Moosburner, M. & Church, G.M. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nat Meth Published online 13 July 2015 (2015).

43. Moreno-Mateos, M.A. et al. CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nature methods (2015).

44. Szczepek, M. et al. Structure-based redesign of the dimerization interface reduces the toxicity of zinc-finger nucleases. Nature biotechnology 25, 786-793 (2007).

45. Mussolino, C. et al. A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity. Nucleic acids research 39, 9283-9293 (2011).

46. Fine, E.J., Cradick, T.J., Zhao, C.L., Lin, Y. & Bao, G. An online bioinformatics tool predicts zinc finger and TALE nuclease off-target cleavage. Nucleic acids research 42, e42 (2014).

47. Pattanayak, V., Ramirez, C.L., Joung, J.K. & Liu, D.R. Revealing off-target cleavage specificities of zinc-finger nucleases by in vitro selection. Nature methods 8, 765-770 (2011).

48. Cradick, T.J., Qiu, P., Lee, C.M., Fine, E.J. & Bao, G. COSMID: A Web-based Tool for Identifying and Validating CRISPR/Cas Off-target Sites. Molecular Therapy-Nucleic Acids, e214 (2014).

49. Perez, E.E. et al. Establishment of HIV-1 resistance in CD4+ T cells by genome editing using zinc-finger nucleases. Nature biotechnology 26, 808-816 (2008).

50. Gabriel, R. et al. An unbiased genome-wide analysis of zinc-finger nuclease specificity. Nature biotechnology 29, 816-823 (2011).

51. Sander, J.D. et al. In silico abstraction of zinc finger nuclease cleavage profiles reveals an expanded landscape of off-target sites. Nucleic acids research 41, e181 (2013).

52. Hsu, P.D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nature biotechnology 31, 827-832 (2013).

53. Cradick, T.J., Fine, E.J., Antico, C.J. & Bao, G. CRISPR/Cas9 systems targeting beta-globin and CCR5 genes have substantial off-target activity. Nucleic Acids Res 41, 9584-9592 (2013).

54. Fu, Y. et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat Biotechnol 31, 822-826 (2013).

55. Tsai, S.Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nature biotechnology 33, 187-197 (2015).

56. Lin, Y. et al. CRISPR/Cas9 systems have off-target activity with insertions or deletions between target DNA and guide RNA sequences. Nucleic Acids Res 42, 7473-7485 (2014).

57. Bae, S., Park, J. & Kim, J.S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473-1475 (2014).

58. Singh, R., Kuscu, C., Quinlan, A., Qi, Y. & Adli, M. Cas9-chromatin binding information enables more accurate CRISPR off-target prediction. Nucleic acids research (2015).

59. Kim, D. et al. Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells. Nature methods 12, 237-243 (2015).

60. Wang, X. et al. Unbiased detection of off-target cleavage by CRISPR-Cas9 and TALENs using integrase-defective lentiviral vectors. Nature biotechnology 33, 175-178 (2015).

61. Kuscu, C., Arslan, S., Singh, R., Thorpe, J. & Adli, M. Genome-wide analysis reveals characteristics of off-target sites bound by the Cas9 endonuclease. Nature biotechnology 32, 677-683 (2014).

62. Qiu, P. et al. Mutation detection using Surveyor nuclease. BioTechniques 36, 702-707 (2004).

63. Guschin, D.Y. et al. A rapid and general assay for monitoring endogenous gene modification. Methods in molecular biology 649, 247-256 (2010).

64. Brinkman, E.K., Chen, T., Amendola, M. & van Steensel, B. Easy quantitative assessment of genome editing by sequence trace decomposition. Nucleic acids research 42, e168 (2014).

65. Hendel, A. et al. Quantifying genome-editing outcomes at endogenous loci with SMRT sequencing. Cell reports 7, 293-305 (2014).

66. Hill, J.T. et al. Poly peak parser: Method and software for identification of unknown indels using sanger sequencing of polymerase chain reaction products. Dev Dyn 243, 1632-1636 (2014).

67. Kim, H.J., Lee, H.J., Kim, H., Cho, S.W. & Kim, J.S. Targeted genome editing in human cells with zinc finger nucleases constructed via modular assembly. Genome research 19, 1279-1288 (2009).

68. Huang, P. et al. Heritable gene targeting in zebrafish using customized TALENs. Nature biotechnology 29, 699-700 (2011).

69. Kim, Y., Kweon, J. & Kim, J.S. TALENs and ZFNs are associated with different mutation signatures. Nature methods 10, 185 (2013).

70. Chen, S. et al. A large-scale in vivo analysis reveals that TALENs are significantly more mutagenic than ZFNs generated using context-dependent assembly. Nucleic acids research 41, 2769-2778 (2013).

71. Kim, H. et al. Surrogate reporters for enrichment of cells with nuclease-induced mutations. Nature methods 8, 941-943 (2011).

72. Dahlem, T.J. et al. Simple methods for generating and detecting locus-specific mutations induced with TALENs in the zebrafish genome. PLoS genetics 8, e1002861 (2012).

73. Kim, J.M., Kim, D., Kim, S. & Kim, J.S. Genotyping with CRISPR-Cas-derived RNA-guided endonucleases. Nature communications 5, 3157 (2014).

74. Crosetto, N. et al. Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing. Nature methods 10, 361-365 (2013).

75. Guell, M., Yang, L. & Church, G.M. Genome editing assessment using CRISPR Genome Analyzer (CRISPR-GA). Bioinformatics 30, 2968-2970 (2014).

76. Mussolino, C. et al. TALENs facilitate targeted genome editing in human cells with high specificity and low cytotoxicity. Nucleic acids research 42, 6762-6773 (2014).

77. Lee, H.J., Kim, E. & Kim, J.S. Targeted chromosomal deletions in human cells using zinc finger nucleases. Genome research 20, 81-89 (2010).

78. Lee, H.J., Kweon, J., Kim, E., Kim, S. & Kim, J.S. Targeted chromosomal duplications and inversions in the human genome using zinc finger nucleases. Genome research 22, 539-548 (2012).

79. Brunet, E. et al. Chromosomal translocations induced at specified loci in human stem cells. Proceedings of the National Academy of Sciences of the United States of America 106, 10620-10625 (2009).

80. Maddalo, D. et al. In vivo engineering of oncogenic chromosomal rearrangements with the CRISPR/Cas9 system. Nature 516, 423-427 (2014).

81. Tesson, L. et al. Knockout rats generated by embryo microinjection of TALENs. Nature biotechnology 29, 695-696 (2011).

82. Hockemeyer, D. et al. Genetic engineering of human pluripotent cells using TALE nucleases. Nature biotechnology 29, 731-734 (2011).

83. Gupta, A., Meng, X., Zhu, L.J., Lawson, N.D. & Wolfe, S.A. Zinc finger protein-dependent and -independent contributions to the in vivo off-target activity of zinc finger nucleases. Nucleic acids research 39, 381-392 (2011).

84. Wu, X. et al. Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Nat Biotechnol 32, 670-676 (2014).

85. O'Geen, H., Henry, I.M., Bhakta, M.S., Meckler, J.F. & Segal, D.J. A genome-wide analysis of Cas9 binding specificity using ChIP-seq and targeted sequence capture. Nucleic acids research 43, 3389-3404 (2015).

86. Frock, R.L. et al. Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nature biotechnology 33, 179-186 (2015).

87. Ran, F.A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186-191 (2015).

88. Miller, J.C. et al. An improved zinc-finger nuclease architecture for highly specific genome editing. Nature biotechnology 25, 778-785 (2007).

89. Doyon, Y. et al. Enhancing zinc-finger-nuclease activity with improved obligate heterodimeric architectures. Nature methods 8, 74-79 (2011).

90. Lee, C.M., Flynn, R., Hollywood, J.A., Scallan, M.F. & Harrison, P.T. Correction of the ΔF508 Mutation in the Cystic Fibrosis Transmembrane Conductance Regulator Gene by Zinc-Finger Nuclease Homology-Directed Repair. BioResearch Open Access 1, 99-108 (2012).

91. Ramirez, C.L. et al. Engineered zinc finger nickases induce homology-directed repair with reduced mutagenic effects. Nucleic acids research 40, 5560-5568 (2012).

92. Liu, X. et al. Zinc-finger nickase-mediated insertion of the lysostaphin gene into the beta-casein locus in cloned cows. Nature communications 4, 2565 (2013).

93. Wang, J. et al. Targeted gene addition to a predetermined site in the human genome using a ZFN-based nicking enzyme. Genome research 22, 1316-1326 (2012).

94. Wu, Y. et al. TALE nickase mediates high efficient targeted transgene integration at the human multi-copy ribosomal DNA locus. Biochemical and biophysical research communications 446, 261-266 (2014).

95. Wu, H. et al. TALE nickase-mediated SP110 knockin endows cattle with increased resistance to tuberculosis. Proceedings of the National Academy of Sciences of the United States of America 112, E1530-1539 (2015).

96. Mali, P. et al. CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nature biotechnology 31, 833-838 (2013).

97. Ran, F.A. et al. Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell 154, 1380-1389 (2013).

98. Duda, K. et al. High-efficiency genome editing via 2A-coupled co-expression of fluorescent proteins and zinc finger nucleases or CRISPR/Cas9 nickase pairs. Nucleic Acids Res 42, e84 (2014).

99. Tsai, S.Q. et al. Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat Biotechnol 32, 569-576 (2014).

100. Guilinger, J.P., Thompson, D.B. & Liu, D.R. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nature biotechnology 32, 577-582 (2014).

101. Aouida, M. et al. Efficient fdCas9 Synthetic Endonuclease with Improved Specificity for Precise Genome Engineering. PloS one 10, e0133373 (2015).

102. Arribere, J.A. et al. Efficient marker-free recovery of custom genetic modifications with CRISPR/Cas9 in Caenorhabditis elegans. Genetics 198, 837-846 (2014).

103. Fonfara, I. et al. Phylogeny of Cas9 determines functional exchangeability of dual-RNA and Cas9 among orthologous type II CRISPR-Cas systems. Nucleic acids research 42, 2577-2590 (2014).

104. Esvelt, K.M. et al. Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nature methods 10, 1116-1121 (2013).

105. Hou, Z. et al. Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis. Proceedings of the National Academy of Sciences of the United States of America 110, 15644-15649 (2013).

106. Swenberg, J.A. et al. Endogenous versus exogenous DNA adducts: their role in carcinogenesis, epidemiology, and risk assessment. Toxicological sciences : an official journal of the Society of Toxicology 120 Suppl 1, S130-145 (2011).

107. Dewey, F.E. et al. Clinical interpretation and implications of whole-genome sequencing. Jama 311, 1035-1045 (2014).

108. Lomax, M.E., Folkes, L.K. & O'Neill, P. Biological consequences of radiation-induced DNA damage: relevance to radiotherapy. Clinical oncology 25, 578-585 (2013).

109. Stemmer, M., Thumberger, T., Del Sol Keyer, M., Wittbrodt, J. & Mateo, J.L. CCTop: An Intuitive, Flexible and Reliable CRISPR/Cas9 Target Prediction Tool. PloS one 10, e0124633 (2015).

110. Prykhozhij, S.V., Rajan, V., Gaston, D. & Berman, J.N. CRISPR multitargeter: a web tool to find common and unique CRISPR single guide RNA targets in a set of similar sequences. PloS one 10, e0119372 (2015).

111. Naito, Y., Hino, K., Bono, H. & Ui-Tei, K. CRISPRdirect: software for designing CRISPR/Cas guide RNA with reduced off-target sites. Bioinformatics 31, 1120-1123 (2015).

112. Grissa, I., Vergnaud, G. & Pourcel, C. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic acids research 35, W52-57 (2007).

113. Zhu, L.J., Holmes, B.R., Aronin, N. & Brodsky, M.H. CRISPRseek: a bioconductor package to identify target-specific guide RNAs for CRISPR-Cas9 genome-editing systems. PloS one 9, e108424 (2014).

114. Heigwer, F., Kerr, G. & Boutros, M. E-CRISP: fast CRISPR target site identification. Nature methods 11, 122-123 (2014).

115. Gratz, S.J. et al. Highly specific and efficient CRISPR/Cas9-catalyzed homology-directed repair in Drosophila. Genetics 196, 961-971 (2014).

116. O'Brien, A. & Bailey, T.L. GT-Scan: Identifying unique genomic targets. Bioinformatics 30, 2673-2675 (2014).

117. Bae, S., Kweon, J., Kim, H.S. & Kim, J.S. Microhomology-based choice of Cas9 nuclease target sites. Nature methods 11, 705-706 (2014).

118. Xie, S., Shen, B., Zhang, C., Huang, X. & Zhang, Y. sgRNAcas9: a software package for designing CRISPR sgRNA and evaluating potential off-target cleavage sites. PloS one 9, e100448 (2014).

119. Hodgkins, A. et al. WGE: a CRISPR database for genome engineering. Bioinformatics Published online 2015 May 14 (2015).

120. Grau, J., Boch, J. & Posch, S. TALENoffer: genome-wide TALEN off-target prediction. Bioinformatics 29, 2931-2932 (2013).

121. Xiao, A. et al. EENdb: a database and knowledge base of ZFNs and TALENs for endonuclease


FIGURE CAPTIONS

Figure 1. Classes of designer nucleases and gene editing outcomes. Targeted DSBs can be induced using ZFNs, TALENs, or CRISPR/Cas9. DNA breaks are repaired via endogenous cellular repair pathways, such as NHEJ and HDR. The NHEJ pathway results in short deletions or insertions at the target site that can result in a targeted gene knock-out. The HDR pathway is a high-fidelity repair pathway that uses a donor DNA sequence as a template to correct the DNA break. This pathway can be exploited to repair mutations or modify DNA at the resolution of a single nucleotide.

Figure 2. Comparison of off-target analysis by different methods. (a) The 38 heterodimeric bona fide off-target sites for CCR5 ZFNs found by four different experiment-based prediction methods and the refined 'ZFN v2.0' PROGNOS algorithm. The PROGNOS sites are drawn from the top rankings spanning 3X the number of predictions by the Bayesian abstraction of the in vitro cleavage profile. (**) Note that only six of the sites found using ChIP-Seq were described, so the full degree of overlap of all ChIP-Seq sites with sites found by other methods remains unknown. (Adapted from ) (b) A comparison of the off-target predictions by the MIT CRISPR Design Tool (solely bioinformatics-based) to the bona fide off-target sites found for 9 different RGENs by the GUIDE-Seq method (experimental-based). (c) A comparison analogous to (b) but using the E-CRISP bioinformatics-based prediction tool. GUIDE-Seq figures adapted from ref. 55.

Figure 3. Gross chromosomal rearrangements as a consequence of genome editing. Multiplex gene targeting can result in targeted large deletions, inversions, or translocations. However, these gross chromosomal rearrangements can also occur between nuclease on- and off-target sites. Cut sites represented by red arrows.

Figure 4. Outline of various methods for off-target site identification and validation. In silico prediction tools identify potential off-target (OT) sites that can be analyzed by next generation sequencing (NGS). There are various experimental methods designed to identify OT sites in an unbiased manner. After OT site identification, a second round of NGS at these sites is required to verify if they are bona fide OT sites.

Figure 5. Strategies to reduce off-target events. A. Modification of the FokI domain to prevent homodimerization of ZFN or TALEN monomers. B. Modification of the Cas9 nuclease to generate a nicking version of Cas9 (Cas9N). Cas9N can generate single-stranded DNA breaks. C. Inactivation of the Cas9 endonuclease to create a dead Cas9. Fusion of the FokI domain creates a dCas9-FokI enzyme that requires a pair of dCas9-FokI to achieve dimerization of the FokI domain for DNA cleavage. D. Cas9 orthologs with longer PAM sequences can result in fewer potential off-target sites in the genome.

TABLES

Table 1. Nuclease design tools

Design Tool | Nuclease Design | Off-Target Search | Max # of Mismatches | Activity Analysis | Functionality

CRISPR/Cas9 Tools

Benchling (https://benchling.com) | V | V | 4 | X | Web tools for molecular biology and CRISPR applications, including gRNA design for a gene or genome coordinates. Automatically annotates with the exon and CDS information.
Cas-OFFinder (http://www.rgenome.net/cas-offinder/) | X | V | 9 | X | Web site or downloadable program that searches for potential off-target sites for input gRNA and PAMs in the requested genome or user-defined sequence5. Search includes DNA or RNA bulges in addition to base mismatches.
CCTop (http://crispr.cos.uni-heidelberg.de) | V | V | 5 | X | CRISPR/Cas9 Target online predictor evaluates target sites within an input sequence, giving a ranking of the candidates based on the possible off-target sites it identifies in a number of genomes109.
CHOPCHOP (https://chopchop.rc.fas.harvard.edu) | V | V | ≤2 | X | A web tool for selecting target sites for CRISPR/Cas9- or TALEN-directed targeting from within an input sequence or chromosomal coordinates in a number of genomes. Locates off-target sites with up to two mismatches35.
COSMID (https://crispr.bme.gatech.edu) | | | | X | CRISPR Off-target Sites with Mismatches, Insertions, and Deletions. An exhaustive search tool identifying sites matching user-supplied criteria that can include DNA or RNA bulges in addition to base mismatches. Also designs primers for downstream applications, such as NGS48.
CRISPR design (http://crispr.mit.edu) | | | | X | Designs gRNA by evaluating possible off-target sites in a number of genomes and outputs with scoring, highlighting gRNA that may have higher specificity52.
CRISPR MultiTargeter (http://www.multicrispr.net) | V | | | X | Users can input sequences to search for unique CRISPR target sites or highly similar or identical target sites in multiple genes or transcripts or within a single gene110.
CRISPR Scan (http://www.crisprscan.org) | | | | Prediction | Finds and predicts the cutting efficiency of gRNAs in a given sequence or protein coding gene. Also has UCSC tracks for high and low scoring gRNAs43.
CRISPR-Plant (http://www.genome.arizona.edu/crispr/CRISPRsearch.html) | | | | X | A CRISPR search tool targeting a range of plant species, identifying spacer sequences located in a selected gene region, gene or larger chromosomal region.
CRISPRdirect (http://crispr.dbcls.jp) | | | | X | Selects target sites from accession numbers, genome locations or input sequences. Outputs the expected number of genomic sites with 8 or 12 bp matching seed sequences111.
CRISPRfinder (http://crispr.u-psud.fr/Server/) | | | | X | Locates CRISPR (repeat) structures in published microbial genomes or in small to very large user-submitted sequences; can also extract the spacer sequences.
CRISPRseek (http://www.bioconductor.org/packages/release/bioc/html/CRISPRseek.html) | | | | | Open source software to identify target-specific gRNAs, minimizing off-target cleavage at other sites within any selected genome. Outputs a list of all possible sgRNAs for a given PAM sequence113.
CROP-IT (http://cheetah.bioch.virginia.edu/AdliLab/CROP-IT/homepage.html) | | | | X | CRISPR/Cas9 Off-target Prediction and Identification Tool assists biologists in designing CRISPR/CAS9. Predicts and ranks off-target sites in the mouse or human genome.
DESKGEN (https://www.deskgen.com/landing/) | | | | X | Allows designing gene editing experiments through a full range of genome-editing tools, including providing gRNAs, vectors, donors, and target activity scores and library designs.
Dharmacon Configurator (http://dharmacon.gelifesciences.com/gene-editing/crispr-rna-configurator/) | | | | | Designs gRNA targeting sequences based on Entrez Gene ID or Gene symbol, with links to order synthesized RNAs. Also evaluates entered gRNAs.
DNA 2.0 Design Tool (https://www.dna20.com/eCommerce/cas9/input) | | | | X | Allows testing user-supplied gRNAs or designs gRNA or pairs of gRNAs targeting a user-specified gene or chromosomal site. gRNAs are evaluated using a scoring system.
E-CRISP (http://www.e-crisp.org/E-CRISP/) | V | | | | Designs and evaluates CRISPR target sites from a gene symbol, an ENSEMBL ID or a FASTA sequence and designs gRNAs, allowing for a number of different CRISPR uses114.
flyCRISPR Optimal Target Finder (http://tools.flycrispr.molbio.wisc.edu/targetFinder/) | | | | X | Identifies target sites and evaluates their specificity for use targeting Drosophila melanogaster. Off-target sites evaluated using empirically rooted rules115.
GT-Scan (http://gt-scan.braembl.org.au/gt-scan/) | V | V | 3 | X | Genome Target Scan ranks all potential targets in a user-selected region of a genome in terms of how many putative off-target sites they have, to aid target site selection116.
Jack Lin's gRNA finder (http://spot.colorado.edu/~slin/cas9.html) | | | | X | CRISPR target site locator that searches the sense strand of DNA sequences for PAM sequences and returns the target sites of a specified length. Also allows checking for secondary structure.
Microhomology-associated Score Calculator (http://www.rgenome.net/mich-calculator/) | | | | | Evaluates microhomology in input sequences to evaluate potential for induced frame shifts and knockouts. A scoring system estimates microhomology-associated deletions at nuclease target sites117.
sgRNAcas9 (http://www.biootools.com) | | | | X | A downloadable software package that designs CRISPR gRNAs, provides cloning sequences with minimized off-target effects and searches for potential off-target cleavage. Designs PCR primers flanking on- or off-target sites1
sgRNA Designer (http://www.broadinstitute.org/rnai/public/analysis-tools/sgrna-design) | | | | Prediction | A web-based tool for designing sgRNAs targeting the human or mouse gene. This tool provides guidance as to which sgRNAs are most likely to give high on-target activity, but is not intended for any off-target predictions41.
sgRNA Scorer (https://crispr.med.harvard.edu/sgRNAScorer/) | | | n/a | Prediction | A sgRNA design tool that was developed using results from an in vivo library-on-library methodology that simultaneously assessed sgRNA activity across ~1,400 genomic loci42.
WGE (http://www.sanger.ac.uk/htgt/wge/) | V | V | 4 | X | A series of tools using a database of determined single and paired guide strands, including a genome browser environment and pre-scored off-target information119.

ZFN and TALEN Tools

PROGNOS (http://bao.rice.edu/Research/BioinformaticTools/prognos.html) | | | | X | Predicted Report Of Genome-wide Nuclease Off-target Sites exhaustively identifies off-target sites for TALENs or ZFNs, provides a ranked list and designs primers for downstream applications, such as mutation detection or NGS46.
SAPTA (http://bao.rice.edu/Research/BioinformaticTools/TAL_targeter.html) | | | | | Scoring Algorithm for Predicting TALE(N) Activity scans an input DNA sequence and designs TAL effectors or TALENs predicted to have the highest activity, particularly when using the di-residues NK that provide more specificity, within a supplied gene sequence 9
TAL Plasmids Sequence Assembly Tool (http://bao.rice.edu/Research/BioinformaticTools/assembleTALSequences.html) | | | | | Facilitates cloning and sequencing of TAL and TALEN plasmids by generating vector sequence after a user enters target sequences or RVDs and assembly method.
TALE-N Effector Nuclease Targeter (https://tale-nt.cac.cornell.edu) | | | | X | Tools for designing, evaluating, and assembling custom single and paired TAL effector constructs and TALENs. Includes off-target search and counting33.
TALgetter / TALENoffer (http://galaxy2.informatik.uni-halle.de:8976) | | | | | Predicts off-target sites for TALEN sites identified in an input sequence. Available as web site, command line program and as a Galaxy server120.
ZiFiT (http://zifit.partners.org/ZiFiT/) | V | X | blast | X | Zinc Finger Targeter, originally a tool for zinc finger nuclease (ZFN) sites, but ZiFit has been extended to also contain tools for TALENs and CRISPR nucleases31, 126.

Other Genome Editing Resources

CRISPR-GA (http://54.80.152.219) | | | | Quantification | Web-based analysis of CRISPR-Cas9 genome editing outcomes to quantify and characterize the indels and mutations through single or paired read NGS75.
CRISPResso (http://crispresso.rocks) | | | | Quantification | Provides analysis of CRISPR-Cas9 genome editing outcomes (NHEJ or HDR) from NGS single or paired reads when compared to uploaded reference sequence.
EENdb (http://eendb.zfgenetics.or) | | | | X | Databases of target sites for reported TALENs, ZFNs and CRISPR/Cas systems in different organisms. Also lists methods and uses for gene editing121.
Gibson Designer (http://www.sanger.ac.uk/htgt/wge/gibson_designe) | | | | | Designs human or mouse amplification primers to create targeting vectors by Gibson assembly. Gibson Designer matches vector design with CRISPR sites for the creation of exon deletions119.
TIDE (http://tide.nki.nl) | | | | Quantification | Tracking of Indels by Decomposition (TIDE) is a web tool to determine the level and type of mutations introduced by genome editing. Compares Sanger sequencing reactions64.

Table 2. Comparison of COSMID with other available tools in predicting off-target sites (a)

Off-target Site | Sequence | # of Mismatches | Bulge | Bulge Position | % Activity (c) | COSMID | Cas-OFFinder | CROP-IT | CHOPCHOP | CRISPR Design
R01_OT2 | AGGAACATGGATGAAGTTGGAGG | 2 | n.a. | n.a. | 43.63 | V | | | |
R01_OT11 | GTGAACGTGGATGCAGTTGGTGG | 1 | n.a. | n.a. | 27.00 | V | | | |
R01_OT10 | GTGAAAATGGATGAAGTTGGAGG | 2 | n.a. | n.a. | 23.39 | V | | | |
R01_OT1 | AGGAACATGGATGAAGTTGGAGG | 2 | n.a. | n.a. | 21.76 | V | | | |
R01_OT5 | GGGAACATGGATGAAGTTGGAGG | 2 | n.a. | n.a. | 15.93 | V | | | |
R01_OT7 | AGGAACGTGGATGGAGTTGGAGG | 2 | n.a. | n.a. | 12.90 | V | | | |
R01_OT4 | GGGAACATGGATGAAGTTGGAGG | 2 | n.a. | n.a. | 10.84 | V | | | |
R01_OT8 | AGGAACGTGGATGAAGCTGGAGG | 2 | n.a. | n.a. | 6.65 | V | | | |
R01_OT6 | AGGAACGTGGATGGAGTTGGAGG | 2 | n.a. | n.a. | 2.70 | V | | | |
R30_Ins9 | GAAGAGGGGAGGCAGGAGGGCAGG | 2 | DNA | 4/3/2 | 1.2 | V | | | |
R30_Del1 | AGA-AGCGGAGGCAGGAGGCTGG | 2 | RNA | 17 | 0.62 | V | | | |
R30_Ins8 | GAAGAGAGGAGGCAGGAGGGCTGG | 2 | DNA | 4/3/2 | 1.99 | V | | | |
R01_Del1 | GGGAAT-TGGATGAAGTTGGGGG | 2 | RNA | 15/14 | 0.7 | V | | | |
R30_Ins14 | GGAGAGCGGCGGCAGGAGGCGTAG | 2 | DNA | 1 | 0.4 | V | | | |
R30_Ins7 | GAAGAGTGGAGGCAGGGAGGCTGG | 2 | DNA | 7/6/5 | 0.25 | V | | | |
R30_Ins10 | GCAGAGCCGAGAGCAGGAGGCGAG | 2 | DNA | 10 | 0.19 | V | | | |
R30_Ins4 | GGAGAGCGGGGGCCAGGAGGCCGG | 2 | DNA | 9/8 | 0.17 | V | | | |
R30_Del10 | AGAGAGAGGA-GCAGGAGGCTGG | 3 | RNA | 10/9 | 0.08 | V | | | |
R01_Ins1 | AGGAACGTGGATGAACTTGGAAGG | 3 | DNA | 1 | 0.06 | V | | | |

Indel / Alternate Model; Alternate PAM; 4 mismatches

(a) Data adapted from 39, 48 for guide strands R-01 and R30. Off-target sites found by a particular tool are indicated with a V and those not identified by that tool are indicated with a dash in a grey box.

Groups of sites with matching sequences (at positions 1-19) have their names in bold with matching colors.

(c) Indel activity for off-target sites containing a DNA or RNA bulge was measured using deep sequencing. The cleavage rates at the R-01 on-target site and off-target sites OT1-OT11 are listed by decreasing T7EI activity. OT3 and OT9 had activities below the T7EI detection limit.
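The grouping of Table 2 sites by matching sequence at positions 1-19 can be illustrated with a minimal Python sketch. It assumes that positions are counted from the PAM-proximal end of the protospacer, so that only the 5'-most base is ignored; this reading, the function name `group_by_pam_proximal` and its parameters are assumptions for illustration and are not taken from COSMID or any other tool in Table 1. Only the mismatch-only R-01 sites (20-nt protospacer plus 3-nt PAM) are used.

```python
from collections import defaultdict

# Mismatch-only R-01 off-target sites from Table 2
# (20-nt protospacer followed by a 3-nt PAM).
SITES = {
    "R01_OT2": "AGGAACATGGATGAAGTTGGAGG",
    "R01_OT11": "GTGAACGTGGATGCAGTTGGTGG",
    "R01_OT10": "GTGAAAATGGATGAAGTTGGAGG",
    "R01_OT1": "AGGAACATGGATGAAGTTGGAGG",
    "R01_OT5": "GGGAACATGGATGAAGTTGGAGG",
    "R01_OT7": "AGGAACGTGGATGGAGTTGGAGG",
    "R01_OT4": "GGGAACATGGATGAAGTTGGAGG",
    "R01_OT8": "AGGAACGTGGATGAAGCTGGAGG",
    "R01_OT6": "AGGAACGTGGATGGAGTTGGAGG",
}


def group_by_pam_proximal(sites, n_positions=19, pam_len=3):
    """Group site names by the n_positions bases adjacent to the PAM.

    Assumption: position 1 is the base next to the PAM, so the key is the
    protospacer without its 5'-most base(s).
    """
    groups = defaultdict(list)
    for name, seq in sites.items():
        key = seq[-(pam_len + n_positions):-pam_len]
        groups[key].append(name)
    # Keep only sequences shared by more than one site.
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    for key, names in group_by_pam_proximal(SITES).items():
        print(key, sorted(names))
```

Under this assumption, sites that differ only at the 5'-most base fall into the same group, which is why sites listed with different first bases can still share a group.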

Table 3. gRNA Design Overview

Step | Design Tool or Experimental Method
Identify all potential gRNA binding sites at the target locus | Benchling, CRISPR SCAN, CRISPR-Plant
Screen all gRNAs for potential off-target sites using in silico prediction tools | Cas-OFFinder, COSMID, DESKGEN
Test short list of gRNAs in an appropriate cell line | T7EI, TIDE, RFLP
Further screen top candidates for off-target activity using an appropriate method | BLESS, GUIDE-Seq (cell line dependent), targeted deep sequencing (Cas-OFFinder, COSMID, DESKGEN)
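The first step of Table 3, identifying all potential gRNA binding sites at a target locus, can be sketched in a few lines of Python. This is a minimal illustration that assumes a 20-nt protospacer followed by an NGG PAM and scans both strands; the function names are hypothetical, and the sketch does not reproduce the scoring or off-target evaluation performed by the tools listed in Table 1.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_ngg_sites(seq, protospacer_len=20):
    """List candidate sites as (strand, start, protospacer, PAM) tuples.

    Assumption: the nuclease recognizes an NGG PAM immediately 3' of the
    protospacer. Start coordinates on the "-" strand are relative to the
    reverse-complemented sequence.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    # Example input: the R01_OT11 sequence from Table 2.
    for site in find_ngg_sites("GTGAACGTGGATGCAGTTGGTGG"):
        print(site)
```

Candidates found this way would still need the off-target screening and experimental testing listed in the later steps of Table 3.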

Figure 1