A Capsule Network-Based Hybrid Deep Learning Model for Efficient Prediction of CRISPR-Cas9 Off-Target Effects

Hamsika Chakilam*
Kishore Kumar T
Tekyam Krishna Kumar Naidu

Abstract

CRISPR-Cas9 genome editing has transformed biomedical research and therapeutic development, yet unintended off-target effects remain a significant obstacle to its clinical use. In this study, we present a deep learning model that combines Capsule Networks with Transformer blocks, bidirectional LSTM layers, and CNNs, and augments them with k-mer encoded sequence features and biological rule checks to predict CRISPR-Cas9 off-target activity more accurately. Guide and off-target DNA sequences pass through a hybrid pipeline of convolution, temporal modelling, attention-based representation learning, and spatial hierarchy encoding using capsule layers. In parallel, the model extracts and analyses numerical features such as mismatch counts, GC content, and PAM motif patterns. To improve reliability, we introduce a biological constraint layer that filters predictions based on well-established domain knowledge. The final prediction integrates these feature representations. Our results show that this biologically informed architecture significantly enhances both sensitivity and specificity in off-target prediction, indicating its potential to improve the safety and design of CRISPR experiments.
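
The hand-crafted inputs named in the abstract (k-mer encoding, mismatch counts, GC content, PAM motif checks) and a rule-based filter can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 20-nt protospacer plus 3-nt PAM layout, the NGG check (the canonical SpCas9 PAM), and the `max_mismatches` threshold are all assumptions made for illustration, and the sequences in the usage note are examples only.

```python
from itertools import product


def kmer_indices(seq, k=3):
    """Map each overlapping k-mer of seq to an integer index (assumed encoding)."""
    vocab = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    seq = seq.upper()
    return [vocab[seq[i:i + k]] for i in range(len(seq) - k + 1)]


def gc_content(seq):
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def mismatch_count(guide, target):
    """Number of positions where two equal-length sequences differ."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), target.upper()))


def has_ngg_pam(site):
    """True if the last three bases of the site match the NGG pattern."""
    pam = site.upper()[-3:]
    return len(pam) == 3 and pam[1:] == "GG"


def numeric_features(guide, site):
    """Numerical features for one guide/site pair.

    Assumes site is a 23-nt string: 20-nt protospacer followed by a 3-nt PAM.
    """
    protospacer = site[:20]
    return {
        "mismatches": mismatch_count(guide, protospacer),
        "gc_content": gc_content(protospacer),
        "pam_is_ngg": has_ngg_pam(site),
    }


def apply_rules(score, features, max_mismatches=6):
    """Hypothetical constraint gate: zero out scores that violate simple rules.

    The rules and the threshold are illustrative assumptions, not values
    taken from the article.
    """
    if not features["pam_is_ngg"] or features["mismatches"] > max_mismatches:
        return 0.0
    return score
```

For example, `numeric_features("GAGTCCGAGCAGAAGAAGAA", "GAGTTAGAGCAGAAGAAGAAAGG")` reports 2 mismatches, a GC content of 0.4, and a valid NGG PAM, so `apply_rules` would pass a model score through unchanged. In a full model, such features would be concatenated with the learned sequence representations before the final prediction layer.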
