A Visual-Acoustic Modeling Framework for Robust Dysarthric Speech Recognition Using Synthetic Visual Augmentation and Transfer Learning