Table of Contents
Original Research Articles
by Kishore Kumar Akula, Monica Akula, Alexander Gegov
Comput. Artif. Intell.
2024,
2(1);
doi:
We previously developed two AI-based tools for automatic medical image classification using a multi-layer fuzzy approach (MFA and MCM) to convert image-based abnormality into a quantity. However, there is currently limited research on using diagnostic image assessment tools to statistically predict the hazard due to a disease. The present study introduces a novel approach that addresses this research gap by identifying the hazard or risk associated with a disease from an automatically quantified image-based abnormality. Hazard was estimated from the quantified abnormality with the Cox proportional hazards (PH) model, a tool widely used in medical research for relating hazard to covariates. MFA was first used to quantify the abnormality in CT scan images, and hazard plots were used to visualize the hazard over time. Hazards corresponding to image-based abnormality were then computed for the variables ‘gender’, ‘age’, and ‘smoking-status’. This integrated framework potentially minimizes false negatives, identifies patients with the highest mortality risk, and facilitates timely initiation of treatment. By using pre-existing patient images, this method could reduce the considerable costs associated with public health research and clinical trials. Furthermore, understanding the hazard posed by widespread global diseases like COVID-19 aids medical researchers in prompt decision-making regarding treatment and preventive measures.
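For readers unfamiliar with the Cox PH model named in this abstract, its standard textbook form is sketched below. The symbols are assumptions introduced here for illustration and are not taken from the article: $x_1, \dots, x_p$ stand for covariates (for example the quantified abnormality, gender, age, and smoking status), $h_0(t)$ is an unspecified baseline hazard, and $\beta_j$ are the fitted coefficients.

```latex
% Standard Cox proportional hazards model (generic form, not the authors' fitted model)
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right)

% Hazard ratio for a one-unit increase in covariate x_j, all other covariates held fixed
\mathrm{HR}_j = \frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = \exp(\beta_j)
```

Under this form, a hazard ratio above 1 means the covariate is associated with a higher hazard at every time point, which is the proportionality assumption the model's name refers to.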
Original Research Articles
by Abdulaziz Alhowaish Luluh, Muniasamy Anandhavalli
Comput. Artif. Intell.
2024,
2(1);
doi:
Deep learning (DL) techniques, which implement deep neural networks, became popular with the growing availability of high-performance computing facilities. DL achieves higher power and flexibility through its ability to process many features when dealing with unstructured data. A DL algorithm passes the data through several layers; each layer progressively extracts features and passes them to the next layer. Initial layers extract low-level features, and succeeding layers combine them to form a complete representation. This research attempts to use DL techniques for identifying sounds. DL models have been applied extensively to the classification and verification of objects in images. However, there have not been any notable findings on identifying and verifying the voice of one individual among other individuals using DL techniques. Hence, the proposed research aims to develop DL techniques capable of isolating the voice of an individual from a group of other sounds and classifying it using the convolutional neural network models AlexNet and ResNet. For voice identification, the ResNet and AlexNet models achieved classification accuracies of 97.2039% and 65.95%, respectively, with ResNet giving the best result.
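The layer-by-layer feature extraction described in this abstract can be illustrated with a minimal sketch. This is not AlexNet, ResNet, or the authors' code: the function names (`conv1d`, `relu`, `max_pool`, `extract_features`) and the hand-picked kernels are hypothetical, and a real network would learn its kernels from data. The sketch only shows how each stage consumes the previous stage's output and produces a shorter, more abstract representation.

```python
from typing import List, Sequence


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Valid-mode 1-D cross-correlation, the sliding-window operation used in CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Rectified linear unit: negative responses are set to zero."""
    return [max(0.0, v) for v in values]


def max_pool(values: Sequence[float], size: int = 2) -> List[float]:
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def extract_features(signal: Sequence[float],
                     kernels: Sequence[Sequence[float]]) -> List[float]:
    """Apply one conv -> ReLU -> pool stage per kernel, feeding each stage the previous output."""
    features = list(signal)
    for kernel in kernels:
        features = max_pool(relu(conv1d(features, kernel)))
    return features


if __name__ == "__main__":
    ramp = [0, 1, 2, 3, 4, 5, 6, 7]
    # Stage 1: a difference kernel responds to local rises (a low-level feature).
    # Stage 2: an averaging kernel combines neighbouring stage-1 responses.
    print(extract_features(ramp, [[-1.0, 1.0], [0.5, 0.5]]))
```

Each stage shortens the sequence, so later stages summarize progressively wider portions of the input, which mirrors the low-level to high-level progression the abstract describes.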
Original Research Articles
by Irshad Ahmad Thukroo, Rumaan Bashir, Kaiser Javeed Giri
Comput. Artif. Intell.
2024,
2(1);
doi:
Spoken language identification is the process of determining the language label of an audio slice regardless of factors such as length, ambiance, duration, topic or message, age, gender, region, emotions, etc. Language identification systems are of great significance in the domain of natural language processing, more specifically in multi-lingual machine translation, language recognition, and automatic routing of voice calls to particular nodes speaking or knowing a particular language. In this paper, we compare results based on various cepstral and spectral feature techniques, namely Mel-frequency Cepstral Coefficients (MFCC), Relative spectral-perceptual linear prediction coefficients (RASTA-PLP), and spectral features (roll-off, flatness, centroid, bandwidth, and contrast), in spoken language identification using a Recurrent Neural Network-Long Short Term Memory (RNN-LSTM) as the sequence learning method. The system has been implemented for six languages: Ladakhi and the five official languages of Jammu and Kashmir (Union Territory). The dataset used in experimentation consists of TV audio recordings for the Kashmiri, Urdu, Dogri, and Ladakhi languages, together with the standard corpora IIIT-H and VoxForge containing English and Hindi audio data. The dataset is pre-processed by removing different types of noise with the Spectral Noise Gate (SNG) and then slicing the audio into bursts of 5 seconds duration. The performance is evaluated using standard metrics: F1 score, recall, precision, and accuracy. The experimental results showed that spectral features, MFCC, and RASTA-PLP achieved average accuracies of 76%, 83%, and 78%, respectively. Therefore, MFCC proved to be the most suitable feature for language identification using a recurrent neural network long short-term memory classifier.
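The 5-second slicing step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `slice_into_bursts` and its parameters are hypothetical, the audio is assumed to be already loaded as a flat sequence of samples, and dropping a trailing burst shorter than the requested duration is an assumption (the abstract does not say how leftovers are handled).

```python
from typing import List, Sequence


def slice_into_bursts(samples: Sequence[float],
                      sample_rate: int,
                      burst_seconds: float = 5.0) -> List[List[float]]:
    """Split a mono signal into consecutive, non-overlapping fixed-length bursts.

    Assumption: a final burst shorter than `burst_seconds` is discarded.
    """
    burst_len = int(sample_rate * burst_seconds)
    if burst_len <= 0:
        raise ValueError("burst length must be at least one sample")
    n_full = len(samples) // burst_len
    return [list(samples[i * burst_len:(i + 1) * burst_len]) for i in range(n_full)]


if __name__ == "__main__":
    # 12 seconds of silence at an assumed 16 kHz sampling rate.
    signal = [0.0] * (16000 * 12)
    bursts = slice_into_bursts(signal, sample_rate=16000)
    print(len(bursts), len(bursts[0]))
```

Fixed-length bursts give every training example the same number of frames, which simplifies batching for a sequence model such as an LSTM.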