Author: Ibrahim, Sarah Osama Talaat./ Title: Diseases Prediction Based on Gene selection from Microarray Gene Expression Using Artificial Intelligence /

Title

Diseases Prediction Based on Gene selection from Microarray Gene Expression Using Artificial Intelligence /

Author

Ibrahim, Sarah Osama Talaat.

Committee

Researcher / Sarah Osama Talaat Ibrahim

Supervisor / Abdel Mageed Amin Ali (عبد المجيد أمين علي)

Supervisor / Hassan Shaaban Hassan (حسن شعبان حسن)

Subject

Artificial intelligence.

Publication date

2023.

Number of pages

224 p.

Language

English

Degree

Doctorate (PhD)

Specialization

Computer Science

Approval date

12/9/2023

Place of approval

Minia University - Faculty of Computers and Information - Computer Science

Abstract

Microarray gene expression profiling has emerged as a powerful tool for studying complex biological processes, and its application to cancer research has transformed our understanding of cancer biology. One of its significant applications is cancer classification, where it has been used to identify specific molecular subtypes of cancer that have distinct clinical outcomes and responses to treatment. This has led to more personalized approaches to cancer therapy and improved patient outcomes. In addition, microarray gene expression analysis has facilitated the discovery of novel biomarkers for cancer diagnosis, prognosis, and therapeutic response prediction.
However, one of the main limitations of microarray gene expression data is the curse of dimensionality. The curse of dimensionality refers to the high number of features relative to the number of samples, which can lead to overfitting and poor generalization performance in predictive models. Two main approaches are commonly used to address this issue: feature extraction and feature selection.
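To make the difference between the two approaches concrete, the following minimal Python sketch contrasts them on synthetic data with more genes than samples. It is an illustration only, not the method used in the thesis: the variance-based ranking, the power-iteration approximation of the first principal component, the toy data, and all function names (`make_toy_data`, `select_top_k`, `first_principal_component`) are assumptions introduced here.

```python
import random

def make_toy_data(n_samples=10, n_genes=50, informative=(3, 7), shift=8.0, seed=42):
    """Hypothetical expression matrix: rows are samples, columns are genes."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [0 if i < n_samples // 2 else 1 for i in range(n_samples)]
    for i, label in enumerate(y):
        if label == 1:
            for j in informative:
                X[i][j] += shift  # only these genes differ between the classes
    return X, y

def select_top_k(X, k):
    """Feature selection: keep k of the ORIGINAL genes, ranked by sample variance."""
    n, p = len(X), len(X[0])
    variances = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / (n - 1))
    ranked = sorted(range(p), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]

def first_principal_component(X, iters=100, seed=0):
    """Feature extraction: one NEW feature per sample, a weighted mix of all genes.

    Approximates the first principal component by power iteration on the
    centered data (repeatedly applying Xc^T Xc to a weight vector).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, w)) for row in Xc]
        w = [sum(scores[i] * Xc[i][j] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    scores = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    return w, scores

X, y = make_toy_data()
keep, X_selected = select_top_k(X, k=2)
weights, scores = first_principal_component(X)
print("selected gene indices:", keep)
print("extracted feature per sample:", [round(s, 2) for s in scores])
```

Selection returns a subset of the original gene indices, so the result stays biologically interpretable. Extraction returns a new feature that mixes all genes, which is compact but harder to map back to individual genes.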
Objectives:
Overall, this thesis aims to address the curse of dimensionality and to improve the accuracy and effectiveness of gene selection and cancer classification techniques in microarray gene expression data. The thesis objectives can be stated as follows.
• Present a comprehensive review of state-of-the-art data preprocessing, feature (i.e., gene) selection, feature extraction, hybrid selection-extraction techniques, and ML algorithms for cancer classification based on gene expression datasets. The review analyzes the performance of previously developed techniques, outlines the advantages and disadvantages of each gene reduction technique, and defines new research directions.
• Tackle the curse of dimensionality in microarray gene expression analysis by performing feature selection to identify a subset of informative genes relevant to the biological question while preserving the underlying data structure.
• Facilitate learning tasks from microarray data in order to detect cancer types.
• Achieve more robust predictions by improving classification accuracy.
• Evaluate the quality of the developed techniques using well-known metrics.
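The objectives refer to well-known metrics without naming them in this abstract. As a minimal sketch, assuming the usual accuracy, precision, recall, and F1 score for a binary tumor/normal task (the metric choice, the label encoding, and the function name `classification_metrics` are assumptions, not taken from the thesis):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    # Guard against empty denominators (e.g. no positive predictions at all).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels: 1 = tumor, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

In gene selection studies these scores are typically reported alongside the number of selected genes, since a smaller subset with the same accuracy is usually preferred.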
Methodology:
The thesis methodology targets the curse of dimensionality, where the number of genes significantly exceeds the number of samples. To address this issue, multiple swarm algorithms can be combined, and existing ones can be modified or enhanced. However, according to the literature, swarm algorithms have several known drawbacks, including slow convergence rates, a tendency to become trapped in local optima, sensitivity of performance to algorithm parameters, and an imbalance between the exploration and exploitation phases.
Consequently, Chapter 2 of this work examines the relevant dimensionality reduction techniques found in the literature, including recently developed ones, to select the feature reduction techniques with the greatest potential for success in the gene reduction domain. Afterward, this work studies recently developed swarm optimization algorithms and their characteristics, and uses some of them to implement new hybrid gene selection techniques. The main ones are SWO, MGO, and COA, which are newly proposed swarm optimization algorithms. These algorithms have several novel updating strategies and can therefore handle a range of optimization problems with many exploration and exploitation strategies.
Even though the SWO and MGO algorithms have attained favorable results, they are not immune to the limitations that swarm algorithms may experience. Therefore, this work improves the performance of the SWO and MGO algorithms and proposes multiple hybrid gene selection techniques. The selected gene reduction type and swarm algorithms are adapted and implemented, and then validated using eight microarray gene expression datasets. These techniques are discussed in detail in Chapter 3, Chapter 4, and Chapter 5.
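The general idea of swarm-based gene selection in this methodology can be sketched as a wrapper search over binary gene masks. The example below uses a plain binary particle swarm optimizer as a generic stand-in; it is not SWO, MGO, COA, or any of the hybrid techniques proposed in the thesis. The fitness weighting `alpha`, the nearest-centroid classifier, the toy data, and all function names are assumptions made for illustration.

```python
import math
import random

def make_toy_data(n_samples=12, n_genes=20, informative=(0, 1), shift=4.0, seed=1):
    """Hypothetical data: two classes that differ only in the informative genes."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0.0, 1.0) + (shift if (label and j in informative) else 0.0)
          for j in range(n_genes)] for label in y]
    return X, y

def loo_error(X, y, genes):
    """Leave-one-out error of a nearest-centroid classifier on the chosen genes."""
    if not genes:
        return 1.0  # an empty subset cannot classify anything
    n, errors = len(X), 0
    for i in range(n):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            rows = [X[k] for k in range(n) if k != i and y[k] == label]
            if not rows:
                continue
            dist = sum((X[i][g] - sum(r[g] for r in rows) / len(rows)) ** 2 for g in genes)
            if dist < best_dist:
                best_label, best_dist = label, dist
        errors += best_label != y[i]
    return errors / n

def fitness(mask, X, y, alpha=0.9):
    """Lower is better: weighted sum of error rate and fraction of genes kept."""
    genes = [j for j, bit in enumerate(mask) if bit]
    return alpha * loo_error(X, y, genes) + (1 - alpha) * len(genes) / len(mask)

def binary_pso(X, y, n_particles=10, n_iters=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain binary PSO over gene masks; returns (selected gene indices, best fitness)."""
    rng = random.Random(seed)
    p = len(X[0])
    pos = [[int(rng.random() < 0.5) for _ in range(p)] for _ in range(n_particles)]
    vel = [[0.0] * p for _ in range(n_particles)]
    pbest = [m[:] for m in pos]
    pbest_fit = [fitness(m, X, y) for m in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(p):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])  # pull toward own best
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))    # pull toward swarm best
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp so the sigmoid does not saturate
                # Sigmoid transfer function turns the velocity into P(bit = 1).
                pos[i][j] = int(rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j])))
            f = fitness(pos[i], X, y)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return [j for j, bit in enumerate(gbest) if bit], gbest_fit

X, y = make_toy_data()
genes, best = binary_pso(X, y)
print("selected genes:", genes, "fitness:", round(best, 3))
```

The fitness function trades classification error against subset size, which matches the two criteria the abstract reports (accuracy and number of selected genes). The inertia and acceleration parameters `w`, `c1`, and `c2` are the kind of settings whose influence on performance the methodology lists as a drawback of swarm algorithms.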
Results:
The experimental findings indicate that the proposed techniques can successfully identify the most informative genes and enhance classification accuracy. The proposed techniques achieve state-of-the-art performance compared to existing techniques. Moreover, the RSWO MPA, MUL-MGO, MUL-RMGO, and IG-COA techniques outperform the most basic methods, other swarm algorithms, and cutting-edge techniques in the literature on identical datasets in terms of accuracy and number of selected genes. Overall, this thesis contributes to improving the accuracy and stability of cancer classification models based on gene expression data.
Recommendations:
• Evaluate the proposed methods on multiple dataset modalities, including DNA and RNA.
• Use augmentation techniques to improve cancer classification based on microarray gene expression.
• Develop multimodal fusion models. Multimodal fusion refers to the integration of information from multiple sources or modes to improve prediction or classification performance. In the context of this thesis, multimodal fusion can involve combining gene expression data with other types of omics data, such as DNA methylation or protein expression.
• Develop ML and deep learning models that account for uncertainty.
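The multimodal fusion recommendation above can be illustrated with the simplest variant, early fusion, in which each modality is standardized and the feature vectors are concatenated per sample. This is a sketch under stated assumptions rather than a design from the thesis: the choice of early fusion, the z-score scaling, the toy values, and the function names `zscore_columns` and `early_fusion` are all hypothetical.

```python
def zscore_columns(M):
    """Standardize each column to zero mean and unit (population) standard deviation."""
    n, p = len(M), len(M[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in M]
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        for i in range(n):
            # Constant columns carry no information; map them to zero.
            out[i][j] = (M[i][j] - mean) / std if std > 0 else 0.0
    return out

def early_fusion(*modalities):
    """Concatenate standardized feature vectors from each modality, sample by sample."""
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("all modalities must describe the same samples")
    scaled = [zscore_columns(m) for m in modalities]
    return [sum((s[i] for s in scaled), []) for i in range(n)]

# Hypothetical values: two samples, two expression features, one methylation feature.
expression = [[1.0, 200.0], [3.0, 400.0]]
methylation = [[0.1], [0.9]]
fused = early_fusion(expression, methylation)
print(fused)
```

Standardizing before concatenation keeps a modality with large numeric ranges from dominating the fused representation. The fused matrix is wider than either input, so the dimensionality problem discussed in the abstract becomes more acute and gene selection would still be needed afterward.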