Author: Gomaa, Mai Abdrabo Abdel-Samie. Title: An efficient knowledge discovery techniques for biological big data

Thesis committee

Researcher / مى عبدربه عبدالسميع جمعه

Supervisor / شريف ابراهيم بركات

Supervisor / غاده سامى الطويل

Supervisor / محمد محفوظ الموجى

Examiner / محمد عبدالفتاح بلال

Examiner / أحمد السعيد طلبة

Subject

Data mining. Database management. Medical informatics. Database searching. Medicine - Information technology.

Publication date

2018.

Number of pages

128 p.

Language

English

Degree

Master's

Specialization

Information Systems

Approval date

1/12/2018

Place of approval

Mansoura University - Faculty of Computers and Information - Information Systems

Abstract

This thesis studies the concept of big data and attempts to deal with it by understanding and analyzing its challenges and limiting their impact. The growing volume of data and the diversity of its forms make big data difficult to handle, so two frameworks are proposed to improve classification performance. In the data preprocessing stage, data is transformed so that all features are in numerical format, and a Genetic Rough set is used to reduce and select features. In the data processing stage, removing mislabeled instances helps to reach accurate results: mislabeled instances must be removed for learning algorithms to achieve higher classification accuracy.

The first framework eliminates mislabeled instances, that is, instances that cause misclassification, in order to improve classification performance. It uses Fuzzy-Rough Nearest Neighbor (FRNN) to remove mislabeled instances and then applies classification techniques to the remaining data.

The second framework uses MapReduce and Fuzzy-Rough sets for feature selection. It has three main stages: data preprocessing, map, and reduce. The data preprocessing stage addresses two main problems. The first is the variety of data: a one-to-one transformation is proposed to handle heterogeneous data. The second is incomplete data, which is handled by k-nearest neighbor (KNN) imputation. In the map stage, rough set concepts are applied for feature selection. In the reduce stage, clustering is applied to identify similar features so that they are assigned the same key. The aim of the framework is to reduce the number of features in big data sets. According to the results, using Fuzzy-Rough feature selection reduces the time needed to build the classification model.

The best related work achieved classification accuracy of around 70-88%. With the proposed approach, the decision tree achieved 87.2% accuracy and 90.3% precision, and the accuracy of KNN reached 90.9%.
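The two preprocessing steps named in the abstract, converting all features to numerical format and filling incomplete data by k-nearest neighbor imputation, can be illustrated with a small sketch. It assumes that the one-to-one transformation means giving each distinct text value in a column its own integer code, and that KNN imputation fills a missing value with the mean of that feature over the k closest complete rows. `encode_categorical` and `knn_impute` are hypothetical helper names, not functions from the thesis.

```python
from math import sqrt
from statistics import mean


def encode_categorical(rows):
    """Replace each distinct text value in a column with its own integer code.

    The mapping is one-to-one per column, so every code can be decoded back.
    Assumption (not from the thesis): a column holds either text or numbers,
    and None marks a missing value.
    """
    codebooks = [{} for _ in rows[0]]
    encoded = []
    for row in rows:
        new_row = []
        for j, value in enumerate(row):
            if isinstance(value, str):
                book = codebooks[j]
                # The next unused integer becomes the code for an unseen value.
                new_row.append(book.setdefault(value, len(book)))
            else:
                new_row.append(value)  # numbers and None pass through unchanged
        encoded.append(new_row)
    return encoded, codebooks


def knn_impute(rows, k=2):
    """Fill each None with the mean of that feature over the k nearest complete rows.

    Distance is Euclidean over the features the incomplete row actually has.
    Assumption: at least one row has no missing values.
    """
    complete = [row for row in rows if None not in row]
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        neighbours = sorted(
            complete,
            key=lambda other: sqrt(sum((row[j] - other[j]) ** 2 for j in observed)),
        )[:k]
        imputed.append([v if v is not None else mean(n[j] for n in neighbours)
                        for j, v in enumerate(row)])
    return imputed
```

For example, the rows `[["red", 1.0, 10.0], ["blue", 2.0, None], ["red", 1.5, 12.0], ["blue", 2.2, 20.0]]` are encoded with `red` as 0 and `blue` as 1, and the missing value in the second row becomes 16.0, the mean of its two nearest complete rows. In practice features would usually be scaled first, because otherwise the feature with the largest range dominates the neighbour search.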
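Rough-set feature selection scores a feature subset B by its dependency degree, gamma_B(D) = |POS_B(D)| / |U|: the fraction of objects whose equivalence class under B (objects with identical values on B) carries a single decision label. The abstract states that a genetic search over rough sets is used; the hypothetical sketch below swaps in the simpler greedy QuickReduct search, which is a different search strategy, only to show how the crisp dependency degree drives the selection. It does not model the fuzzy-rough variant, which replaces crisp equivalence classes with fuzzy similarity.

```python
from collections import defaultdict


def partition(rows, features):
    """Group row indices into equivalence classes that agree on every feature in `features`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[f] for f in features)].append(i)
    return list(blocks.values())


def dependency(rows, labels, features):
    """Crisp rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|.

    An equivalence class belongs to the positive region when all of its rows
    share the same decision label.
    """
    positive = 0
    for block in partition(rows, features):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedy QuickReduct (a stand-in for the thesis's genetic search).

    Repeatedly adds the feature that gives the highest dependency degree until
    the subset reaches the dependency degree of the full feature set.
    """
    all_features = list(range(len(rows[0])))
    target = dependency(rows, labels, all_features)
    reduct, best = [], dependency(rows, labels, [])
    while best < target:
        candidates = [f for f in all_features if f not in reduct]
        scores = {f: dependency(rows, labels, reduct + [f]) for f in candidates}
        chosen = max(candidates, key=scores.get)
        reduct.append(chosen)
        best = scores[chosen]
    return reduct
```

For the rows `[[0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 1, 1]]` with labels `["no", "no", "yes", "yes"]`, `quick_reduct` returns `[0]`, because feature 0 alone already separates the two classes. The loop always terminates, since adding a feature can only refine the partition and so never lowers the dependency degree.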
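The mislabeled-instance filter of the first framework can be sketched with a fuzzy-rough nearest neighbour classifier, under assumptions that are not taken from the thesis: similarity is one minus the mean absolute difference of features scaled to [0, 1]; for each class, the lower approximation uses the Kleene-Dienes implicator and the upper approximation uses the minimum t-norm over the k most similar training instances; the predicted class maximises the average of the two; and an instance counts as mislabeled when FRNN built from all other instances predicts a different class. `similarity`, `frnn_predict`, and `remove_mislabeled` are hypothetical names.

```python
def similarity(x, y):
    """Fuzzy similarity in [0, 1]: one minus the mean absolute feature difference.

    Assumption: every feature has already been scaled to [0, 1].
    """
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def frnn_predict(train_x, train_y, query, k=3):
    """Fuzzy-rough nearest neighbour: return the class maximising (lower + upper) / 2."""
    neighbours = sorted(
        ((similarity(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    best_class, best_score = None, -1.0
    for c in sorted(set(train_y)):  # sorted so ties break deterministically
        # Lower approximation with the Kleene-Dienes implicator I(a, b) = max(1 - a, b).
        lower = min(max(1.0 - s, 1.0 if y == c else 0.0) for s, y in neighbours)
        # Upper approximation with the minimum t-norm T(a, b) = min(a, b).
        upper = max(min(s, 1.0 if y == c else 0.0) for s, y in neighbours)
        score = (lower + upper) / 2
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def remove_mislabeled(xs, ys, k=3):
    """Keep an instance only if FRNN built from all *other* instances predicts its label.

    Assumption: this leave-one-out rule is an illustrative choice, not the thesis's.
    """
    kept_x, kept_y = [], []
    for i, (x, y) in enumerate(zip(xs, ys)):
        prediction = frnn_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], x, k)
        if prediction == y:
            kept_x.append(x)
            kept_y.append(y)
    return kept_x, kept_y
```

With crisp class labels and these operators, each class's score depends only on the most similar neighbour inside that class and the most similar neighbour outside it among the k retained. For example, in two tight clusters labelled `a` and `b` plus a point `[0.3, 0.3]` labelled `b` that sits much closer to the `a` cluster, only that point is removed.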
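The data flow of the map and reduce stages in the second framework can be imitated in a single process. In this hypothetical sketch each mapper scores every feature on its own data split by the crisp rough-set dependency degree of the decision on that single feature, the shuffle groups the scores by feature index, and the reducer keeps a feature when its mean score reaches a threshold. The thesis assigns keys by clustering similar features and uses fuzzy-rough measures; neither is reproduced here, so the key is simply the feature index, and `mapper`, `reducer`, `mapreduce_select`, and the `threshold` parameter are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def dependency(rows, labels, feature):
    """Crisp rough-set dependency degree of the decision on one feature.

    Rows sharing a value of `feature` form one equivalence class; a class is in
    the positive region when all of its rows carry the same label.
    """
    labels_by_value = defaultdict(set)
    size_by_value = defaultdict(int)
    for row, label in zip(rows, labels):
        labels_by_value[row[feature]].add(label)
        size_by_value[row[feature]] += 1
    positive = sum(size_by_value[v] for v, labs in labels_by_value.items() if len(labs) == 1)
    return positive / len(rows)


def mapper(chunk_rows, chunk_labels):
    """Map: emit (feature index, dependency degree on this chunk) for every feature."""
    for f in range(len(chunk_rows[0])):
        yield f, dependency(chunk_rows, chunk_labels, f)


def reducer(feature, gammas, threshold):
    """Reduce: keep the feature if its mean score over all chunks reaches the threshold."""
    return feature if mean(gammas) >= threshold else None


def mapreduce_select(rows, labels, n_chunks=2, threshold=0.5):
    """Simulate map, shuffle, and reduce in one process (hypothetical helper)."""
    size = -(-len(rows) // n_chunks)  # ceiling division so every row lands in a chunk
    grouped = defaultdict(list)
    for start in range(0, len(rows), size):
        for feature, gamma in mapper(rows[start:start + size], labels[start:start + size]):
            grouped[feature].append(gamma)  # shuffle: collect mapper output by key
    kept = (reducer(f, g, threshold) for f, g in sorted(grouped.items()))
    return [f for f in kept if f is not None]
```

For example, with rows `[[0, 1, 3], [1, 1, 4], [0, 0, 3], [1, 0, 3]]` and labels `["no", "yes", "no", "yes"]`, `mapreduce_select(rows, labels)` keeps features `[0, 2]`. Averaging per-split scores only approximates the score on the full data: feature 2 averages 0.5 across the two splits, although its dependency degree on all four rows is 0.25.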