Abstract The rapid growth of big data sets presents an urgent need to handle the difficulty of storing and processing them. MapReduce is a recent programming model, initiated by Google's team, for storing and processing big data sets. Hadoop is open-source software presented by Apache that provides an implementation of MapReduce. MapReduce requires a shuffling phase to globally exchange the intermediate data generated by the map phase, but this shuffling increases the overhead on performance. In this thesis, we survey the literature on shuffling and discuss previous techniques adopted to enhance the performance of MapReduce. We then focus on an approach that improves MapReduce performance by reducing the overhead caused by the shuffling phase: improving data locality eliminates the network overhead of the shuffle. We achieve this by pre-partitioning data based on query similarity, computed with TF-IDF and the cosine similarity measure, and by grouping related queries together using the K-means clustering algorithm. In this regard, we extend HDFS to control where data are stored, so that related data files are collocated on the same nodes.
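The query-grouping step described above (TF-IDF vectors, cosine similarity, K-means) can be sketched in plain Python. This is a minimal illustration, not the thesis's actual implementation: the sample queries, the naive evenly spaced centroid initialization, and the fixed iteration count are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_vectors(queries):
    """Compute sparse TF-IDF vectors (term -> weight dicts) for query strings."""
    docs = [q.lower().split() for q in queries]
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vectors, k, iters=10):
    """Simple K-means over sparse vectors, using cosine similarity as closeness."""
    centroids = vectors[::max(1, len(vectors) // k)][:k]  # naive spaced init (assumption)
    labels = [0] * len(vectors)
    for _ in range(iters):
        # assignment step: each query goes to its most similar centroid
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # update step: centroid = term-by-term average of its members
        for c in range(k):
            members = [v for v, a in zip(vectors, labels) if a == c]
            if not members:
                continue
            merged = Counter()
            for v in members:
                merged.update(v)            # Counter.update adds float weights
            centroids[c] = {t: s / len(members) for t, s in merged.items()}
    return labels

# Hypothetical sample queries: two about MapReduce, two about weather
queries = ["hadoop mapreduce shuffle", "mapreduce shuffle overhead",
           "weather forecast today", "today weather report"]
labels = kmeans(tfidf_vectors(queries), k=2)
```

Queries that end up in the same cluster would then have their associated data files placed on the same HDFS nodes, so the map output they need is already local and the shuffle transfer is avoided.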