Author: Hegazy, Nagwa Yaseen Nady Mohammed./ Title: A New Smart Ranking System Based On Big Scholarly Data Analytics /

Title

A New Smart Ranking System Based On Big Scholarly Data Analytics /

Author

Hegazy, Nagwa Yaseen Nady Mohammed.

Prepared by

Researcher / Nagwa Yaseen Nady Mohammed Hegazy

Supervisor / محمد حلمي خفاجي

Supervisor / ايمن السيد خضر

Examiner / محمد حلمي خفاجة

Subject

Qrmak

Publication date

2023

Number of pages

151 p.

Language

English

Degree

Doctorate

Specialization

Information Systems

Date of approval

11/1/2023

Place of approval

Fayoum University - Faculty of Computers and Information - Information Systems

Only 14 pages are available for public view

Abstract

The rapid growth of technology in the academic environment and the
widespread use of digital libraries have generated big scholarly data. Ranking and
measuring the impact of academic papers is of great importance in the
academic environment, where it informs promotions, hiring, awards, grants,
scholarships, and university ranking procedures. Google Scholar ranking depends
mainly on the citation count of academic papers; therefore, some papers are
ranked low even if they are high-quality papers. Identifying the most important
articles in the field is considered a critical issue for researchers, journals, and
academic institutions. The goal of this study is to create a ranking system for big
scholarly data (RBSD) that integrates network analysis based on graph analytics,
citation analysis, and similarity between papers. The proposed model ranks papers
based on the paper citation network to get the central papers. It also ranks authors
to identify the top authors in the computer science citation network and analyzes
the similarity between academic papers to get the relevancy between papers. A
new methodology is proposed to rank papers based on a weighted score that
considers paper information, author information, and publication venue
information. The proposed model also considers the complex relationship
between papers, overcoming the limitations of other ranking systems that rely
only on the traditional PageRank algorithm. To produce a more accurate ranking
system, the proposed model excludes authors’ self-citation and collaboration
citations, which are often used by authors to increase their citation count. The
RBSD model was implemented using four real-world datasets: ACM, MAG,
DBLP, and Scopus Elsevier, the last for publication venue information. The proposed
model was applied to 2,092,356 papers, with 8,024,869 citations. This was
implemented using Apache Spark GraphX to reduce the execution time of
graph analysis and to explore the nature of scholarly data. The experimental
evaluation uses statistical measures to determine the quality of ranking systems.
The results show that the proposed model outperformed both the Google Scholar
ranking procedure and the FPRT model, achieving a mean reciprocal rank (MRR) of
0.82 and returning reasonable results.
Keywords: Scholarly Data, Big Data, Graph Theory, Citation Analysis, Ranking
Systems, Bibliographic Coupling, Co-Citations.
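
Illustrative sketch: the abstract mentions three ideas that a small example may make more concrete: excluding self-citations, ranking papers on the citation network, and scoring a ranking with MRR. The Python below is a minimal sketch under stated assumptions, not the thesis implementation. It uses plain power-iteration PageRank as a stand-in (the RBSD model is described as going beyond traditional PageRank), treats a citation as a self-citation when the citing and cited papers share at least one author, and runs in memory rather than on Apache Spark GraphX. All function names, parameters, and the toy data shapes are hypothetical.

```python
def drop_self_citations(citations, authors):
    """Keep only citations whose citing and cited papers share no author.

    citations: list of (citing_paper, cited_paper) pairs.
    authors:   dict mapping paper id -> set of author ids (assumed input shape).
    """
    return [
        (src, dst)
        for src, dst in citations
        if not (authors[src] & authors[dst])
    ]


def pagerank(edges, nodes, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over directed (citing, cited) edges."""
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all papers.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/position of the first relevant item in each ranked list.

    A query whose list contains no relevant item contributes 0.
    """
    total = 0.0
    for query, ranked in rankings.items():
        for position, item in enumerate(ranked, start=1):
            if item in relevant[query]:
                total += 1.0 / position
                break
    return total / len(rankings)
```

On a toy network, one would first call `drop_self_citations`, then pass the remaining edges to `pagerank` and sort papers by score. `mean_reciprocal_rank` can then compare the resulting ranked lists against a set of papers judged relevant for each query.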