Abstract
The rapid growth of technology in the academic environment and the widespread use of digital libraries have generated big scholarly data. Ranking academic papers and measuring their impact have become increasingly important in academia, where they inform promotions, hiring, awards, grants, scholarships, and university rankings. Google Scholar ranking depends mainly on the citation count of academic papers; as a result, some papers are ranked low even when they are of high quality. Identifying the most important articles in a field is a critical issue for researchers, journals, and academic institutions. The goal of this study is to create a ranking system for big scholarly data (RBSD) that integrates network analysis based on graph analytics, citation analysis, and similarity between papers. The proposed model ranks papers based on the paper citation network to identify the central papers. It also ranks authors to identify the top authors in the computer science citation network, and it analyzes the similarity between academic papers to determine how relevant they are to one another. A new methodology is proposed that ranks papers by a weighted score combining paper information, author information, and publication venue information. The proposed model also considers the complex relationships between papers, overcoming the limitations of other ranking systems that rely only on the traditional PageRank algorithm. To produce a more accurate ranking, the proposed model excludes authors' self-citations and collaboration citations, which authors often use to increase their citation counts. The RBSD model was implemented using four real-world datasets: ACM, MAG, DBLP, and Elsevier's Scopus for publication venue information. The proposed model was applied to 2,092,356 papers with 8,024,869 citations.
The model was implemented using Apache Spark GraphX to reduce the execution time of graph analysis and to explore the nature of scholarly data. The experimental evaluation uses statistical measures to assess the quality of the ranking systems. The results show that the proposed model outperforms the Google Scholar ranking procedure and the FPRT model, with an MRR value of 0.82, and returns reasonable results.
Keywords: Scholarly Data, Big Data, Graph Theory, Citation Analysis, Ranking Systems, Bibliographic Coupling, Co-Citations.
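
The MRR figure reported above can be made concrete with a small sketch. It assumes that MRR denotes the standard mean reciprocal rank (the average, over queries, of one divided by the position of the first relevant result). The function name, the query setup, and the toy paper IDs are hypothetical and are not taken from the study.

```python
from typing import Hashable, Iterable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant: Sequence[Iterable[Hashable]],
) -> float:
    """Mean reciprocal rank over a set of queries.

    rankings[i] is the ordered list of paper IDs returned for query i;
    relevant[i] holds the paper IDs judged relevant for that query.
    A query with no relevant paper in its ranking contributes 0.
    """
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        rel_set = set(rel)
        # Only the first relevant hit counts toward the reciprocal rank.
        for position, paper_id in enumerate(ranked, start=1):
            if paper_id in rel_set:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical toy data (assumption): three queries with made-up paper IDs.
rankings = [["p1", "p2", "p3"], ["p4", "p5", "p6"], ["p7", "p8"]]
relevant = [{"p1"}, {"p5"}, {"p9"}]
score = mean_reciprocal_rank(rankings, relevant)
```

Under this definition, a value close to 1 means the first relevant paper usually appears at or near the top of the returned list, which is one way to read the comparison against the other ranking systems.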