Author: Khalil, Heba Mohammed. / Title: Authorship Forensic Analysis Based on Linguistic Knowledge and Learning Techniques.

Title

Authorship Forensic Analysis Based on
Linguistic Knowledge and Learning
Techniques.

Author

Khalil, Heba Mohammed.

Preparation committee

Researcher / Heba Mohammed Khalil

Supervisor / Tarek Ahmed El-Shishtawy

Supervisor / Ahmed Taha

Examiner / Yasser Fouad Mahmoud

Examiner / Abdelwahab Kamel Alsammak

Subject

Forensic authorship authentication. Stylometric features. Machine learning.

Publication date

2021.

Number of pages

99 p.

Language

English

Degree

Doctorate

Specialization

Computer Science Applications

Approval date

1/7/2021

Place of approval

Benha University - Faculty of Computers and Information - Computer Science

Abstract

Nowadays, forensic authorship authentication plays a vital role in
identifying the growing number of unknown authors that has arisen with
fast-increasing internet usage worldwide. Authorship authentication is the
process in which a linguist attempts to identify the author of an anonymous
text based on the vocabulary used and the linguistic style of the writer.
Most existing studies of authorship forensic analysis focus on the English
language, while research concerning the Arabic language is rare. In this
research, we present a new methodology that enhances authorship
forensic analysis, focusing on the Arabic language.
This thesis presents a two-level learning classifier for authorship
authentication. The learning system is supplied with linguistic
knowledge, statistical features, and vocabulary features to enhance its
efficiency, instead of relying on only one type of feature. The linguistic
knowledge is represented through lexical analysis features of the unknown
text and of the authors' previous texts.
The basic idea of the first classifier is to extract the unique vocabulary
terms that identify an author and to use them to recognize unknown authors.
In the current work, a modified Term Frequency-Inverse Document
Frequency (modified TF-IDF) method is proposed, which is a modification of
the traditional TF-IDF method. Our approach is tested on a large dataset
belonging to different political groups. The first classifier is based only
on the vocabulary words used by each political group. The experimental
results show that the average accuracy for recognizing groups increases
from 89.33% when using the traditional TF-IDF to 92% with the proposed
modified TF-IDF method. Further improvement is achieved when representing
the vocabulary terms in their Arabic lemma form rather than their root
form; the results show that the accuracy improves from 89.33% to 92%. The
approach is also tested on another dataset of Arabic articles and achieves
an accuracy of 92% based on vocabulary words.
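
The abstract does not specify how the modified TF-IDF differs from the
traditional one, so the following is only a minimal sketch of the
traditional TF-IDF baseline applied to group recognition. It assumes each
group's texts are concatenated into one token list and that an unknown text
is assigned to the group whose weighted vocabulary it matches best. The
function names, the toy English tokens, and the scoring rule are all
illustrative assumptions, not the thesis's method.

```python
import math
from collections import Counter


def tfidf_profiles(group_docs):
    """Build one traditional TF-IDF weight table per group.

    group_docs maps a group label to a token list (assumed here to be all
    of that group's texts concatenated and already tokenized/lemmatized).
    """
    n_groups = len(group_docs)
    # Document frequency: in how many groups does each term occur?
    df = Counter()
    for tokens in group_docs.values():
        df.update(set(tokens))
    profiles = {}
    for label, tokens in group_docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency in the group, idf = log(N / df)
        profiles[label] = {
            term: (count / total) * math.log(n_groups / df[term])
            for term, count in counts.items()
        }
    return profiles


def predict_group(tokens, profiles):
    """Pick the group whose term weights sum highest over the unknown text."""
    scores = {
        label: sum(weights.get(tok, 0.0) for tok in tokens)
        for label, weights in profiles.items()
    }
    return max(scores, key=scores.get)


# Toy data (hypothetical; the thesis uses Arabic political texts).
group_docs = {
    "group_a": ["reform", "vote", "reform", "economy"],
    "group_b": ["security", "vote", "border", "security"],
}
profiles = tfidf_profiles(group_docs)
print(predict_group(["reform", "vote"], profiles))
```

Terms shared by every group (here "vote") receive a weight of zero, so only
vocabulary that is distinctive for a group contributes to the decision.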
To get the best predictive performance for identifying authorship, the
first classifier is based on vocabulary features that weight frequently
used terms, and its results are fed to a second machine learning
classifier. The learning technique depends on statistical and linguistic
features as well as the vocabulary knowledge of the first classifier. All
sets of features describe the author's writing style in numerical form.
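
The two-level idea can be sketched as follows, under loud assumptions: the
abstract names neither the second-level learner nor the exact statistical
and linguistic features, so this example substitutes a nearest-centroid
classifier and two simple statistics (mean word length and type-token
ratio). The vocabulary weights stand in for the output of the first
classifier; all names and the toy data are hypothetical.

```python
import math
from collections import Counter


def style_features(tokens):
    """Two illustrative statistical features of a text."""
    mean_len = sum(len(t) for t in tokens) / len(tokens)
    type_token_ratio = len(set(tokens)) / len(tokens)
    return [mean_len, type_token_ratio]


def vocab_scores(tokens, vocab_weights, labels):
    """Stand-in for the first level: one vocabulary score per author."""
    return [
        sum(vocab_weights[a].get(t, 0.0) for t in tokens) / len(tokens)
        for a in labels
    ]


def combined_vector(tokens, vocab_weights, labels):
    """Numeric description of a text: first-level scores plus statistics."""
    return vocab_scores(tokens, vocab_weights, labels) + style_features(tokens)


def train_centroids(samples, vocab_weights, labels):
    """Second level (assumed learner): mean feature vector per author."""
    sums, counts = {}, Counter()
    for tokens, author in samples:
        vec = combined_vector(tokens, vocab_weights, labels)
        acc = sums.setdefault(author, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[author] += 1
    return {a: [v / counts[a] for v in acc] for a, acc in sums.items()}


def predict_author(tokens, centroids, vocab_weights, labels):
    """Assign the author whose centroid is closest in Euclidean distance."""
    vec = combined_vector(tokens, vocab_weights, labels)
    return min(centroids, key=lambda a: math.dist(vec, centroids[a]))


# Toy data (hypothetical).
labels = ["author_a", "author_b"]
vocab_weights = {
    "author_a": {"reform": 1.0, "economy": 0.5},
    "author_b": {"security": 1.0, "border": 0.5},
}
samples = [
    (["reform", "economy", "reform", "plan"], "author_a"),
    (["economy", "reform", "vote"], "author_a"),
    (["security", "border", "security", "law"], "author_b"),
    (["border", "security", "vote"], "author_b"),
]
centroids = train_centroids(samples, vocab_weights, labels)
print(predict_author(["reform", "vote", "economy"], centroids, vocab_weights, labels))
```

The features are left unscaled to keep the sketch short; a real system
would normally normalize them so that no single feature dominates the
distance.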
The proposed two-level classifier for identifying authorship shows
better performance. The experiments show that the trained
two-level classifier improves the accuracy range from 94% to 96.16%.