Title
Performance study and a clustering algorithm for antivirus similarity analysis
Author
SaadAldeen, Donia Naief
Supervisory committee
Supervisor / Donia Naief SaadAldeen
Supervisor / Ezzat AbdelTawab Korany
Supervisor / Sahar Mohamed Said Ghanem
Supervisor / Shawkat Kamal Guirguis
Supervisor / Wafaa Ahmad Kamal Elhaweet
Subject
antivirus.
Publication date
2021.
Number of pages
57 p.
Language
English
Degree
Master's
Specialization
Information Systems
Approval date
5/1/2021
Place of approval
Alexandria University - Institute of Graduate Studies and Research - Department of Information Technology
Index
Only 14 pages are available for public view


Abstract

The most common goal of malware analysis is to determine whether a given binary is malicious or benign. Another objective is similarity analysis of malware binaries, which shows how new samples differ from known ones. Similarity analysis lets a new sample be examined relative to samples that have already been analyzed, and it guides the discovery of novel aspects that should be analyzed in greater depth.
In this work, we are concerned with detecting similarities and differences among malware binaries. Thousands of malware samples are created every day, and machine learning is an indispensable tool for their analysis.
Previous work has studied clustering and classification as competing paradigms. In this work, a malware similarity analysis technique (AltCC) is proposed that instead alternates between clustering and classification. It also assumes that the malware samples are not available all at once but are processed in batches. Initially, clustering is applied to the first batch to group similar binaries into novel malware classes. The discovered classes are then used to train a classifier. For each following batch, the classifier decides whether a new binary belongs to a known class or remains unclassified. The unclassified binaries are clustered and the process repeats. Malware clustering (i.e., labeling) may still entail further analysis by a human expert, but it dramatically reduces the effort required.
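The alternation described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the clustering algorithm, classifier, or features used inside AltCC, so this sketch assumes a plain k-means for clustering and a nearest-centroid classifier with a rejection radius as the "classifier". The names `altcc`, `kmeans`, `k`, and `reject_radius` are hypothetical, and samples are assumed to be numeric feature tuples.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of feature tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in); returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def altcc(batches, k=2, reject_radius=1.0):
    """Alternate classification and clustering over batches.

    Returns (labels per batch, one centroid per discovered class).
    """
    centroids = []  # the "classifier": one centroid per known class
    all_labels = []
    for batch in batches:
        labels = [None] * len(batch)
        unclassified = []
        # Classification step: accept a sample only if it lies close
        # enough to a known class; otherwise leave it unclassified.
        for idx, p in enumerate(batch):
            if centroids:
                j = min(range(len(centroids)),
                        key=lambda i: dist(p, centroids[i]))
                if dist(p, centroids[j]) <= reject_radius:
                    labels[idx] = j
                    continue
            unclassified.append(idx)
        # Clustering step: group the unclassified samples into new
        # classes and add those classes to the classifier.
        if unclassified:
            pts = [batch[i] for i in unclassified]
            new = kmeans(pts, min(k, len(pts)))
            base = len(centroids)
            centroids.extend(new)
            for i in unclassified:
                j = min(range(len(new)), key=lambda c: dist(batch[i], new[c]))
                labels[i] = base + j
        all_labels.append(labels)
    return all_labels, centroids
```

For example, `altcc([batch_1, batch_2], k=2, reject_radius=2.0)` clusters every sample in `batch_1`, then assigns each sample in `batch_2` either to one of the classes found so far or to a newly discovered class. The rejection radius is the assumption that decides when a binary counts as unclassified; a real classifier would likely use a confidence threshold instead.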
The effectiveness of AltCC is studied using a dataset of 29,661 malware binaries that represent malware received on six consecutive days, with each day treated as a batch. When KMeans is used to label the whole dataset at once and its labeling is compared with AltCC's, the adjusted Rand index is 0.71.
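For readers unfamiliar with the adjusted Rand index, the sketch below computes it from two labelings using the standard pair-counting formula. It is only an illustration of the metric, not the thesis's evaluation code, and it does not reproduce the reported score; the function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: labelings trivially agree
    # Pairs of samples placed together in both labelings.
    together_both = sum(comb(c, 2)
                        for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of samples placed together in each labeling separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every sample in one cluster
    return (together_both - expected) / (maximum - expected)
```

The index is 1.0 when the two labelings define the same grouping, regardless of how the clusters are named, and is close to 0 when the agreement is no better than chance. This makes it suitable for comparing a KMeans labeling with an AltCC labeling, since the two methods number their clusters independently.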