Analysis of machine learning techniques used in behavior-based malware detection
The number of malware samples exploiting the Internet grows daily and has become a serious threat. Manual heuristic inspection of malware can no longer keep pace with the rate at which malware spreads. Conventional signature-matching antivirus systems fail to detect polymorphic, obfuscated, and previously unseen malicious executables. Automated behavior-based malware detection using machine learning techniques is therefore considered a promising solution. Each malware sample is run in an emulated (sandbox) environment, where its behavior is analyzed automatically and recorded in a behavior report. These reports are then preprocessed into sparse vector models for machine learning (classification). The classifiers used in this research are k-Nearest Neighbors (kNN), Naïve Bayes, Decision Tree (J48), Support Vector Machine (SVM), and Artificial Neural Network (ANN). According to the test and experiment results for all five classifiers, the best overall performance is achieved by the J48 decision tree, with a recall (true positive rate) of 95.9%, a false positive rate of 2.4%, a precision (positive predictive value) of 97.3%, and an accuracy of 96.8%. In summary, this proof of concept shows that automatic behavior-based malware analysis combined with machine learning techniques can detect malware quite effectively and efficiently.
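The four figures reported above follow the standard confusion-matrix definitions. The sketch below shows how they are computed for a binary malware-versus-benign test set. The class name `ConfusionCounts`, the function `detection_metrics`, and the example counts are hypothetical and chosen only for illustration; they are not taken from the study's data or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary test set: malware is positive, benign is negative."""
    tp: int  # malware correctly flagged as malware
    fp: int  # benign samples wrongly flagged as malware
    tn: int  # benign samples correctly passed as benign
    fn: int  # malware wrongly passed as benign


def detection_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return recall, false positive rate, precision, and accuracy."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "recall": c.tp / (c.tp + c.fn),               # true positive rate
        "false_positive_rate": c.fp / (c.fp + c.tn),
        "precision": c.tp / (c.tp + c.fp),            # positive predictive value
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts, NOT the study's results; they only exercise the formulas.
example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
metrics = detection_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because recall and false positive rate are each normalized within one class (malware and benign, respectively), they stay comparable across test sets with different class balances, whereas precision and accuracy depend on that balance.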