Content-based approach Short Message Service (SMS) spam filtering for Bahasa Indonesia text messages
The massive distribution of SMS spam is a nuisance to subscribers and a potential source of loss for service providers. In Indonesia, SMS spam is also increasing as the cost of sending messages falls. Various solutions to combat SMS spam have been proposed; however, the corpus data are mostly in English. This study proposes a feature selection and classification method for Bahasa Indonesia SMS spam filtering using a content-based approach. The feature selection technique is backward elimination with recursive feature elimination support vector machine (RFE-SVM). The selected features are evaluated with Naïve Bayes and support vector machine (SVM) classifiers, and the binary vector representation achieves the best results among the vector representations tested. Naïve Bayes outperforms SVM, with an accuracy rate reaching 96%. Moreover, this study also shows that the dictionary size can be reduced by more than 50% without significantly reducing the accuracy rate.
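As a rough illustration of the binary vector representation paired with a Naïve Bayes classifier, the following minimal sketch may help. It assumes a Bernoulli-style Naïve Bayes with Laplace smoothing and plain whitespace tokenization, neither of which is confirmed by the abstract. The toy messages, function names, and labels are hypothetical and are not taken from the study's corpus, and the RFE-SVM feature selection step is not shown.

```python
import math

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the study's preprocessing is not stated.
    return text.lower().split()

def build_vocab(messages):
    # The dictionary: every distinct token seen in the training messages.
    return sorted({tok for msg in messages for tok in tokenize(msg)})

def to_binary_vector(message, vocab):
    # Binary representation: 1 if the word occurs in the message, else 0.
    present = set(tokenize(message))
    return [1 if word in present else 0 for word in vocab]

def train_bernoulli_nb(vectors, labels):
    # Estimate log P(class) and P(word present | class) with Laplace smoothing.
    model = {}
    n_features = len(vectors[0])
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        log_prior = math.log(len(rows) / len(vectors))
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[c] = (log_prior, probs)
    return model

def predict(model, vector):
    # Pick the class with the highest log posterior score.
    best_class, best_score = None, float("-inf")
    for c, (log_prior, probs) in model.items():
        score = log_prior
        for x, p in zip(vector, probs):
            score += math.log(p) if x else math.log(1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data, invented for illustration only.
train_messages = [
    "selamat anda menang hadiah pulsa gratis",
    "promo pulsa gratis klik sekarang",
    "anda menang undian hadiah uang",
    "nanti kita makan siang di kantor",
    "rapat besok pagi jam sembilan",
    "tolong jemput saya di kantor nanti",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

vocab = build_vocab(train_messages)
train_vectors = [to_binary_vector(m, vocab) for m in train_messages]
model = train_bernoulli_nb(train_vectors, train_labels)

def classify(message):
    # Words outside the dictionary are simply ignored by the vectorizer.
    return predict(model, to_binary_vector(message, vocab))
```

Calling `classify("hadiah pulsa gratis untuk anda")` vectorizes the message against the training dictionary and returns the more probable label. Shrinking `vocab` to a selected subset of words before training is where a feature selection step such as RFE-SVM would plug in.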