Naive Bayes Classifier Optimization on Sentiment Analysis of Hotel Reviews

Siti Khomsah

Abstract


Feature extraction plays an important role in sentiment analysis, especially for text data. The Naive Bayes Classifier (NBC) performs well on low-dimensional feature sets, but its accuracy is not optimal. To obtain an optimal machine learning model, information gain, evolutionary algorithms, and swarm intelligence algorithms are applied. The objective of this study is to determine how well Particle Swarm Optimization (PSO) optimizes the Naive Bayes Classifier. Words are vectorized using TF-IDF. To achieve high PSO performance, the PSO-NBC model is tested with several parameters, namely the number of particles (k = 3), the settings for the number of iterations and the inertia weight, the individual intelligence coefficient (c1 = 1), and the social intelligence coefficient (c2 = 2). The inertia weight is calculated as w = 0.5 + Rand([-1, 1]). In conclusion, PSO is able to search the problem space of text-based sentiment analysis. PSO is able to optimize the accuracy of Naive Bayes, with values of 89% to 91.76%. PSO performance is determined by the parameters used, especially the number of particles, the number of iterations, and the inertia weight. A large number of particles combined with a higher inertia weight can increase accuracy. With 20 to 30 particles, the model already reaches its optimal accuracy.
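The parameter settings named in the abstract can be illustrated with a minimal PSO sketch. It is not the study's implementation: the function names, the toy fitness function (a stand-in for classifier accuracy), the clamping of positions to [0, 1], and the reading of Rand([-1, 1]) as a uniform draw from that interval are all assumptions made for illustration. Only c1 = 1, c2 = 2, and the form of the inertia weight come from the abstract.

```python
import random

C1 = 1.0  # individual intelligence coefficient (from the abstract)
C2 = 2.0  # social intelligence coefficient (from the abstract)


def inertia_weight(rng):
    """w = 0.5 + Rand([-1, 1]); the uniform draw is an assumed reading."""
    return 0.5 + rng.uniform(-1.0, 1.0)


def toy_fitness(position):
    """Hypothetical stand-in for model accuracy: best when every value is 0.7."""
    return -sum((x - 0.7) ** 2 for x in position)


def pso_maximize(fitness, dim, n_particles=20, n_iterations=50, seed=0):
    """Generic PSO loop that maximizes `fitness` over [0, 1]^dim."""
    rng = random.Random(seed)
    positions = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]

    # Global best is the best personal best seen so far.
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iterations):
        for i in range(n_particles):
            w = inertia_weight(rng)  # redrawn for every particle update
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    w * velocities[i][d]
                    + C1 * r1 * (pbest[i][d] - positions[i][d])
                    + C2 * r2 * (gbest[d] - positions[i][d])
                )
                # Keep each coordinate inside [0, 1] (an assumed bound).
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(1.0, max(0.0, moved))

            fit = fitness(positions[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = positions[i][:], fit

    return gbest, gbest_fit


if __name__ == "__main__":
    best, best_fit = pso_maximize(toy_fitness, dim=5, n_particles=20)
    print(best, best_fit)
```

In a setting like the one described, the toy fitness would presumably be replaced by the accuracy of a Naive Bayes model trained on the TF-IDF features that a particle encodes. How the study maps particle positions to features is not stated in the abstract.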


Keywords


Sentiment Analysis; Optimization; Feature Selection; Naive Bayes Classifier; Particle Swarm Optimization

DOI: http://dx.doi.org/10.17933/jppi.2020.100206

Copyright (c) 2020 Jurnal Penelitian Pos dan Informatika

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Jurnal Penelitian Pos & Informatika

ISSN 2088-9402 (print)| 2476-9266 (online)
Badan Litbang SDM Kemenkominfo
Puslitbang Sumber Daya, Perangkat, dan Penyelenggaraan Pos dan Informatika
Medan Merdeka Barat No. 9, Building B Floor 4, Ministry of Communication and Information Technology. Phone: +62 21 34833640