• Diki Arisandi, Program Studi Teknik Informatika, Fakultas Teknik, Universitas Abdurrab
• Zul Indra, Program Studi Teknik Informatika, Fakultas Teknik, Universitas Abdurrab
• Kartini Kartini, Program Studi Teknik Informatika, Fakultas Teknik, Universitas Abdurrab


Online news is a journalistic product that reports facts or events and is produced and distributed via the internet. However, not all information circulated through online media is factual; false reports are commonly described as hoaxes. The large volume of hoax news affects the people who read it and can lead to misperceptions or inappropriate actions. We use a web scraping technique to extract content from search engine results, and we employ the C4.5 algorithm for classification. Three parameters serve as reference attributes: an invitation to spread the news, the credibility of the source, and a provocative title. The result of this work is a decision tree that can classify news content as hoax or legitimate. In our experiments, classification using web scraping and the C4.5 algorithm achieved an accuracy of 80% in identifying hoaxes.

Keywords: online news, hoax, web scraping, C4.5 algorithm, decision tree
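
As an illustration of the classification step named in the abstract, the following minimal sketch computes the gain ratio, the criterion C4.5 uses to choose the attribute to split on. The abstract does not give the authors' dataset or implementation, so the eight rows, the attribute names (invites_sharing, credible_source, provocative_title), and the function names are hypothetical, chosen only to mirror the three parameters listed above.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target="label"):
    """Gain ratio of `attribute` with respect to `target`, as used by C4.5."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data, NOT taken from the paper.
# Columns: invites_sharing, credible_source, provocative_title, label.
raw = [
    ("yes", "no", "yes", "hoax"),
    ("yes", "no", "no", "hoax"),
    ("no", "no", "yes", "hoax"),
    ("yes", "no", "yes", "hoax"),
    ("no", "yes", "no", "legitimate"),
    ("no", "yes", "yes", "legitimate"),
    ("no", "yes", "no", "legitimate"),
    ("yes", "no", "no", "legitimate"),
]
attributes = ["invites_sharing", "credible_source", "provocative_title"]
rows = [dict(zip(attributes + ["label"], r)) for r in raw]

# Score every attribute; C4.5 would split on the highest-scoring one first.
ratios = {a: gain_ratio(rows, a) for a in attributes}
best = max(ratios, key=ratios.get)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
print("first split:", best)
```

A full C4.5 implementation would apply this selection recursively to each branch and then prune the tree; the sketch covers only the choice of a single split.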


How to Cite
D. Arisandi, Z. Indra, and K. Kartini, “MENGIDENTIFIKASI HOAX PADA HASIL PENCARIAN BERITA ONLINE DENGAN TEKNIK WEB SCRAPING DAN ALGORITMA C4.5”, rabit, vol. 6, no. 2, pp. 130-137, Jul. 2021.