IDENTIFYING HOAXES IN ONLINE NEWS SEARCH RESULTS USING WEB SCRAPING AND THE C4.5 ALGORITHM
Abstract
Online news is a journalistic product that reports facts or events and is produced and distributed via the internet. However, not all information circulated through online media is factual; false information of this kind is commonly called a hoax. The large volume of hoax news affects the people who read it and can lead to misperceptions or inappropriate actions. We use a web scraping technique to extract content from search engine results and then apply the C4.5 algorithm to classify it. Three parameters serve as the classification attributes: an invitation to spread the news, the credibility of the source, and a provocative title. The result of this work is a decision tree that can classify news content as hoax or legitimate. In the experiments carried out, classification using web scraping and the C4.5 algorithm achieved an accuracy of 80% in identifying hoaxes.
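As a rough illustration of how C4.5 would choose which of the three parameters to split on first, the sketch below computes the gain ratio (information gain divided by split information) for each attribute. The records, the attribute names `invites_sharing`, `credible_source`, and `provocative_title`, and the boolean encoding are all hypothetical and invented for this example; they are not the data or the encoding used in the paper. The sketch covers only split selection, not tree construction or pruning.

```python
import math
from collections import Counter

# Hypothetical toy records, invented for illustration only (not the paper's data).
# Each record: (invites_sharing, credible_source, provocative_title, label)
RECORDS = [
    (True,  False, True,  "hoax"),
    (True,  False, False, "hoax"),
    (False, False, True,  "hoax"),
    (True,  True,  True,  "hoax"),
    (False, True,  False, "legitimate"),
    (False, True,  True,  "legitimate"),
    (False, True,  False, "legitimate"),
    (True,  True,  False, "legitimate"),
]
ATTRIBUTES = ["invites_sharing", "credible_source", "provocative_title"]


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(records, index):
    """Gain ratio of the attribute at position `index`, the C4.5 split criterion."""
    labels = [r[-1] for r in records]
    # Group the class labels by the value the attribute takes.
    partitions = {}
    for r in records:
        partitions.setdefault(r[index], []).append(r[-1])
    total = len(records)
    # Weighted entropy of the labels after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([r[index] for r in records])
    return info_gain / split_info if split_info > 0 else 0.0


ratios = {name: gain_ratio(RECORDS, i) for i, name in enumerate(ATTRIBUTES)}
best_attribute = max(ratios, key=ratios.get)

for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
print("split on:", best_attribute)
```

The attribute with the highest gain ratio becomes the root of the tree, and the same selection is repeated on each resulting subset until the subsets are pure or no attributes remain.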
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright Notice
The copyright of the received article shall be assigned to the publisher of the journal. The intended copyright includes the right to publish the article in various forms (including reprints). The journal maintains the publishing rights to published articles. Therefore, the author must submit a statement of the Copyright Transfer Agreement.
In line with the license, authors and any users (readers and other researchers) are allowed to share and adapt the material only for non-commercial purposes. In addition, the material must be given appropriate credit, provided with a link to the license, and indicated if changes were made. If authors remix, transform or build upon the material, authors must distribute their contributions under the same license as the original.
Please find the rights and licenses in RABIT : Jurnal Teknologi dan Sistem Informasi Univrab. By submitting the article/manuscript of the article, the author(s) accept this policy.
1. License
Non-commercial use of the article is governed by the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License as currently displayed.
2. Author’s Warranties
The author warrants that the article is original, written by stated author(s), has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author and free of any third party rights, and that any necessary written permissions to quote from other sources have been obtained by the author(s).
3. User Rights
RABIT's spirit is to disseminate published articles as freely as possible. Under the Creative Commons license, RABIT permits users to copy, distribute, display, and perform the work for non-commercial purposes only. Users must also attribute the authors and RABIT when distributing works from the journal.
4. Rights of Authors
Authors retain all their rights to the published works, such as (but not limited to) the following rights:
- Copyright and other proprietary rights relating to the article, such as patent rights,
- The right to use the substance of the article in the author's own future works, including lectures and books,
- The right to reproduce the article for the author's own purposes,
- The right to self-archive the article,
- The right to enter into separate, additional contractual arrangements for the non-exclusive distribution of the article's published version (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal (RABIT : Jurnal Teknologi dan Sistem Informasi Univrab).
5. Co-Authorship
If the article was jointly prepared with other authors, the author submitting the manuscript warrants that he/she has been authorized by all co-authors to agree to this copyright and license notice (agreement) on their behalf, and agrees to inform his/her co-authors of the terms of this policy. RABIT will not be held liable for anything that may arise from internal disputes among the authors. RABIT will only communicate with the corresponding author.
6. Royalties
This agreement entitles the author to no royalties or other fees. To such extent as legally permissible, the author waives his or her right to collect royalties relative to the article in respect of any use of the article by RABIT.
7. Miscellaneous
RABIT will publish the article (or have it published) in the journal if the article’s editorial process is successfully completed. RABIT's editors may modify the article to a style of punctuation, spelling, capitalization, referencing, and usage that they deem appropriate. The author acknowledges that the article may be published so that it will be publicly accessible, and that such access will be free of charge for readers, as mentioned in point 3.