OPTIMIZING CLICK-THROUGH RATE PREDICTION ON IMBALANCED DATA USING CTGAN AUGMENTATION AND A GRADIENT BOOSTING ALGORITHM

  • Ichtiar Akbar Sakti
  • 14220032

ABSTRACT

Class imbalance is a major challenge in Click-Through Rate (CTR) prediction, where users who click an ad are far fewer than users who do not. This imbalance biases machine learning models toward the majority class, which results in low Recall when detecting clicks. This study proposes using the Conditional Tabular Generative Adversarial Network (CTGAN) as a data augmentation method to balance the class distribution of the dataset for an XGBoost-based predictive model. The dataset contains 10,000 user behavior records, of which the minority class makes up 14.8%. The research process consists of data preprocessing, temporal feature extraction, CTGAN training for 500 epochs, and an evaluation that compares a model trained on the original data (baseline) with a model trained on the augmented data. The experimental results show that applying CTGAN substantially improves model performance. Recall rises from 0.43 for the baseline model to 0.88 after augmentation, and F1-Score rises from 0.52 to 0.88. In addition, AUC reaches 0.94, which indicates very good discriminative ability. The study shows that CTGAN is effective at capturing the distribution of complex tabular data and at increasing the sensitivity of the XGBoost model in detecting potential ad clicks without causing overfitting.

Keywords: Click Through Rate, Class Imbalance, CTGAN, XGBoost, Data Augmentation
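The balancing step summarized in the abstract can be illustrated with a rough sketch. The thesis record does not include code, so everything here is an assumption: the function names `synthetic_rows_needed` and `augment_minority` are hypothetical, the sketch assumes CTGAN is fitted only on the minority-class rows of the training split, and it relies on the `CTGAN` class (`fit`, `sample`) from the open-source `ctgan` package together with pandas. The arithmetic uses the full 10,000 records for illustration only; the actual number of synthetic rows depends on how the data was split.

```python
def synthetic_rows_needed(n_total, minority_share):
    """Return (minority, majority, synthetic rows needed for a 50/50 balance)."""
    n_minority = round(n_total * minority_share)
    n_majority = n_total - n_minority
    return n_minority, n_majority, max(n_majority - n_minority, 0)


def augment_minority(train_df, label_col, discrete_columns, epochs=500):
    """Hypothetical helper: oversample the minority class with CTGAN.

    Assumes `train_df` is a pandas DataFrame holding the training split only,
    so that no synthetic rows leak into the evaluation data.
    """
    # Imported lazily so the arithmetic helper above works without these packages.
    import pandas as pd
    from ctgan import CTGAN

    counts = train_df[label_col].value_counts()
    minority_label = counts.idxmin()
    n_needed = int(counts.max() - counts.min())
    if n_needed == 0:
        return train_df.copy()

    # Fit the generator on minority rows only; the label is constant there,
    # so it is dropped before fitting and restored after sampling.
    minority = train_df[train_df[label_col] == minority_label]
    features = minority.drop(columns=[label_col])
    feature_discrete = [c for c in discrete_columns if c != label_col]

    generator = CTGAN(epochs=epochs)  # 500 epochs, as stated in the abstract
    generator.fit(features, discrete_columns=feature_discrete)

    synthetic = generator.sample(n_needed)
    synthetic[label_col] = minority_label
    return pd.concat([train_df, synthetic], ignore_index=True)


if __name__ == "__main__":
    minority, majority, needed = synthetic_rows_needed(10_000, 0.148)
    print(minority, majority, needed)
```

With the figures from the abstract, 14.8% of 10,000 records gives 1,480 click rows against 8,520 non-click rows, so about 7,040 synthetic click rows would be needed for a balanced set if the whole dataset were augmented. The augmented frame returned by `augment_minority` would then be used to train the XGBoost classifier, while the untouched test split is used to compute Recall, F1-Score, and AUC.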

KEYWORDS

Prediction Optimization, CTGAN Augmentation, Gradient Boosting Algorithm


Detailed Information

This thesis was written by:

  • Name : Ichtiar Akbar Sakti
  • Student ID (NIM) : 14220032
  • Study Program : Computer Science
  • Campus : Margonda
  • Year : 2025
  • Period : II
  • Advisor : Prof. Dr. Agus Subekti, MT
  • Assistant :
  • Code : 0022.S2.IK.TESIS.II.2025
  • Entered by : RKY
  • Last updated : 28 April 2026
