IMPLEMENTATION OF ENSEMBLE MODELS FOR PREDICTING CERVICAL CANCER RISK FACTORS USING A COMBINATION OF MACHINE LEARNING ALGORITHMS

  • ANDICHA VEBIYATAMA
  • 14230014

ABSTRACT

This study proposes three voting-based ensemble models for predicting cervical cancer risk factors using the UCI Cervical Cancer (Risk Factors) dataset. The models developed are Voting (XGB+KNN+LR), Voting (ADA+RF+SGD), and Voting (CAT+SVM+NB). The research pipeline comprises a KNN Imputer to handle missing values, PCA for dimensionality reduction, and SMOTE to address class imbalance. Evaluation uses cross-validation and a final test, with accuracy, precision, recall, and F1-score as metrics. The experimental results show that the first two models achieve perfect performance (100%) on all test metrics, while the third model records an accuracy of 99.61%, a precision of 100%, a recall of 93.33%, and an F1-score of 96.55%. In addition, probability calibration yields a Brier Score of 0.012, indicating highly reliable predictions. These findings exceed the results of previous studies (at most 99.99%) and can therefore be claimed as state-of-the-art for cervical cancer prediction on this dataset.
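The voting step and the reported metrics can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation: the abstract does not state whether voting is hard or soft, so hard (majority) voting is assumed here, and the function names, the vote table, and the probabilities are hypothetical. Only the precision (100%) and recall (93.33%) passed to `f1_from` come from the abstract.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote; predictions[i][j] is base classifier i's label for sample j."""
    # With three voters and binary labels a tie cannot occur.
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def f1_from(precision: float, recall: float) -> float:
    """F1-score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical labels from three base classifiers for four samples.
votes = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
ensemble_labels = hard_vote(votes)

# Precision and recall as reported for the third model in the abstract.
f1_third_model = f1_from(1.0, 0.9333)

# Hypothetical labels and probabilities, only to show how a Brier score is computed.
example_brier = brier_score([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.2])
```

Under these assumptions, `f1_from(1.0, 0.9333)` rounds to 0.9655, which agrees with the F1-score of 96.55% reported for the third model. A Brier score closer to 0 means the predicted probabilities lie closer to the observed outcomes.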

KEYWORDS

MACHINE LEARNING ALGORITHMS



Detailed Information

This thesis was written by:

  • Name : ANDICHA VEBIYATAMA
  • Student ID (NIM) : 14230014
  • Study Program : Computer Science
  • Campus : Margonda
  • Year : 2025
  • Period : I
  • Supervisor : Dr. Muhammad Haris, M. Eng
  • Assistant :
  • Code : 0012.S2.IK.TESIS.I.2025
  • Entered by : SGM
  • Last updated : 08 December 2025
