PERFORMANCE OF IMBALANCED DATA CLASSIFICATION WITH A MACHINE LEARNING APPROACH (CASE STUDY: PIMA INDIAN DIABETES)
DOI:
https://doi.org/10.25077/jmua.12.2.176-193.2023
Keywords:
Diabetes, Imbalanced Data, Machine Learning
Abstract
Diabetes is a chronic metabolic disease or disorder with multiple etiologies, characterized by high blood sugar levels accompanied by disturbances in carbohydrate, lipid, and protein metabolism that result from insufficient insulin function. Diabetes risk factors are associated with a person's diabetes status. Various machine learning approaches offer an alternative for predicting diabetes status. In many cases, however, the available data are not sufficiently balanced across classes, and this imbalance can make predictions inaccurate. The aim of this paper is to address the data imbalance problem and to compare the performance of models in predicting diabetes status. In general, methods such as the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) sampling can be used to balance the data. The balanced Pima Indian Diabetes data are then used to predict diabetes status with machine learning methods such as Bagging, Random Forest, and XGBoost. The results show that the performance of the machine learning models improved after the data imbalance was handled, and that the best model was XGBoost.
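As a rough illustration of the oversampling idea named in the abstract, the following Python sketch shows the core SMOTE step: a synthetic minority point is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is a simplified sketch under stated assumptions, not the authors' implementation. The function name `smote`, its parameters, and the toy data are hypothetical, and the base sample is drawn at random rather than by looping over every minority sample as in the original algorithm.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Simplified sketch: the base point is chosen at random on each draw.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature data, made up for illustration (NOT the Pima data set)
majority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.3, 1.0]]
minority = [[3.0, 3.0], [3.2, 2.8], [2.9, 3.3]]

# Add just enough synthetic points to match the majority class size
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

ADASYN follows the same interpolation idea but generates more synthetic points around minority samples that have many majority-class neighbours. In practice the balanced data would then be passed to a classifier such as Bagging, Random Forest, or XGBoost, with oversampling applied to the training split only so that the evaluation data stay untouched.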
License
All articles published in Jurnal Matematika UNAND (JMUA) are open access and licensed under the Creative Commons Attribution-ShareAlike (CC BY-SA) license. This ensures that the content is freely available to all users and can be shared and adapted, provided appropriate credit is given and any adaptations are distributed under the same license.
Copyright Holder
The copyright of all articles published in Jurnal Matematika UNAND is held by the Departemen Matematika dan Sains Data, Fakultas Matematika dan Ilmu Pengetahuan Alam (FMIPA), Universitas Andalas (UNAND). This applies to all published versions, including the HTML and PDF formats of the articles.
Author Rights
While the Departemen Matematika dan Sains Data FMIPA UNAND holds the copyright for all published content, authors retain important rights under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA). This license grants authors and users the following rights:
- Reuse: Authors can reuse and distribute their work for any lawful purpose, including sharing on personal websites, institutional repositories, or in subsequent publications.
- Attribution and Adaptation: Authors and others may remix, adapt, and build upon the published work for any purpose, even commercially, as long as proper credit is given to the original authors, and any derivative works are distributed under the same CC BY-SA license.
Creative Commons License (CC BY-SA)
Under the terms of the CC BY-SA license, users are free to:
- Share: Copy and redistribute the material in any medium or format.
- Adapt: Remix, transform, and build upon the material for any purpose, even commercially.
However, the following conditions apply:
- Attribution: Users must give appropriate credit to the original author(s) and Departemen Matematika dan Sains Data FMIPA UNAND, provide a link to the license, and indicate if changes were made. Attribution must not imply endorsement by the author or the journal.
- ShareAlike: If users remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.
For more information about the CC BY-SA license, please visit the Creative Commons website.
Third-Party Content
If authors include third-party material (such as figures, tables, or images) that is not covered by a Creative Commons license, they must obtain the necessary permissions for reuse and provide proper attribution. Authors are required to ensure that any third-party content complies with open-access licensing requirements or includes permissions for redistribution under similar terms.
Copyright and Licensing Information Display
The copyright and licensing terms will be clearly displayed on each article's landing page, as well as within the full-text versions (HTML and PDF) of all published articles.
No "All Rights Reserved"
As an open-access journal, JMUA does not use "All Rights Reserved" policies. Instead, the CC BY-SA license ensures that the works remain accessible and reusable for a wide audience while still protecting both the authors' and the copyright holder's rights.