CLASSIFICATION PERFORMANCE ON IMBALANCED DATA WITH A MACHINE LEARNING APPROACH (CASE STUDY: PIMA INDIANS DIABETES)

Masjidil Aqsha; Nurtiti Sunusi

doi:10.25077/jmua.12.2.176-193.2023

Authors

Masjidil Aqsha Hasanuddin University
Nurtiti Sunusi Hasanuddin University https://orcid.org/0000-0002-6436-831X

DOI:

https://doi.org/10.25077/jmua.12.2.176-193.2023

Keywords:

Diabetes, Imbalanced Data, Machine Learning

Abstract

Diabetes is a chronic metabolic disease or disorder with multiple etiologies, characterized by high blood sugar levels accompanied by disturbances in carbohydrate, lipid, and protein metabolism as a result of insufficient insulin function. Diabetes risk factors are associated with a person's diabetes status. Various machine learning approaches offer an alternative for predicting diabetes status. In many cases, however, the available data are not sufficiently balanced across classes, and this imbalance can make predictions inaccurate. The aim of this paper is to address the data imbalance problem and to compare the performance of models in predicting diabetes status. In general, methods such as the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) sampling can be used to balance the data. The balanced Pima Indians Diabetes data are then used for prediction with machine learning methods such as Bagging, Random Forest, and XGBoost. The results show that the performance of the machine learning models improves after the data imbalance is handled, and that the best model is XGBoost.
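To make the balancing step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, using only the Python standard library. It is not the authors' implementation: the function name `smote`, its parameters `n_new`, `k`, and `seed`, and the toy two-feature data are all assumptions made for illustration, and the data are not the Pima Indians Diabetes data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy two-feature data, invented for illustration (hypothetical values).
majority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.3, 1.0)]
minority = [(3.0, 3.0), (3.2, 2.8), (2.9, 3.3)]

# Add just enough synthetic minority points to match the majority count.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class grows without duplicating records. ADASYN follows the same interpolation step but, roughly, generates more points around minority samples that have many majority-class neighbours. In practice, oversampling is usually applied to the training split only, so that evaluation data stay untouched.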

Author Biographies

Masjidil Aqsha, Hasanuddin University

Department of Statistics

Nurtiti Sunusi, Hasanuddin University

Department of Statistics

Published

03-02-2024

Issue

Vol. 12 No. 2 (2023)

Section

Articles

License

All articles published in Jurnal Matematika UNAND (JMUA) are open access and licensed under the Creative Commons Attribution-ShareAlike (CC BY-SA) license. This ensures that the content is freely available to all users and can be shared and adapted, provided appropriate credit is given and any adaptations are distributed under the same license.

Copyright Holder

The copyright of all articles published in Jurnal Matematika UNAND is held by the Departemen Matematika dan Sains Data, Fakultas Matematika dan Ilmu Pengetahuan Alam (FMIPA), Universitas Andalas (UNAND). This applies to all published versions, including the HTML and PDF formats of the articles.

Author Rights

While the Departemen Matematika dan Sains Data FMIPA UNAND holds the copyright for all published content, authors retain important rights under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA). This license grants authors and users the following rights:

Reuse: Authors can reuse and distribute their work for any lawful purpose, including sharing on personal websites, institutional repositories, or in subsequent publications.
Attribution and Adaptation: Authors and others may remix, adapt, and build upon the published work for any purpose, even commercially, as long as proper credit is given to the original authors, and any derivative works are distributed under the same CC BY-SA license.

Creative Commons License (CC BY-SA)

Under the terms of the CC BY-SA license, users are free to:

Share: Copy and redistribute the material in any medium or format.
Adapt: Remix, transform, and build upon the material for any purpose, even commercially.

However, the following conditions apply:

Attribution: Users must give appropriate credit to the original author(s) and Departemen Matematika dan Sains Data FMIPA UNAND, provide a link to the license, and indicate if changes were made. Attribution must not imply endorsement by the author or the journal.
ShareAlike: If users remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.

For more information about the CC BY-SA license, please visit the Creative Commons website.

Third-Party Content

If authors include third-party material (such as figures, tables, or images) that is not covered by a Creative Commons license, they must obtain the necessary permissions for reuse and provide proper attribution. Authors are required to ensure that any third-party content complies with open-access licensing requirements or includes permissions for redistribution under similar terms.

Copyright and Licensing Information Display

The copyright and licensing terms will be clearly displayed on each article's landing page, as well as within the full-text versions (HTML and PDF) of all published articles.

As an open-access journal, JMUA does not use "All Rights Reserved" policies. Instead, the CC BY-SA license ensures that the works remain accessible and reusable for a wide audience while still protecting both the authors' and the copyright holder's rights.
