Performance of Prediction Interval Estimators based on Random Forest Models with Correlated Predictors

Authors

  • Meisyatul Ilma, IPB University
  • Bagus Sartono, IPB University
  • Hari Wijayanto, IPB University

DOI:

https://doi.org/10.25077/jmua.14.4.320-332.2025

Keywords:

Prediction interval, Random Forest, Quantile Regression Forest, Out of Bag Interval Prediction, Split Conformal Prediction, Multicollinearity

Abstract

Prediction uncertainty is a crucial consideration in regression modeling, especially when the explanatory variables are highly correlated. This study evaluates three approaches to constructing prediction intervals for Random Forest models: the Out-of-Bag Prediction Interval (OOB-PI), Quantile Regression Forest (QRF), and Split Conformal Prediction (SC). The evaluation uses a simulation study that varies the data structure, including the level of correlation between variables, the shape of the mean function, and the type of error distribution. The methods are further validated on data from the 2023 National Socio-Economic Survey (SUSENAS) for West Java Province. The results show that higher correlation between explanatory variables can improve the efficiency and accuracy of prediction interval estimation. Overall, OOB-PI showed the most balanced performance of the three methods, with a coverage rate close to 90% and narrower intervals than QRF and SC. This finding indicates that OOB-PI is an adaptive and efficient approach across various data structures, including socioeconomic data with highly correlated predictors.
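
To make the comparison concrete, the following is a minimal Python sketch of how two of the three interval types are commonly formed once a forest has produced point predictions and residuals, together with the two evaluation measures named in the abstract (coverage rate and interval width). It is an illustration under stated assumptions, not the authors' implementation: the function names and the residual values are hypothetical, the forest fitting step is omitted, and QRF is left out because it requires the per-leaf response distributions of a fitted forest.

```python
import math


def empirical_quantile(values, p):
    """Smallest sorted value whose empirical CDF is at least p."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def oob_interval(point_pred, oob_errors, alpha=0.10):
    """OOB-style interval: shift the point prediction by the lower and
    upper empirical quantiles of the signed out-of-bag errors
    (observed value minus out-of-bag prediction)."""
    lower = point_pred + empirical_quantile(oob_errors, alpha / 2)
    upper = point_pred + empirical_quantile(oob_errors, 1 - alpha / 2)
    return lower, upper


def split_conformal_interval(point_pred, calib_abs_errors, alpha=0.10):
    """Split conformal interval: symmetric band whose half-width is the
    ceil((n + 1)(1 - alpha))-th smallest absolute residual on a held-out
    calibration set of size n."""
    n = len(calib_abs_errors)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return -math.inf, math.inf
    half_width = sorted(calib_abs_errors)[rank - 1]
    return point_pred - half_width, point_pred + half_width


def coverage_and_width(intervals, y_true):
    """Share of observations inside their interval, and mean interval width."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, y_true))
    widths = [hi - lo for lo, hi in intervals]
    return hits / len(y_true), sum(widths) / len(widths)


# Hypothetical residuals, invented purely for illustration.
signed_errors = [-3.0, -2.0, -1.0, -0.5, 0.0, 0.2, 0.5, 1.0, 2.0, 3.0]
abs_errors = [abs(e) for e in signed_errors]

print(oob_interval(10.0, signed_errors, alpha=0.10))
print(split_conformal_interval(10.0, abs_errors, alpha=0.10))
```

In this sketch the OOB-style interval can be asymmetric around the point prediction because it uses signed errors, whereas the split conformal band is symmetric by construction and needs a separate calibration set.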

Author Biography

Meisyatul Ilma, IPB University

Undergraduate student at Universitas Andalas, Department of Mathematics and Data Science, class of 2019, and master's student at IPB University, Statistics and Data Science program, class of 2023.

Published

31-10-2025

Section

Articles