Comparing Random Forest and Logistic Regression for Predicting Student Completion in Online University Courses Using Behavioral Data

Authors

  • Muhamad Irfan, Bahauddin Zakariya University
  • Abdul Sattar, Bahauddin Zakariya University
  • Ahmad Sher, Bahauddin Zakariya University
  • Muhamad Ijaz, Bahauddin Zakariya University

DOI:

https://doi.org/10.63913/ail.v1i1.2

Keywords:

predictive analytics, random forest, logistic regression, online education, student retention

Abstract

This paper compares the performance of two machine learning algorithms, Random Forest and Logistic Regression, in predicting student course completion in online university courses using behavioral data. Behavioral data, including interaction logs and submission records, helps identify students at risk of non-completion. The study evaluates both models on real-world data from online courses using standard classification metrics: accuracy, precision, recall, and F1-score. Both models demonstrate exceptionally high predictive accuracy, with Logistic Regression achieving perfect classification and Random Forest following closely. While Logistic Regression is favored for its simplicity and interpretability, Random Forest excels at handling complex, non-linear relationships within the data. The feature importance analysis reveals that student engagement, particularly viewing and passing course materials, is a strong predictor of course completion. These findings have practical implications for online education, supporting early interventions to improve student retention. However, limitations such as the absence of certain behavioral data and the linearity assumption of Logistic Regression suggest areas for future research. Expanding the dataset to include discussion forums and peer interactions, or evaluating additional machine learning models, may provide deeper insights into improving student success in online courses.
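
For readers less familiar with the four metrics named in the abstract, the following is a minimal Python sketch of how they are computed from true and predicted labels. It is an illustration only, not the authors' evaluation code: the function name `classification_metrics` and the toy labels (1 = completed, 0 = did not complete) are hypothetical and do not come from the study's data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as "course completed".
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    tn = len(pairs) - tp - fp - fn                                    # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for eight students (not from the paper's dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On these toy labels there are three true positives, three true negatives, one false positive, and one false negative, so all four metrics come out to 0.75. A model with perfect classification, as reported for Logistic Regression, would score 1.0 on every metric.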

Published

03-03-2025

How to Cite

Irfan, M., Sattar, A., Sher, A., & Ijaz, M. (2025). Comparing Random Forest and Logistic Regression for Predicting Student Completion in Online University Courses Using Behavioral Data. Artificial Intelligence in Learning, 1(1), 1–19. https://doi.org/10.63913/ail.v1i1.2