Predicting Online Course Popularity Using LightGBM: A Data Mining Approach on Udemy's Educational Dataset

Minh Luan Doan

doi:10.63913/ail.v1i2.11

Authors

Minh Luan Doan School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore

DOI:

https://doi.org/10.63913/ail.v1i2.11

Keywords:

LightGBM, Online Course Popularity, Machine Learning, Udemy, Predictive Modelling

Abstract

The increasing demand for online education has led to a rapid expansion of platforms such as Udemy, where predicting the popularity of courses can provide valuable insights for course creators and platform managers. This research aims to predict the popularity of online courses on Udemy using LightGBM, a gradient boosting framework that is well suited to classification tasks. The study begins with an overview of the dataset, which includes key course features such as payment type (is_paid), price, number of lectures, course level, content duration, subject, published timestamp, and number of subscribers. Preprocessing involves handling missing values, encoding categorical variables, and extracting temporal features from the publication date to capture trends over time. Exploratory Data Analysis (EDA) is then conducted to uncover patterns and relationships within the dataset, using descriptive statistics and visualizations to examine distributions and correlations between variables. A correlation heatmap is used to identify significant associations between the predictors and the target variable, course popularity (measured by the number of subscribers). The core of the study is a LightGBM model, trained using a train-test split and evaluated with performance metrics such as accuracy, precision, and recall. The results show that the number of lectures, price, and content duration have the greatest influence on course popularity, while features such as course level have limited impact. A comparison with a baseline shows that LightGBM outperforms simple mean-based predictions in predictive accuracy. The findings underscore the importance of course content structure and pricing strategy for increasing enrollment. Finally, the study discusses limitations, such as the lack of course quality metrics, and suggests avenues for future research, including more advanced machine learning techniques and additional data sources for a more comprehensive model.
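
The preprocessing and baseline steps described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the author's code: the rows are invented for illustration, and every field name other than is_paid (for example `num_lectures`, `published_timestamp`, `num_subscribers`), the timestamp format, and the integer encoding of course level are assumptions. The LightGBM model itself is omitted; the sketch only shows how temporal features might be derived from the publication date and how a mean-based baseline could be scored.

```python
from datetime import datetime
from statistics import mean

# Hypothetical rows; values are made up and do not come from the Udemy dataset.
courses = [
    {"is_paid": True, "price": 50, "num_lectures": 30, "level": "Beginner Level",
     "published_timestamp": "2016-03-14T10:00:00Z", "num_subscribers": 1200},
    {"is_paid": False, "price": 0, "num_lectures": 12, "level": "All Levels",
     "published_timestamp": "2015-07-01T08:30:00Z", "num_subscribers": 800},
    {"is_paid": True, "price": 120, "num_lectures": 45, "level": "Intermediate Level",
     "published_timestamp": "2017-01-20T16:45:00Z", "num_subscribers": 400},
    {"is_paid": True, "price": 20, "num_lectures": 25, "level": "All Levels",
     "published_timestamp": "2016-11-05T12:00:00Z", "num_subscribers": 1000},
    {"is_paid": True, "price": 95, "num_lectures": 60, "level": "Beginner Level",
     "published_timestamp": "2017-05-09T09:15:00Z", "num_subscribers": 600},
]

# Assumed encoding: map each distinct course level to an integer code.
level_codes = {name: i for i, name in enumerate(sorted({c["level"] for c in courses}))}


def preprocess(row, level_codes):
    """Turn one raw course record into numeric features."""
    # Assumed timestamp format (ISO 8601 with a trailing Z).
    ts = datetime.strptime(row["published_timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "is_paid": int(row["is_paid"]),
        "price": float(row["price"]),
        "num_lectures": row["num_lectures"],
        "level": level_codes[row["level"]],
        # Temporal features extracted from the publication date.
        "publish_year": ts.year,
        "publish_month": ts.month,
        "publish_weekday": ts.weekday(),  # Monday is 0
    }


features = [preprocess(c, level_codes) for c in courses]

# Simple train-test split by position (illustrative only; a real split would be randomized).
train, test = courses[:3], courses[3:]

# Mean-based baseline: predict the training-set mean subscriber count for every test course.
baseline_prediction = mean(c["num_subscribers"] for c in train)
baseline_mae = mean(abs(c["num_subscribers"] - baseline_prediction) for c in test)
```

A trained model would be fitted on the same feature dictionaries and scored on the same held-out rows, so that its error can be compared directly with `baseline_mae`.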

ISSN 3089-3690 (Online)
Organizer / Collaboration	:	Fakultas Sains dan Teknologi UIN Syarif Hidayatullah Jakarta
Published by	:	Bright Publisher
Website	:	ail.mbicore.com
Mailing Address	:	Graha Permata Estate, Jl. HM Bahrun Blok H9, Sokayasa, Berkoh, Kec. Purwokerto Tim., Kabupaten Banyumas, Jawa Tengah 53146
Email	:	arif@amikompurwokerto.ac.id (principal contact)
		editor@ail.mbicore.com (managing editor)
