AdPisika: an adaptive e-learning system utilizing k-means clustering, decision tree, and bayesian network based on felder-silverman model to enhance physics academic performance

ABSTRACT


Introduction
Students' academic performance has become a growing concern in education since the transition from traditional to online teaching and learning, which became necessary due to the COVID-19 outbreak. In the Philippines, the factors affecting the quality of the educational experience are accessibility, cost, maintenance, resource limitations in the adoption of online learning, and the learning environment at home [1], [2]. These challenges create an opportunity for advancements in education and the introduction of modern technology, particularly artificial intelligence-based Adaptive Learning Systems (ALS) [3], [4].
Most ALS, such as HELPNAYAN [5] and Aidgebra [6], consider either the student's learning style or knowledge level to tailor the learning process and improve academic performance, with the intention of supporting students' learning in the Philippines. Based on their respective research objectives, both ALS were effective for students and teachers. However, HELPNAYAN considered only the student's learning style, while Aidgebra considered only the student's knowledge level; neither ALS considered both parameters. An ALS that identifies both learning style and knowledge level has the potential to enhance learning and evaluate academic performance for students who experience a variety of difficulties with learning through digital technology. It can also assist educators in assessing student performance [7].
Mathematics and Science are subjects Filipino students find hard to learn. According to the Trends in International Mathematics and Science Study 2019 (TIMSS) [8], the Philippines ranked last among the 58 participating countries in the Grade 4 Mathematics and Science assessments. In the student assessment conducted by the Program for International Student Assessment (PISA) for 15-year-old learners, the country ranked second to last among 79 participating countries, with 353 and 357 points in Mathematics and Science, respectively [9]. This suggests that Filipino students are among the lowest in Mathematics and Science proficiency among the participating countries. Furthermore, according to the study by Guido [10], Physics is among the science subjects that students consider the most challenging from secondary school through college.
According to Jamon et al. [11], the new approach to learning offers opportunities for teachers to use technology to support student learning. This includes the introduction of numerous platforms and software that learners can use in place of printed modules. The use of e-learning to improve the physics educational process has been strongly recommended in a study by Shurygin et al. [12]. Furthermore, online instructional modules are considered effective tools for enhancing students' understanding of these subjects, with a reported improvement of 13% in post-test scores in the fundamentals of physics and mathematics [13].
The objective of this study is to create an adaptive web-based system that incorporates classification and recommendation tasks [14]. In accordance with the United Nations' Sustainable Development Goal 4 (SDG 4) [15], the proposed system aims to provide personalized learning materials, evaluate students' physics academic performance, and contribute to improving the country's scientific literacy. The system utilizes machine learning [16] algorithms: K-means clustering to cluster students' knowledge levels [17], and the Decision Tree [18] and Bayesian Network [19], [20], based on the Felder-Silverman Learning Style Model (FSLSM) [21], for the classification and recommendation of learning objects. Through these efforts, the system aims to adapt to individual students' needs and ensure quality education for all learners.
The remainder of the paper is organized as follows. The proposed methodology for developing an adaptive learning system is described in Section 2, the results are shown and discussed in Section 3, and the study is concluded in Section 4.

Research Design
The proposed adaptive e-learning system for Grade 10 students was evaluated using a quantitative approach to determine its effectiveness and efficiency. The research design is shown in Fig. 1. An experimental research design was employed, consisting of two groups: experimental and control. The experimental group used the proposed adaptive learning system, whilst the control group followed the traditional method of learning.

Software Tools
The front end of the proposed ALS was built with HTML, CSS, React.js (an open-source JavaScript library), and Bootstrap (a CSS framework). Node.js, a JavaScript runtime, was used for the back end of the system. For training the K-means clustering [22] and Decision Tree (DT) models, the researchers used Python.

Research Method
The experimental group used the proposed adaptive learning system for two (2) weeks, during which the students created accounts, participated in assessment sessions, studied the teaching material, and responded to pre- and post-tests. The system data flow chart illustrates the categorization of processes and actions needed to complete the system. Fig. 2, a modified version of the ALS proposed by Alcantara et al. [3], illustrates the sequence of actions that a learner may undertake.

Knowledge Level Identification
Students using the proposed system must take the pre-test upon accessing the chosen module. Their current knowledge level is then determined using K-means clustering, which groups the data into three (3) clusters. The decision tree represents these clusters as the beginner, intermediate, and advanced knowledge levels. Additionally, the pre-test questions in each module are assigned weights according to their level of difficulty.
The clustering process is applied to a dataset of pre-test scores from 25 students in the experimental group, resulting in three distinct clusters. Cluster 1 represents students who performed poorly or excelled in only one area. Cluster 2 includes students who performed well in two areas but struggled with difficult questions. Cluster 3 consists of students who performed well across all question types. Next, the Decision Tree algorithm classifies the clustered groups based on students' knowledge levels. The Beginner class (Cluster 1) requires additional support and a review of basic concepts, focusing on simple topic discussions and engaging activities. The Intermediate class (Cluster 2) benefits from more challenging assignments and opportunities to apply knowledge to advanced problems, with a focus on complex topic discussions. The Advanced class (Cluster 3) involves more advanced coursework, featuring brief concept discussions and prioritizing advanced practice applications.
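The clustering and labeling steps described above can be illustrated with a minimal sketch. It assumes a single total pre-test score per student (the study appears to cluster on performance across question types), uses a plain one-dimensional K-means written with the standard library rather than the authors' trained model, and names the clusters by the order of their centroids instead of using a trained Decision Tree. The function names and the sample scores are hypothetical.

```python
from statistics import mean

LEVELS = ["Beginner", "Intermediate", "Advanced"]


def kmeans_1d(scores, k=3, max_iter=100):
    """Lloyd's algorithm on 1-D scores; returns (centroids, labels)."""
    ordered = sorted(scores)
    # Deterministic start: spread the initial centroids across the sorted scores.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each score joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(s - centroids[c])) for s in scores]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def knowledge_levels(scores):
    """Cluster scores into three groups, named from lowest to highest centroid."""
    centroids, labels = kmeans_1d(scores, k=3)
    order = sorted(range(3), key=lambda c: centroids[c])
    rank = {cluster: position for position, cluster in enumerate(order)}
    return [LEVELS[rank[lab]] for lab in labels]


# Hypothetical pre-test scores out of 35 points (not data from the study).
sample_scores = [5, 7, 8, 6, 17, 18, 20, 19, 30, 32, 33, 31]
levels = knowledge_levels(sample_scores)
```

Ranking clusters by centroid is one simple way to attach the Beginner, Intermediate, and Advanced names; the paper delegates this mapping to the Decision Tree.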
The proposed adaptive learning system incorporates fixed and reserved questions (see Fig. 3). Fixed questions are the questions that the system first displays in the pre-test assessments. Reserved questions are questions that the system has stored in its question bank. For instance, if a student correctly answers question 1 on the pre-test, the reserved question will replace that item on the post-test; however, if the student selects an incorrect answer, the fixed question will be retained in the post-test.
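The fixed and reserved question rule can be sketched as follows. This is an illustration only: the function name, the parallel-list layout, and the item identifiers are assumptions, not the system's actual data model.

```python
def build_post_test(fixed, reserved, pretest_correct):
    """Pick the post-test item for each position of the pre-test.

    fixed[i] is the item shown in the pre-test, reserved[i] is its
    question-bank alternative, and pretest_correct[i] is True when the
    student answered fixed[i] correctly.
    """
    if not (len(fixed) == len(reserved) == len(pretest_correct)):
        raise ValueError("all three lists must have the same length")
    # Correct answer: swap in the reserved item. Incorrect: retain the fixed item.
    return [
        res if correct else fix
        for fix, res, correct in zip(fixed, reserved, pretest_correct)
    ]


# Hypothetical item identifiers.
fixed_items = ["F1", "F2", "F3"]
reserved_items = ["R1", "R2", "R3"]
post_test = build_post_test(fixed_items, reserved_items, [True, False, True])
```

Here the student answered items 1 and 3 correctly, so those positions receive reserved questions while item 2 is repeated.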

Learning Objects based on Knowledge Level and Learning Styles
The Decision Tree will be used to provide the learning materials that best match each student according to their cluster. Students are given learning objects that correspond to their level of knowledge and understanding of the subject, which helps ensure that they receive the most suitable learning resources. In addition, depending on the learner's ILSQ results, the proposed system, in particular the Bayesian Network, will assign and provide learning resources that correspond to the student's top-ranked learning styles based on the FSLSM. The selection and designation of learning styles and their associated learning objects are represented by a Directed Acyclic Graph (DAG), as depicted in Fig. 3.

After finishing a module, the learners are required to take a post-test examination. The passing grade for the test is 75%, as per DepEd Order No. 8, s. 2015. Students who receive a grade below this threshold will be labeled as "Failed" by the system, and they will not be allowed to proceed to the next module until they pass the exam. If the student passes, the system will let them continue using the primary learning approach while working through the remaining modules.
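The post-test gate described above can be sketched as a small function. It assumes the grade is the raw percentage of points earned, which the text does not specify, and the function and constant names are hypothetical.

```python
PASSING_GRADE = 75.0  # percent, the threshold stated in the text


def post_test_outcome(score, total):
    """Return (percentage, status, may_proceed) for a post-test result."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Assumption: the grade is the raw percentage of points earned.
    percentage = 100.0 * score / total
    passed = percentage >= PASSING_GRADE
    return percentage, ("Passed" if passed else "Failed"), passed


# Hypothetical results on a 35-point test.
failing = post_test_outcome(21, 35)
passing = post_test_outcome(30, 35)
```

A student labeled "Failed" would stay on the current module, while a passing student would move on with the primary learning approach.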

Results and Discussion
This section presents quantitative data on the proposed system, such as student learning styles, clusters and knowledge levels, and pre- and post-test results. Based on the graphs, a total of twenty-five (25) students registered for the AdPisika system. In terms of top learning style, 4 (16%) students are active learners, 2 (8%) are reflective learners, 11 (44%) are intuitive learners, 3 (12%) are verbal learners, 1 (4%) is a sequential learner, and 4 (16%) are global learners; none of the students are categorized as visual or sensing learners.

Clustering Pre-test Scores using K-Means Clustering
The pre-test has twenty (20) items with a total of 35 points and is designed to assess the student's current knowledge of the module. The Within-Cluster Sum of Squares (WCSS), shown in Fig. 6, is the sum of the squared distances between each cluster member and its centroid. As depicted in the graph, the "elbow point" occurs at 3, where the rate of decrease in WCSS slows markedly. The corresponding K value at this point is considered the optimal value of K, that is, the optimal number of clusters.
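The WCSS definition above translates directly into code. The sketch below assumes one-dimensional scores and, for brevity, partitions hypothetical data by hand for each K instead of re-running K-means. It is meant only to show why the curve flattens after the elbow.

```python
def wcss(clusters):
    """Within-Cluster Sum of Squares for 1-D clusters given as lists of scores."""
    total = 0.0
    for members in clusters:
        if not members:
            continue
        centroid = sum(members) / len(members)
        total += sum((x - centroid) ** 2 for x in members)
    return total


# Hypothetical pre-test scores that form three natural groups.
low, mid, high = [5, 6, 7, 8], [17, 18, 19, 20], [30, 31, 32, 33]

# Partitions chosen by hand for illustration, one per candidate K.
wcss_by_k = {
    1: wcss([low + mid + high]),
    2: wcss([low + mid, high]),
    3: wcss([low, mid, high]),
    4: wcss([low, mid, high[:2], high[2:]]),
}

# Drop in WCSS gained by moving from K to K + 1.
drops = {k: wcss_by_k[k] - wcss_by_k[k + 1] for k in (1, 2, 3)}
```

With these numbers the drop from K = 2 to K = 3 is large while the drop from K = 3 to K = 4 is small, which is the pattern read off as the elbow at K = 3.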
It is important to note that because this was an unsupervised learning process, there is no ideal cluster structure or ranking and no target values. The Silhouette Score, which compares each data point's average distance to the other points in its own cluster with its average distance to the points in the nearest neighboring cluster, was used to assess the quality of the k-means model. Based on the calculated Silhouette Score of 0.7, which is close to 1, it can be inferred that the data points are close to other points within the same cluster while being distant from neighboring clusters. The data clustered per module using K-means is shown in Fig. 7.

In the Decision Tree, the root node (top node) condition states that if a student is assigned a cluster value of 1, the test outcome for CLUSTER ≤ 1.5 is TRUE, classifying them as Beginner. If the cluster value is 2, the root node has a FALSE outcome for CLUSTER ≤ 1.5 but a TRUE outcome for CLUSTER ≤ 2.5, indicating that the student is at an Intermediate level. Lastly, if the cluster value is 3, the nodes CLUSTER ≤ 1.5 and CLUSTER ≤ 2.5 both have FALSE outcomes, classifying the student as Advanced. With a value of 0.85 across all metrics used (see Table 1), the results imply that the model performed consistently across these evaluation metrics.
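To make the Silhouette Score reported in this section concrete, the following is a minimal sketch of the standard definition for one-dimensional data. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the coefficient is (b - a) / max(a, b). The function name and sample points are hypothetical, and the study may have used a library implementation instead.

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient for 1-D points with cluster labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    coefficients = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            coefficients.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(abs(p - q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        coefficients.append((b - a) / denom if denom else 0.0)
    return sum(coefficients) / len(coefficients)


# Two hypothetical, well-separated clusters.
score = silhouette_score([1, 2, 10, 11], [0, 0, 1, 1])
```

Values near 1 indicate tight, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.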

Results on Experimental Group per Module
Pre- and post-test results for students using the proposed system are displayed in Table 3. Significant findings were observed in the analysis of the data from the Electromagnetic Spectrum, Light, Electricity, and Magnetism modules. The experimental group showed substantial improvements in post-test scores after using the proposed system compared to their pre-test scores, as confirmed by the p-values obtained from the paired-sample t-tests conducted. The corresponding p-values were 0.00019, 1.94E-09, 3.43E-07, and 2.05E-10, respectively, all of which were below the significance level of 0.05.
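The test statistic behind a paired-sample t-test can be sketched with the standard library. The scores below are hypothetical and are not the study's data. The sketch stops at the t statistic and degrees of freedom, since converting them to a p-value requires the t distribution (for example via scipy.stats.ttest_rel, if SciPy is available).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for paired pre- and post-test scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical scores for five students.
pre_scores = [10, 12, 9, 15, 11]
post_scores = [18, 20, 15, 22, 19]
t_stat, dof = paired_t_statistic(pre_scores, post_scores)
```

A large positive t with n - 1 degrees of freedom corresponds to a small p-value, which is the basis for the comparison against the 0.05 significance level.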

Results of Experimental Group and Control Group Post-Test Scores
The post-test results of students who used the traditional learning system and those who utilized the adaptive learning system are compared in Table 4. The experimental group showed highly significant improvements in post-test results as compared to the control group, with p-values of 2.29E-06, 2.73E-32, 8.69E-10, and 3.57E-10. These p-values further confirm the effectiveness of the proposed system in enhancing student learning outcomes across all modules.
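For the between-group comparison, a two-sample t-test assuming equal variances pools the variance of both groups. The sketch below computes only the t statistic and degrees of freedom from hypothetical post-test scores. The function name is an assumption, and a p-value would again come from the t distribution (for example scipy.stats.ttest_ind with equal_var=True).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for two independent samples, equal variances assumed."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    dof = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / dof
    if pooled_var == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, dof


# Hypothetical post-test scores, not data from the study.
experimental = [30, 32, 28, 31, 29]
control = [22, 25, 21, 24, 23]
t_value, t_dof = pooled_t_statistic(experimental, control)
```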

Improvement Percentage of using Adaptive E-learning System from Traditional Learning
Based on the results of the post-test, Table 5 displays the percentage of improvement for students using an adaptive e-learning system in comparison to traditional learning. The students who utilized the proposed system showed substantial improvements in their performance across the Electromagnetic Spectrum, Light, Electricity, and Magnetism modules, with increases of 28.8%, 41.4%, 31.9%, and 32.9%, respectively. These findings provide strong evidence that the adaptive e-learning system had a significant positive impact on post-test scores compared to pre-test scores, surpassing the outcomes achieved with the traditional learning approach.
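The text does not state how the improvement percentages were computed. One common definition, assumed here purely for illustration, is the relative difference between the experimental and control group mean post-test scores. The function name and the example means are hypothetical.

```python
def improvement_percentage(control_mean, experimental_mean):
    """Relative improvement of the experimental mean over the control mean, in percent.

    Assumption: this relative-difference formula is illustrative; the paper
    does not specify the formula behind Table 5.
    """
    if control_mean <= 0:
        raise ValueError("control_mean must be positive")
    return 100.0 * (experimental_mean - control_mean) / control_mean


# Hypothetical group means on a 35-point post-test.
example_improvement = improvement_percentage(20.0, 25.0)
```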

Students' Performance According to Learning Styles
The performance of students with regard to learning styles is shown in Table 6. The results indicate that incorporating learning styles as a parameter was effective for the proposed system: the p-values for the reflective, intuitive, verbal, and global learning styles were below the significance level of 0.05, so the null hypothesis was rejected for these styles, with the sequential style being the exception. This indicates an improvement in the students' post-test scores when learning style is used as a parameter.

Results of AdPisika System Evaluation
Table 7 shows the summary of the system evaluation answered by the students. Based on the data gathered, the system scored a mean of at least 4.43 in all categories. For the system's acceptability, the grand means for functionality stability, performance efficiency, compatibility, and reliability were 4.49, 4.43, 4.43, 4.8, and 4.47, respectively, representing an excellent overall rating. The user interface of the proposed system, AdPisika, is shown in Fig. 10.

Fig. 1. Research Design
Fig. 2. Sequence of actions that a learner may undertake

Blending Approach for Correcting the Learning Path using Bayesian Network
The Bayesian network model of the system was created using the Netica application by Norsys for Windows, version 6.09. If the learner fails the evaluation, the program provides a new learning path that integrates the learner's dominant and second-ranked learning styles, as shown in Fig. 4. Instructional resources are filtered using a combination of the two learning styles, along with the passing rate and preferred LOs, which represent information gathered from all learners based on interactions such as "bookmarks".
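The filtering idea in this approach can be illustrated with a deliberately simplified sketch. It does not implement a Bayesian network and is not the Netica model. Instead, it uses a plain filter-and-sort that keeps learning objects tagged with either of the learner's two leading styles and ranks them by passing rate and bookmark count. All field names, the function name, and the catalog entries are hypothetical.

```python
def blended_learning_path(learning_objects, top_style, second_style, limit=3):
    """Return titles of learning objects matching either style, best ranked first."""
    styles = {top_style, second_style}
    candidates = [lo for lo in learning_objects if lo["style"] in styles]
    # Rank by passing rate first, then by how often learners bookmarked the object.
    candidates.sort(key=lambda lo: (lo["pass_rate"], lo["bookmarks"]), reverse=True)
    return [lo["title"] for lo in candidates[:limit]]


# Hypothetical catalog of learning objects for one module.
catalog = [
    {"title": "Light: simulation lab", "style": "active", "pass_rate": 0.80, "bookmarks": 12},
    {"title": "Light: concept map", "style": "global", "pass_rate": 0.80, "bookmarks": 20},
    {"title": "Light: narrated lecture", "style": "verbal", "pass_rate": 0.90, "bookmarks": 5},
    {"title": "Light: worked problems", "style": "active", "pass_rate": 0.70, "bookmarks": 30},
]
path = blended_learning_path(catalog, "active", "global")
```

A Bayesian network would replace the fixed sort key with probabilities inferred from learner interactions, but the inputs (two styles, passing rate, bookmarks) are the ones named in the text.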

Fig. 5 displays the distribution of the top and second-ranked learning styles based on students' responses to the ILSQ.

Fig. 7. Clustering Students from (a) Electromagnetic Spectrum Module, (b) Light Module, (c) Electricity Module, and (d) Magnetism Module Pre-test Scores

Classifying Students' Knowledge Level based on Clustered Data using Decision Tree
Fig. 8 shows a graphical representation of how the Decision Tree classifies the clustered data.
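The threshold tests reported for this tree (CLUSTER ≤ 1.5 at the root node, then CLUSTER ≤ 2.5) can be written out as a small function. This is a hand-coded sketch of the described splits, not the trained model itself, and the function name is hypothetical.

```python
def classify_knowledge_level(cluster):
    """Apply the two threshold tests described for the Decision Tree."""
    if cluster <= 1.5:       # root node is TRUE for cluster 1
        return "Beginner"
    if cluster <= 2.5:       # second node is TRUE for cluster 2
        return "Intermediate"
    return "Advanced"        # both tests are FALSE for cluster 3


knowledge_by_cluster = {c: classify_knowledge_level(c) for c in (1, 2, 3)}
```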

Table 1. Results of Evaluation Metrics for Decision Tree
Riva et al. (AdPisika: an adaptive e-learning system utilizing k-means clustering, decision tree, and bayesian network…)

Table 2 shows how many students fall into each cluster and knowledge level category.

Table 2. Total Clusters and Knowledge Levels per Module

Table 3. t-Test: Paired Two Sample for Means - Experimental

Table 4. Two-Sample Assuming Equal Variances Test of the Mean of Control and Experimental Group's

Table 5. Summary of Post-test Scores from Experimental and Control Group

Table 6. Summary of Students' Performance According to Learning Styles

Table 7. Results of AdPisika System Evaluation