Weighted Association Rules Mining and Graph Analysis
I worked for a fitness-training startup that offered personalized recommendations to help people reach their fitness goals. The company wanted to know whether data analytics and artificial intelligence/machine learning (AI/ML) techniques could answer some of its business questions and enhance trainees’ performance.
This is the second post in a three-part blog series (if you haven’t already, you can read part 1 here) describing the low-cost analytics solution I created to generate the relevant data, reshape and refine it, and visually discover and extract actionable insights.
Armed with the results of the principal component analysis (PCA), I was ready to return to the Trifacta Data Engineering Cloud, reformat the available data, and generate the inputs for graph analysis and weighted association rules mining (WARM). The goal was to unearth additional actionable knowledge that could help the startup’s trainers enhance trainees’ performance, among other practical applications.
Weighted Association Rules Mining
Retailers use a technique called “association rules mining” to uncover associations among items. It allows retailers to identify relationships among items that customers buy by looking for combinations of items that occur together frequently in transactions.
In the context of my client, a fitness-training startup, each trainee was considered a “customer,” and variables related to the training process were considered items “bought” by the “customers.”
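To make the "trainee as customer, variable as item" mapping concrete, here is a minimal Python sketch. It is not the Trifacta or R workflow used in the project. The trainee IDs, the attribute values other than those named in this post, and the helper names `to_basket` and `support` are all assumptions made for illustration.

```python
# Hypothetical trainee records: IDs and most attribute values are invented.
trainees = {
    "T01": {"BEBE_ALCOHOL": "FRECUENTE", "HORAS_DUERME_NOCHE": "5-6",
            "LESION_MUSCULAR_ARTICULAR": "SI"},
    "T02": {"BEBE_ALCOHOL": "FRECUENTE", "HORAS_DUERME_NOCHE": "5-6",
            "LESION_MUSCULAR_ARTICULAR": "SI"},
    "T03": {"BEBE_ALCOHOL": "NUNCA", "HORAS_DUERME_NOCHE": "7-8",
            "LESION_MUSCULAR_ARTICULAR": "NO"},
    "T04": {"BEBE_ALCOHOL": "FRECUENTE", "HORAS_DUERME_NOCHE": "7-8",
            "LESION_MUSCULAR_ARTICULAR": "NO"},
}


def to_basket(record):
    """Turn one trainee's attribute/value pairs into a set of items."""
    return frozenset(f"{name}_{value}" for name, value in record.items())


# One "basket" of items per trainee, exactly like a retail transaction.
baskets = {trainee_id: to_basket(rec) for trainee_id, rec in trainees.items()}


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    matches = sum(1 for basket in baskets.values() if itemset <= basket)
    return matches / len(baskets)


pair = {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI"}
print(support(pair, baskets))  # 2 of the 4 baskets contain both items -> 0.5
```

Support is the basic quantity that association rules mining builds on: an itemset is "frequent" when its support reaches a chosen threshold.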
Figure 1 shows the results of the data prepared in the Trifacta Data Engineering Cloud to generate suitable inputs for the graph and WARM analysis. (Please note: trainees’ identifier data has been anonymized.)
Figure 2 shows a graph built and plotted with the arules and igraph R packages. By tuning the plot function’s parameters (size, colors, etc.), it’s possible to surface a few interesting features, as well as key relationships among some of the variables.
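One common way to turn baskets into a graph is to treat items as nodes and weight each edge by how many baskets contain both endpoints. The Python sketch below assumes that construction; the post does not say exactly how the Figure 2 graph was built in R, and the sample baskets, the item HORAS_DUERME_NOCHE_7-8, and the helper name `cooccurrence_edges` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one set of items per trainee.
baskets = [
    {"BEBE_ALCOHOL_FRECUENTE", "HORAS_DUERME_NOCHE_5-6", "LESION_MUSCULAR_ARTICULAR_SI"},
    {"BEBE_ALCOHOL_FRECUENTE", "HORAS_DUERME_NOCHE_5-6", "LESION_MUSCULAR_ARTICULAR_SI"},
    {"BEBE_ALCOHOL_FRECUENTE", "HORAS_DUERME_NOCHE_5-6"},
    {"HORAS_DUERME_NOCHE_7-8"},
]


def cooccurrence_edges(baskets):
    """Count, for every pair of items, how many baskets contain both."""
    edges = Counter()
    for basket in baskets:
        # Sorting gives each undirected edge a single canonical key.
        for a, b in combinations(sorted(basket), 2):
            edges[(a, b)] += 1
    return edges


edges = cooccurrence_edges(baskets)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

The heaviest edges are the candidate relationships worth checking with a formal rules analysis; a plotting library can then map edge weight to line thickness.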
For example, there seems to be a connection between BEBE_ALCOHOL_FRECUENTE (frequent alcohol consumption) and other factors that clearly harm trainees’ performance, like LESION_MUSCULAR_ARTICULAR_SI (muscle or joint injuries) and HORAS_DUERME_NOCHE_5-6 (only five to six hours of sleep per night).
To corroborate these visual findings and unearth more potentially useful associations among variables, I carried out a detailed WARM analysis using the apriori and hits methods available in open-source R (both are provided by the arules package).
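The "weighted" part of WARM can be illustrated with a simplified HITS-style iteration: a trainee's basket gets a higher weight when it contains items that appear in other highly weighted baskets, and itemset support is then computed from those weights instead of from raw counts. The Python sketch below is an assumption-laden illustration of that idea, not the arules implementation; the sample baskets, the item HORAS_DUERME_NOCHE_7-8, and the function names `hits_weights` and `weighted_support` are hypothetical.

```python
# Hypothetical baskets, one set of items per trainee (all non-empty).
baskets = [
    {"BEBE_ALCOHOL_FRECUENTE", "HORAS_DUERME_NOCHE_5-6", "LESION_MUSCULAR_ARTICULAR_SI"},
    {"BEBE_ALCOHOL_FRECUENTE", "HORAS_DUERME_NOCHE_5-6", "LESION_MUSCULAR_ARTICULAR_SI"},
    {"BEBE_ALCOHOL_FRECUENTE", "HORAS_DUERME_NOCHE_5-6"},
    {"HORAS_DUERME_NOCHE_7-8"},
]


def hits_weights(baskets, iterations=50):
    """Simplified HITS on the basket/item bipartite graph.

    Baskets act as hubs and items as authorities. Returns one weight per
    basket, normalized to sum to 1. Assumes at least one non-empty basket.
    """
    hubs = [1.0] * len(baskets)
    for _ in range(iterations):
        # An item's authority is the total hub weight of the baskets holding it.
        authority = {}
        for hub, basket in zip(hubs, baskets):
            for item in basket:
                authority[item] = authority.get(item, 0.0) + hub
        # A basket's hub weight is the total authority of its items.
        hubs = [sum(authority[item] for item in basket) for basket in baskets]
        total = sum(hubs)
        hubs = [h / total for h in hubs]
    return hubs


def weighted_support(itemset, baskets, weights):
    """Share of total basket weight carried by baskets containing `itemset`."""
    itemset = set(itemset)
    covered = sum(w for w, basket in zip(weights, baskets) if itemset <= basket)
    return covered / sum(weights)


weights = hits_weights(baskets)
pair = {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI"}
print(weighted_support(pair, baskets, weights))
```

In this toy data the pair appears in half of the baskets, but its weighted support is higher because those two baskets are the most strongly connected ones. That is the practical effect of weighting: associations found among well-connected trainees count for more than associations in isolated records.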
Using this methodology, variables like BEBE_ALCOHOL_FRECUENTE (alcohol consumption) and HORAS_DUERME_NOCHE_5-6 (sleep deprivation) are the items in the baskets “bought” by trainees, or customers. Next, I set out to uncover relevant relationships, or rules, between items.
Figure 3 and Figure 4 show graph and parallel coordinate plots, respectively. I used the lift metric to rank the rules, or item associations. Lift measures how much more often the items in a rule occur together than would be expected if they were independent, so values above 1 point to a positive association (in Figure 4, the thicker the red line, the higher the lift value). Looking at both figures, it’s not hard to conclude that, for example, the incidence of muscular and articular lesions could be closely associated with respiratory issues and frequent alcohol consumption.
I then tabulated the rules ranked by lift to better interpret and explain the data. Figure 5 shows the top 10 associations I found.
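Ranking rules by lift can be sketched in a few lines of Python. This is an illustrative, unweighted version restricted to single-item rules, not the R code behind Figure 5; the sample baskets, the `min_support` threshold, and the function names are assumptions.

```python
from itertools import permutations

# Hypothetical baskets, one set of items per trainee.
baskets = [
    {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI"},
    {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI"},
    {"BEBE_ALCOHOL_FRECUENTE", "HORAS_DUERME_NOCHE_5-6"},
    {"HORAS_DUERME_NOCHE_5-6"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def rules_by_lift(baskets, min_support=0.25):
    """Build single-item rules lhs -> rhs and sort them by lift, highest first."""
    items = sorted(set().union(*baskets))
    rules = []
    for lhs, rhs in permutations(items, 2):
        joint = support({lhs, rhs}, baskets)
        if joint < min_support:
            continue  # skip pairs that co-occur too rarely
        confidence = joint / support({lhs}, baskets)
        # lift = P(lhs and rhs) / (P(lhs) * P(rhs))
        lift = confidence / support({rhs}, baskets)
        rules.append((lhs, rhs, joint, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


ranked = rules_by_lift(baskets)
for lhs, rhs, joint, confidence, lift in ranked[:10]:
    print(f"{lhs} -> {rhs}  support={joint:.2f}  "
          f"confidence={confidence:.2f}  lift={lift:.2f}")
```

A lift above 1 means the two items appear together more often than chance would suggest, a lift near 1 means no association, and a lift below 1 means they tend to avoid each other.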
By conducting WARM analysis, graph plotting, and rules tabulation, I was able to identify specific factors that could have a negative impact on trainees’ health and performance.
The startup was then interested in exploring new ways to optimize and personalize its training programs, and wondered whether it was possible to extract additional knowledge from the data. It was. I’ll explain how in part 3 of this series. Stay tuned!