Leveraging Data Analytics to Enhance Athletic Performance: Part 2

August 12, 2021

Weighted Association Rules Mining and Graph Analysis

I worked for a fitness-training startup that offered personalized recommendations to help people reach their fitness goals. The company wanted to know if applying data analytics and artificial intelligence/machine learning (AI/ML) techniques could answer some of their business questions and enhance trainees’ performance.

This is the second post in a three-part blog series (if you haven’t already, you can read part 1 here). The series describes the low-cost analytics solution I created, which allowed me to generate the relevant data, reshape and refine it, and visually discover and extract actionable insight.

Armed with the results of the principal component analysis (PCA), I was ready to return to the Trifacta Data Engineering Cloud, reformat the available data, and generate the inputs for graph analysis and weighted association rules mining (WARM). The goal was to unearth additional actionable knowledge that could help the startup’s trainers enhance trainees’ performance, among other practical applications.

Weighted Association Rules Mining

Retailers use a technique called “association rules mining” to uncover associations among items. It allows retailers to identify relationships among items that customers buy by looking for combinations of items that occur together frequently in transactions. 

In the context of my client, a fitness-training startup, each trainee was considered a “customer,” and variables related to the training process were considered items “bought” by the “customers.” 
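To make the basket analogy concrete, here is a minimal Python sketch (not the R code used in the project). It treats each trainee as a basket of items and measures how often pairs of items occur together. The baskets are invented toy data; only the variable names come from this article, and the 0.5 support threshold is an arbitrary assumption.

```python
from collections import Counter
from itertools import combinations

# Toy baskets: one set of "items" per trainee. The data is invented for
# illustration; only the variable names appear in the article.
baskets = [
    {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI", "HORAS_DUERME_NOCHE_5-6"},
    {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI"},
    {"HORAS_DUERME_NOCHE_5-6"},
    {"BEBE_ALCOHOL_FRECUENTE", "HORAS_DUERME_NOCHE_5-6"},
]


def pair_support(baskets):
    """Return {pair: fraction of baskets that contain both items}."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[frozenset(pair)] += 1
    n = len(baskets)
    return {pair: count / n for pair, count in counts.items()}


support = pair_support(baskets)

# Keep only pairs that co-occur in at least half of the baskets
# (the threshold is an assumption chosen for this toy example).
frequent = {pair: s for pair, s in support.items() if s >= 0.5}

for pair, s in frequent.items():
    print(sorted(pair), f"support={s:.2f}")
```

Pairs that pass the threshold are the "combinations of items that occur together frequently" that association rules mining starts from.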

Figure 1 shows the results of the data prepared in the Trifacta Data Engineering Cloud to generate suitable inputs for the graph and WARM analysis. (Please note: trainees’ identifier data has been anonymized.)

Figure 1: Trainee Data Prepared for Graph and WARM Analysis in the Trifacta Data Engineering Cloud

Graph Analysis

Figure 2 shows a graph built and plotted using the arules and igraph R packages. By tuning the plot function parameters (size, colors, etc.), it’s possible to surface a few interesting features, as well as key relationships among some of the variables.

For example, there seems to be a connection between BEBE_ALCOHOL_FRECUENTE (frequent alcohol consumption) and other factors that harm trainees’ performance, like LESION_MUSCULAR_ARTICULAR_SI (muscular or articular lesions) and HORAS_DUERME_NOCHE_5-6 (sleeping only 5 to 6 hours a night).

Figure 2: Graph of Possible Relevant Connections Among Variables Affecting Athletic Performance
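The idea behind a graph like this can be sketched in a few lines of Python. This is a simplified illustration, not the arules/igraph workflow used for the figure: nodes are variables, and an edge's weight is the number of trainees in whom both variables appear. The baskets are invented toy data.

```python
from collections import defaultdict
from itertools import combinations

# Invented toy data; only the variable names appear in the article.
baskets = [
    {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI", "HORAS_DUERME_NOCHE_5-6"},
    {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI"},
    {"HORAS_DUERME_NOCHE_5-6"},
    {"BEBE_ALCOHOL_FRECUENTE", "HORAS_DUERME_NOCHE_5-6"},
]


def cooccurrence_graph(baskets):
    """Build an undirected weighted graph as {node: {neighbor: count}}."""
    graph = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


graph = cooccurrence_graph(baskets)

# Weighted degree ("strength"): total co-occurrences a variable takes part in.
strength = {node: sum(neighbors.values()) for node, neighbors in graph.items()}

for node, value in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(node, value)
```

In a plot, edge weights and node strengths like these are the kind of quantities that size and color parameters can be mapped to.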

To corroborate these visual findings and unearth more potentially useful associations among variables, I carried out a detailed WARM analysis, applying the apriori and hits methods available in the open-source R ecosystem.

Using this methodology, variables like BEBE_ALCOHOL_FRECUENTE (frequent alcohol consumption) and HORAS_DUERME_NOCHE_5-6 (sleeping only 5 to 6 hours a night) are the items in the baskets “bought” by trainees, or customers. Next, I set out to uncover relevant relationships, or rules, between items.
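For readers curious about what the "weighted" part can look like, the sketch below shows one common HITS-based formulation in plain Python. It is an assumption about the general technique, not a reproduction of the R functions used here. Each basket receives a hub score from its items, each item receives an authority score from the baskets that contain it, and an itemset's weighted support is the share of total hub weight held by the baskets that contain it. The baskets are invented toy data, and the iteration count is arbitrary.

```python
import math

# Invented toy data; only the variable names appear in the article.
baskets = [
    {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI", "HORAS_DUERME_NOCHE_5-6"},
    {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI"},
    {"HORAS_DUERME_NOCHE_5-6"},
    {"BEBE_ALCOHOL_FRECUENTE", "HORAS_DUERME_NOCHE_5-6"},
]


def hits_basket_weights(baskets, iterations=50):
    """Hub score per basket from HITS on the basket-item bipartite graph."""
    hubs = [1.0] * len(baskets)
    for _ in range(iterations):
        # Authority of an item: sum of hub scores of the baskets containing it.
        authority = {}
        for hub, basket in zip(hubs, baskets):
            for item in basket:
                authority[item] = authority.get(item, 0.0) + hub
        # Hub of a basket: sum of authority scores of its items.
        hubs = [sum(authority[item] for item in basket) for basket in baskets]
        # Normalize so the scores stay bounded across iterations.
        norm = math.sqrt(sum(h * h for h in hubs)) or 1.0
        hubs = [h / norm for h in hubs]
    return hubs


def weighted_support(itemset, baskets, weights):
    """Share of total basket weight held by baskets that contain the itemset."""
    matched = sum(w for w, basket in zip(weights, baskets) if itemset <= basket)
    return matched / sum(weights)


weights = hits_basket_weights(baskets)
ws_alcohol = weighted_support({"BEBE_ALCOHOL_FRECUENTE"}, baskets, weights)
print(f"weighted support = {ws_alcohol:.3f}")
```

The design choice is that no weights need to be assigned by hand: baskets that contain many well-connected items count for more than sparse ones.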

Figure 3 and Figure 4 show graph and parallel coordinate plots, respectively. I also used the lift metric to rank the rules, or item associations (in Figure 4, the thicker the red line, the higher the lift value). Looking at both figures, it’s not hard to conclude that, for example, the incidence of muscular and articular lesions could be closely associated with respiratory issues and frequent alcohol consumption.

Figure 3: Graph Plot from WARM Analysis Highlighting Some Rules Associations

Figure 4: Parallel Coordinate Plot from WARM Analysis Illustrating Same Rules Associations

Rules Tabulation

I then tabulated the rules, ranked by the lift metric, to better interpret and explain the data. Figure 5 shows the top 10 associations I found.

Figure 5: Tabulated Top 10 Rules/Associations Ranked by Lift Metric
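As a rough illustration of how such a ranking can be produced, the Python sketch below enumerates single-item rules over invented toy baskets and sorts them by lift, using the standard definitions: confidence(A => B) = support(A and B) / support(A), and lift(A => B) = confidence(A => B) / support(B). The function name and data are hypothetical, and the real analysis was done in R with weighted support.

```python
from itertools import permutations

# Invented toy data; only the variable names appear in the article.
baskets = [
    {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI", "HORAS_DUERME_NOCHE_5-6"},
    {"BEBE_ALCOHOL_FRECUENTE", "LESION_MUSCULAR_ARTICULAR_SI"},
    {"HORAS_DUERME_NOCHE_5-6"},
    {"BEBE_ALCOHOL_FRECUENTE", "HORAS_DUERME_NOCHE_5-6"},
]


def rank_pair_rules(baskets, top_n=10):
    """Return up to top_n rules (lhs, rhs, support, confidence, lift) by lift."""
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def supp(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / n

    rules = []
    for lhs, rhs in permutations(items, 2):
        s_both = supp({lhs, rhs})
        if s_both == 0:
            continue  # the two items never occur together
        confidence = s_both / supp({lhs})
        lift = confidence / supp({rhs})
        rules.append((lhs, rhs, s_both, confidence, lift))
    rules.sort(key=lambda rule: rule[4], reverse=True)
    return rules[:top_n]


ranked = rank_pair_rules(baskets)
for lhs, rhs, s, c, l in ranked:
    print(f"{{{lhs}}} => {{{rhs}}}  support={s:.2f}  confidence={c:.2f}  lift={l:.2f}")
```

A lift above 1 means the two items occur together more often than would be expected if they were independent; a lift below 1 means less often.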

By conducting WARM analysis, graph plotting, and rules tabulation, I was able to identify specific factors that could have a negative impact on trainees’ health and performance. 

The startup was then interested in exploring new ways to optimize and personalize its training programs. They wondered if it was possible to extract additional knowledge from the data. And it was. I’ll explain how in part 3 of this series. Stay tuned!