Content-Based Recommendation Engine

I worked for a fitness-training startup that offered personalized recommendations to help people reach their fitness goals. The company wanted to know if applying data analytics and artificial intelligence/machine learning (AI/ML) techniques could answer some of their business questions and enhance trainees' performance.

This is the third of a 3-part blog series (to catch up, you can read part 1 here and part 2 here) describing the low-cost analytics solution I created, which allowed me to generate the relevant data, reshape and refine it, and visually discover and extract actionable insight.

After conducting principal component analysis (PCA) and applying graph and weighted association rules mining (WARM) techniques, I was able to identify specific factors that could have a negative impact on trainees’ health and performance. The startup was then interested in exploring new ways to optimize and personalize its training programs. They wondered if it was possible to extract additional knowledge from the data. Thanks to the Trifacta Data Engineering Cloud, I knew my answer would be YES.

Content-Based Recommendation Engine (CBRE)

I set out to design and implement a content-based recommendation engine (CBRE). I took features that characterized each trainee (age or gender, for example) and, using a trainer or an advanced trainee as a reference, identified the trainees who shared the most characteristics with that reference. Those trainees could then be recommended similar customized workout routines and nutrition supplements that had already been tested and refined in the (more advanced) reference group.

Creating a CBRE demanded the most complex data preparation recipes I’d prepared so far. Figure 1 shows an example of a recipe created in the Trifacta Data Engineering Cloud. (Please note: trainees' identifier data has been anonymized.) I used one-hot encoding to transform categorical variables into numeric 0/1 codes.
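To make the one-hot encoding step concrete, here is a minimal Python sketch. It is not the Trifacta recipe itself: the trainee records, the field names (`gender`, `goal`), and the helper `one_hot_encode` are all hypothetical and exist only to show how a categorical column becomes a set of 0/1 columns.

```python
# Hypothetical trainee records; field names and values are illustrative only.
trainees = [
    {"id": "T01", "age": 34, "gender": "F", "goal": "endurance"},
    {"id": "T02", "age": 27, "gender": "M", "goal": "strength"},
    {"id": "T03", "age": 41, "gender": "F", "goal": "strength"},
]


def one_hot_encode(records, categorical_fields):
    """Replace each categorical field with one 0/1 column per observed category."""
    # Collect the sorted set of categories seen for every categorical field.
    categories = {
        field: sorted({rec[field] for rec in records})
        for field in categorical_fields
    }
    encoded = []
    for rec in records:
        # Keep the non-categorical fields unchanged.
        row = {k: v for k, v in rec.items() if k not in categorical_fields}
        # Add one indicator column per (field, category) pair.
        for field, values in categories.items():
            for value in values:
                row[f"{field}_{value}"] = 1 if rec[field] == value else 0
        encoded.append(row)
    return encoded


encoded_trainees = one_hot_encode(trainees, ["gender", "goal"])
for row in encoded_trainees:
    print(row)
```

Each trainee ends up with purely numeric columns (for example `gender_F` and `gender_M`), which is the form a correlation-based similarity measure needs.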
I generated the input required to evaluate a similarity matrix (using the Pearson Correlation Coefficient) to compare the trainees. The similarity matrix for this example is a square matrix in which each row (and each column) corresponds to a trainee. The diagonal is filled with values equal to 1 (each trainee is identical to itself), and the off-diagonal elements are values between approximately -1 (very different) and approximately 1 (very similar). Calculations were performed in the R language framework.

The similarity matrix was unpivoted and blended with the trainees' basic data and other relevant information to create a refined dataset that serves as the core of the “recommender system.”

The final step in implementing the CBRE was to make sure the end users could interact with and use the results, which were served as a fully interactive table with controls and as an easy-to-digest, informative visualization. Figure 2 depicts the built solution.

Using a trainer or an advanced trainee as a reference, and selecting his or her identifier in the dropdown REFERENCE filter, the engine shows a list of the most similar trainees in descending order of similarity. Results are displayed to the right as a convenient, compelling visualization. The SLIDER control can be used to adjust the upper and lower bounds of the similarity index used in the comparison. Additional relevant data and information were included in the table, too. The filtered data can be exported as a .csv or .xls file or saved as a Google Sheet.

Achieving Analytics Project and Athletic Performance Results

The results of the CBRE were put to use by trainers and domain experts in my client’s organization to prescribe customized workout routines, nutrition supplements, and other recommendations that had been tested and refined in the reference group to improve trainees’ athletic performance.
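For readers who want to see the mechanics behind those results, the similarity-matrix, unpivoting, and reference-filter steps described earlier can be sketched as follows. The original calculations were performed in R; this Python version is only an illustrative stand-in. The feature vectors, the trainee identifiers, and the function names `pearson` and `most_similar` are assumptions, and the sketch assumes no trainee has a constant feature vector (which would make the correlation undefined).

```python
from math import sqrt

# Hypothetical numeric feature vectors, one per trainee
# (for example, the output of one-hot encoding plus scaled numeric fields).
features = {
    "T01": [0.2, 1.0, 0.0, 1.0, 0.0],
    "T02": [0.9, 0.0, 1.0, 0.0, 1.0],
    "T03": [0.3, 1.0, 0.0, 0.0, 1.0],
    "T04": [0.1, 1.0, 0.0, 1.0, 0.0],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Square similarity matrix: rows and columns are both trainees.
similarity = {
    a: {b: pearson(features[a], features[b]) for b in features}
    for a in features
}

# "Unpivot" the matrix into long form: one (reference, trainee, similarity) row per pair.
long_rows = [(a, b, similarity[a][b]) for a in similarity for b in similarity[a]]


def most_similar(reference, lower=-1.0, upper=1.0):
    """Trainees ranked by similarity to the reference, within [lower, upper]."""
    rows = [
        (trainee, score)
        for ref, trainee, score in long_rows
        if ref == reference and trainee != reference and lower <= score <= upper
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for trainee, score in most_similar("T01", lower=0.0):
    print(trainee, round(score, 3))
```

Here the `reference` argument plays the role of the REFERENCE dropdown, and `lower` and `upper` mimic the SLIDER bounds on the similarity index.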
The results of my work in developing this end-to-end, low-cost analytics solution for my client showed that it's possible to deliver analytics solutions for an organization of any size and that you don’t need large data volumes to harness the power of AI/ML and achieve reliable results. You do, however, need an intelligent, collaborative, self-service data engineering cloud platform to transform data, ensure quality, and automate data pipelines.

Try the Trifacta Data Engineering Cloud on your next analytics project.