
Analyze YouTube Channel Data With Dataprep and BigQuery

July 20, 2021

You have launched your YouTube channel with great fanfare! You have posted your first few videos, and the likes and comments are starting to flow in – hooray!

Now what?

Obviously you have invested a lot of effort in launching your channel. You want to monitor how your videos are doing, and how the audience is engaging with them. You want to take a regular pulse of what content is drawing the most likes and comments, and observe engagement trends over time. But how do you capture, store and analyze this data?

In this article, I describe how you can use Google Cloud Dataprep by Trifacta (or simply “Dataprep”) to pull YouTube channel data, prepare it and store it in Google BigQuery. The techniques described here are useful regardless of whether you are a solo entrepreneur or the head of marketing for a large enterprise.

Step 1: Acquire the data

You begin by using Dataprep to connect to your YouTube channel data. Head over to clouddataprep.com and sign up for a free trial. From the homepage, choose Create a new flow. Then choose Connect to your data and Import datasets.

At the bottom of the screen, click on + New to add a new connection. In the dialog box that follows, choose YouTube Analytics (Early Preview). In the Create Connection dialog box, choose Authenticate and enter the credentials for the Google account that hosts the YouTube channel, then choose Create to create the connection. You can now browse the youtube_analytics connection and view the individual tables. Choose the Videos table to preview it, then choose Import & Add to Flow.


Step 2: Prepare the data

From the flow view, select the Videos recipe and choose Edit Recipe. This shows the dataset in a grid layout and lets you explore the columns and values present. Helpfully, Dataprep automatically profiles your data and shows a value distribution for each column.

You can explore this profile further by choosing Column Details from the dropdown menu next to each column. If you click on any of the charts that appear, you will be presented with suggestions for transforming the data. For example, you can select a particular date range (e.g., Jan 2019 to July 2021), and the tool will suggest a recipe step that applies a filter based on that range.
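Under the hood, the suggested step is just a date-range filter on the selected column. As a rough illustration of that logic (this is not Dataprep's actual implementation, and the sample rows and field names are invented):

```python
from datetime import date

# Toy stand-in for rows of the Videos table (field names are assumptions).
videos = [
    {"Title": "Channel trailer", "PublishedAt": date(2018, 11, 3)},
    {"Title": "First tutorial",  "PublishedAt": date(2019, 6, 12)},
    {"Title": "Q&A livestream",  "PublishedAt": date(2021, 7, 1)},
]

# Keep only videos published between Jan 2019 and July 2021,
# mirroring the filter step Dataprep suggests for a selected range.
start, end = date(2019, 1, 1), date(2021, 7, 31)
filtered = [v for v in videos if start <= v["PublishedAt"] <= end]

print([v["Title"] for v in filtered])  # the 2018 video is dropped
```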

You can join in additional datasets to enrich your data by choosing Join from the toolbar. For example, you can pull in the category title by performing a left join with the VideoCategories dataset, using the CategoryId field as the join key.

Using this method, you can develop a recipe consisting of several transformations, all without writing a single line of code. This includes deleting columns, adding calculated columns and performing lookups, joins, unions and aggregations. You can undo/redo any actions, and even apply data quality rules to check the validity of your data. Every step of the way, Dataprep gives you visual feedback and suggestions to help you achieve your desired results.
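Conceptually, the category enrichment above is an ordinary left join on CategoryId. A minimal sketch of the same operation in plain Python (only the table and key names come from the tutorial; the sample rows are made up):

```python
# Toy rows standing in for the Videos and VideoCategories tables.
videos = [
    {"Title": "First tutorial",  "CategoryId": 27},
    {"Title": "Channel trailer", "CategoryId": 22},
    {"Title": "Unboxing",        "CategoryId": 99},  # no matching category
]
video_categories = [
    {"CategoryId": 22, "CategoryTitle": "People & Blogs"},
    {"CategoryId": 27, "CategoryTitle": "Education"},
]

# Build a lookup on the join key, then left-join: every video row is kept,
# and an unmatched CategoryId yields a null category title.
titles_by_id = {c["CategoryId"]: c["CategoryTitle"] for c in video_categories}
joined = [{**v, "CategoryTitle": titles_by_id.get(v["CategoryId"])} for v in videos]

for row in joined:
    print(row["Title"], "->", row["CategoryTitle"])
```

Because it is a left join, the "Unboxing" row survives with a missing category title rather than being dropped, which matches how Dataprep handles unmatched keys.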

Step 3: Store the data

When you are happy with the resulting dataset, you can store the results by publishing the output. Select the Run button at the top of the page. Under Publishing Actions, choose + Add Action and pick BigQuery. Choose a dataset, specify the table name to store the results in, and choose how you want the results to be written. Then click on Add.

Now click on Run to execute the job and write the results to BigQuery.

Step 4: Analyze the results

At this point the flow is complete: the imported datasets feed into the recipe, and the recipe's output publishes to BigQuery.

Once the job execution is complete, you can view the output profile from the Jobs pane. You can now launch BigQuery and start querying the prepared data using SQL. For example, the following query displays the top 5 public videos based on ViewCount.
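In the original post the query appeared as a screenshot; a plausible reconstruction looks like the following. The column names (ViewCount, PrivacyStatus) are assumptions — check the actual schema of your output table in BigQuery. For illustration it is run here against an in-memory SQLite table rather than BigQuery, where you would instead reference your own dataset (e.g. FROM `my_project.my_dataset.Videos`):

```python
import sqlite3

# Build a small stand-in Videos table (data is made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Videos (Title TEXT, PrivacyStatus TEXT, ViewCount INTEGER)")
conn.executemany(
    "INSERT INTO Videos VALUES (?, ?, ?)",
    [
        ("A", "public", 900), ("B", "public", 1500), ("C", "private", 9999),
        ("D", "public", 300), ("E", "public", 1200), ("F", "public", 100),
        ("G", "public", 700),
    ],
)

# Top 5 public videos by view count -- the private video is excluded
# even though it has the highest count.
top5 = conn.execute(
    """
    SELECT Title, ViewCount
    FROM Videos
    WHERE PrivacyStatus = 'public'
    ORDER BY ViewCount DESC
    LIMIT 5
    """
).fetchall()
print(top5)
```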

Additionally, you can visualize the results by creating a custom Google Data Studio dashboard that reads from the BigQuery table.

Conclusion

We covered how to prepare YouTube channel data using Dataprep and analyze it using BigQuery. The resulting data pipeline can be executed repeatedly on a schedule (e.g., weekly) to refresh the data. As the data accumulates, you can observe trends in audience engagement, which ultimately tells you which types of content deliver the best return on your investment. Having the data in BigQuery opens up a world of possibilities: you can pull the data into Google Sheets, or build dashboards in Data Studio, Looker or any other business intelligence tool.

We barely scratched the surface in this tutorial. There are many more tables in the YouTube dataset containing nuggets of valuable data. If you haven’t already done so, I highly encourage you to explore them yourself by signing up for Dataprep.