Maintain BigQuery data lineage by enriching Google Cloud Data Catalog tags with Dataprep metadata and profiling results
Cataloging Dataprep Pipelines
Google Cloud Data Catalog is the de facto metadata cataloging solution for your analytics initiatives on Google Cloud. Data Catalog natively and automatically captures BigQuery datasets, tables, and views, which gives you visibility into your data […]
Victor Coustenoble • May 26, 2021

Build a simple, flexible, yet comprehensive Data Quality monitoring solution for your Google Cloud Dataprep by Trifacta pipelines with Cloud Functions, BigQuery and Data Studio
Building a Data Quality Dashboard
Building a modern data stack to manage analytic pipelines, such as Google Cloud and a BigQuery data warehouse or data lake, has many benefits. One such benefit […]
Victor Coustenoble • December 21, 2020

If you manage a data and analytics pipeline in Google Cloud, you may want to monitor it and obtain a comprehensive view of the end-to-end analytics process so that you can react quickly when something breaks. This article shows how to capture Cloud Dataprep job status via APIs using Cloud Functions. We then input […]
Victor Coustenoble • July 28, 2020

With a better mastery of Cloud Functions, you can trigger a Dataprep job via API when a file lands in a Cloud Storage bucket
Ever dreamt about automating your entire data pipeline to load your data warehouse? Without automation, each user needs to manually upload their data, then manually start a transformation job or wait […]
Victor Coustenoble • May 26, 2020

After my New Year's wishes and a few ideas for 2018 inspired by Trifacta's CEO, let's now get to the heart of the matter and answer the question “What is Data Wrangling?”
From raw data to analysis: Data Wrangling, also called Self-Service Data Preparation, is the process that allows […]
Victor Coustenoble • February 26, 2018