Dataprep by Trifacta is the only native and serverless data preparation solution on Google Cloud. Designed for enterprise-wide deployments, it can scale securely to support any number of users and any volume of data.
Dataprep Security Architecture
Dataprep by Trifacta is architected with data security in mind. Dataprep translates user-generated metadata describing data transformation logic into a job that runs on Google Cloud Dataflow or BigQuery, Google Cloud’s scalable data processing engines.
The job reads, transforms, and writes customer data between the data source and target systems, and the data is never persisted outside the customer’s Google Cloud project resources. Trifacta secures the connection between the data source and target systems with SSL/TLS encryption.
Users define the data transformation logic and scheduled job execution through Dataprep’s web interface. Trifacta stores these definitions as metadata in an encrypted Google Cloud SQL relational database, but Trifacta does not store any of the customer’s actual data.
Trifacta inherits existing user permissions set on data resources. As such, users can only prepare the data they have access to.
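To make the idea of inherited permissions concrete, the following is a minimal sketch of a role lookup against a policy written in the bindings shape that Google Cloud IAM policies use. The policy contents, the member names, and the helper functions roles_for and can_prepare are hypothetical illustrations, not part of Dataprep or of any Google client library. Real IAM evaluation also covers groups, conditions, and inheritance from folders and organizations, which this sketch ignores.

```python
from typing import Dict, List, Set

# Hypothetical policy, written in the "bindings" shape used by Google Cloud IAM.
POLICY: Dict[str, List[dict]] = {
    "bindings": [
        {
            "role": "roles/storage.objectViewer",
            "members": ["user:alice@example.com"],
        },
        {
            "role": "roles/bigquery.dataViewer",
            "members": ["user:alice@example.com", "user:bob@example.com"],
        },
    ]
}


def roles_for(policy: dict, member: str) -> Set[str]:
    """Return every role directly bound to `member` in `policy`."""
    return {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }


def can_prepare(policy: dict, member: str, required_role: str) -> bool:
    """A user may prepare a resource only if they already hold the required role."""
    return required_role in roles_for(policy, member)
```

In this sketch, alice@example.com can prepare data from both Cloud Storage and BigQuery, while bob@example.com is limited to BigQuery, mirroring the rule that the preparation tool grants nothing beyond what the user already holds.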
Customer data is stored in the customer’s Google Cloud project, in the region(s) the customer has chosen to host its data, within Google Cloud data centers. Customer data is not stored in any Trifacta-controlled Google Cloud project, with one exception: when a customer uses Google Sheets or Microsoft Excel files as a source, they must be converted into a format the Dataprep service supports. In that case, the data is stored temporarily in Trifacta’s Google Cloud project solely for the duration of the conversion and is deleted once processing completes.
The customer’s data preparation job, which uses the Trifacta software to transform data between the customer’s source and destination systems, runs through the Trifacta implementation in the customer’s Google Cloud Dataflow or BigQuery service, in the region the customer selects for runtime. Customer data processed by the Trifacta software is not stored; it is processed only in motion, except in the limited, format-dependent circumstances noted above.
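The temporary-storage exception for format conversion follows a familiar pattern: stage the input, convert it, and delete the staged copy whether or not the conversion succeeds. The sketch below illustrates that pattern with the Python standard library only. It is not Trifacta’s implementation; the function names convert_with_temp_staging and to_rows are hypothetical, and a semicolon-delimited text file stands in for a spreadsheet.

```python
import csv
import os
import tempfile
from typing import Callable, Tuple


def convert_with_temp_staging(raw: bytes, convert: Callable[[str], list]) -> Tuple[list, str]:
    """Stage `raw` in a temporary file, run `convert` on it, then delete the file.

    Returns the converted result together with the staged path, so a caller
    can confirm that the staged copy no longer exists.
    """
    fd, path = tempfile.mkstemp(suffix=".staged")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(raw)
        result = convert(path)
    finally:
        # Runs on success and on failure, so the staged copy never outlives the conversion.
        if os.path.exists(path):
            os.remove(path)
    return result, path


def to_rows(path: str) -> list:
    """Stand-in converter: parse semicolon-delimited text into a list of rows."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh, delimiter=";"))


rows, staged_path = convert_with_temp_staging(b"id;name\n1;Ada\n", to_rows)
```

Placing the deletion in a finally clause is the key design choice: the staged file is removed even when the converter raises an exception.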
Data at rest
Customer storage and databases are managed by the customer. Encryption is under the control of the customer.
Sample data, intermediate files, and file-based job results are stored in the customer’s Google Cloud bucket. Encryption is under the control of the customer.
The Dataprep service stores the customer’s metadata (e.g., data preparation recipes, flow names, and user names) in Google Cloud SQL instances with AES-256 encryption.
Data in motion
Dataflow configuration is managed by the customer. Dataflow encryption is under the control of the customer.
Browser communication is encrypted with TLS.
All API communications between Google Services are encrypted with TLS.
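For readers who want to see what TLS-protected communication looks like from a client’s side, here is a minimal sketch using Python’s standard ssl module. It builds a client context that verifies the server certificate and hostname. The function name strict_tls_context is hypothetical, and the TLS 1.2 minimum is an assumption made for the example; the text above does not state which TLS versions Dataprep requires.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate and hostname verification."""
    # create_default_context() enables certificate validation and hostname checks.
    ctx = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = strict_tls_context()
```

A context like this can be passed to standard-library clients, for example urllib.request.urlopen(url, context=ctx), so that a connection fails rather than falling back to an unverified or outdated protocol.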
User Authentication and Authorization
User authentication is externalized to Google Cloud IAM services. Trifacta fully relies on and inherits from Google Cloud security for any authentication management. Trifacta never accesses or stores customer passwords.
Data authorization to Google Cloud sources or destinations, such as Cloud Storage, BigQuery, or Google Sheets, is managed by Google authorization services. Google allows customers to define these authorizations at the Dataprep service level or at the user-specific level, leveraging IAM and OAuth 2.0.
If the customer is accessing other data sources, such as applications and databases, the customer must create a connection in the Trifacta Cloud user interface with the proper credentials. These credentials are stored in a Google Cloud SQL database and are encrypted using AES-256.
IAM authentication and authorization govern Dataprep access via the Google Cloud Console, APIs, and the Google Cloud Command Line Interface (CLI), so that every access point is verified.
Google Cloud Dataprep Sign Up Process
The Dataprep service only exists within the Google Cloud ecosystem, and can only be activated and launched from the Google Cloud Console after authorization through Google Cloud Identity & Access Management (IAM).
During the Dataprep sign-up process from the Google Cloud Console or the Google Cloud Marketplace, the customer needs to agree to the following terms and access authorizations with Google Alphabet, Google Cloud, and Trifacta to let the Dataprep service operate:
Agree for Google Cloud to share the customer account information with Trifacta. This is the standard Google Cloud practice that allows a Google Cloud customer to use partner services integrated with Google Cloud. This authorization is necessary for technical support purposes, sales attribution for billing via the Google Cloud services, and product update communications. Account information is limited to the email contact in those specific circumstances.
Allow the Dataprep service to access your Google Cloud project data. This is necessary to enable the Dataprep service to seamlessly perform the data transformation instructions authored by the user, on behalf of the user. Dataprep runs and instructs Google Cloud Dataflow jobs on behalf of the user within the project. Roles and permissions are defined with Google Cloud IAM, as described in the Dataprep documentation.
Trifacta takes the security of its customers’ data very seriously. Trifacta follows rigorous processes and controls to assure security, availability, processing integrity, confidentiality, and privacy of customer data. Taking steps to ensure our platform remains secure is vital to protecting our data as well as our customers’ information. This is our highest priority.
The Trifacta platform is built with ease of use, performance, reliability, and security at its core to protect your most valuable asset.