Skip to main content

Databricks Connections

Create a connection to your Databricks workspace from Alteryx Analytics Cloud (AAC). Databricks offers a unified data analytics platform for data scientists, engineers, and business analysts. It provides a multi-cloud, managed offering for data processing, machine learning, and data governance.

This page provides high-level information on how to configure AAC to integrate with Databricks hosted on AWS or Azure. Databricks is a unified data analytics platform that is optimized for use on the AWS or Azure infrastructure.

Prerequisites

  1. For AWS Databricks, configure your base storage environment to S3 and disable ADS before setting up Databricks workspaces. Go to AWS as Private Data Storage to learn more.

  2. For Azure Databricks, configure your base storage environment to ADLS and disable ADS before setting up Databricks workspaces. Go to ADLS as Private Data Storage to learn more.

  3. Have a Databricks E2 workspace.

  4. Obtain a Personal Access Token (PAT) from Databricks. Learn how to generate your PAT.

  5. Have an admin configure the Databricks workspace in AAC. Learn how to configure a Databricks workspace.

  6. Connect AAC to your Databricks workspace. Learn how to connect your Databricks workspace.

Limitations

General

  • AAC supports managed tables and external delta tables from Databricks. For external delta tables, AAConly supports append and truncate actions.

  • You can't query materialized views with job clusters.

  • Due to a Databricks limitation, only use date values after October 1, 1582. Date values before October 1, 1582 result in an error.

SQL Pushdown

  • Typecasting issues during integer division expression. The output becomes Float even if you choose Integer for the output column data type. For example, the output of the expression [total_weight]/[quantity], where both columns are Integer data types, becomes a Float data type.

  • Precision error with float data types. For example, the expression [weight]-1 => 1.099999 in Databricks SQL where {int weight=2.1}. The same expression using Photon as the engine results in 1.1.

  • There is no specific way to order rows except when you specify the order_by clause. This affects the Row ID Tool and Union Tool.

  • DateTime output ignores the millisecond values of the input DateTime value. For example, 2021-12-31 23:59:59.999 → 2021-12-31 23:59:59.000. Import jobs also ignore millisecond values when creating a dataset.

  • The substring function might give incorrect results when the value position and length are very large.

  • Precision issues with Double and Float data types. For example:

    select  687399551400673279 + 5.35
    returns 687399551400673284.35
    
    select  687399551400673279 + double(5.35)
    returns 687399551400673300
    
    select  double(687399551400673279) + double(5.35)
    returns 687399551400673300

    Note

    Float is a base-2 numeric type. When given a literal which is base-10 the representation might not be exact.

    Double is a base-2 numeric type. When given a literal which is base-10 the representation might not be exact.

    Use Decimal data types to accurately represent fractional or large base-10 numbers.

  • The exp function returns infinity in cases with very large Float values.

  • The rpad function produces empty results as opposed to null. For example:

    SELECT rpad('hi', 500000000000000000000000000000000, 'ab');
    returns ""

Create and Configure a Databricks Connection

  1. On the AAC landing page, select Connections from the global header.

  2. Select Create Connection at the top of the page.

  3. Search for Databricks and then select the Databricks connection.

  4. Enter a unique Connection name and an optional Connection description.

  5. Select an available Databricks workspace under Workspace Name.

    1. The Personal Access Token for the selected workspace automatically populates.

  6. (Optional) Enter Connect String Options. For details on this option, go to the later section on this page.

  7. Select Test Connection to check if your workspace connects correctly.

  8. Select Create.

You can now access your Databricks tables from the Data page.

Connect String Options

The connect string options are optional. If you are passing additional properties and values to complete the connection, you must structure the connect string options in this way:

;<prop1>=<val1>;<prop2>=<val2>... 

Where:

  • <prop>: The name of the property.

  • <val>: The value for the property.

Delimiters:

  • ;: Any set of connect string options must begin and end with a semi-colon.

    • You can omit a semicolon from the end of the connect string options.

  • =: You must separate property names and values with an equal sign (=).