Databricks

There is No Magic!

This Syncer uses Databricks's SQLAlchemy driver under the hood.

Databricks states that the driver is intended to connect to Unity Catalog, and that usage with hive_metastore is untested.
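
Under the hood, the dialect (typically provided by the databricks-sql-connector package, with the SQLAlchemy pieces in databricks-sqlalchemy on newer releases) builds its connection from the same values documented in the Parameters section below. As a rough sketch only — this is not CS Tools' internal code, and it reuses the placeholder values from the examples on this page — a direct connection looks something like this:

    # A minimal sketch of connecting with the Databricks SQLAlchemy dialect directly.
    # This is NOT CS Tools' internal code; it only illustrates how the syncer
    # parameters map onto the driver's connection URL. The hostname, http_path, and
    # access token below are the same placeholders used elsewhere on this page.
    import sqlalchemy as sa

    engine = sa.create_engine(
        "databricks://token:dapi0123456789abcdef0123456789abcdef"
        "@dbc-abc1234-efgh.cloud.databricks.com"
        "?http_path=/sql/protocolv1/o/1234567890123456/0123-456789-abcdef01"
        "&catalog=thoughtspot"
        "&schema=cs_tools"
    )

    # Quick connectivity check against the SQL Warehouse.
    with engine.connect() as cnxn:
        print(cnxn.execute(sa.text("SELECT 1")).scalar())

The load_strategy and use_legacy_dataload parameters control how CS Tools writes data rather than how it connects, so they have no equivalent in the URL above.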

Parameters

Required parameters are listed first, and Optional parameters are shown with their default values.


  • server_hostname, your SQL Warehouse's host name
    this can be found on the Connection Details tab

  • http_path, your SQL Warehouse's path
    this can be found on the Connection Details tab

  • access_token, a personal access token for your SQL Warehouse
    this can be generated from the Connection Details tab

  • catalog, the catalog to write new data to
    if tables do not exist in the catalog.schema location already, we'll auto-create them

  • schema, the schema to write new data to
    if tables do not exist in the catalog.schema location already, we'll auto-create them

  • port, the port number your Databricks instance is exposed on
    default: 443

  • use_legacy_dataload, fall back to slower data loading with JDBC-style INSERTs
    default: false ( allowed: true, false )

  • load_strategy, how to write new data into existing tables
    default: APPEND ( allowed: APPEND, TRUNCATE, UPSERT )

Serverless Requirements

If you're running CS Tools serverless, you'll want to ensure you install these Python requirements (a quick import check is sketched below).

🧙 Don't know what this means? It's probably safe to ignore it.
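
Not sure whether the driver is present in your serverless environment? A quick check like the one below can help; the module names are assumptions based on the databricks-sql-connector and databricks-sqlalchemy packages and may differ across CS Tools versions.

    # Hypothetical sanity check -- not part of CS Tools. It only reports whether the
    # Databricks driver modules the syncer is assumed to rely on can be imported here.
    import importlib.util

    for module in ("databricks.sql", "databricks.sqlalchemy", "sqlalchemy"):
        try:
            found = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            found = False
        print(f"{module}: {'installed' if found else 'MISSING'}")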

How do I use the Syncer in commands?

CS Tools accepts syncer definitions in either declarative or configuration file form.

Simply write the parameters out alongside the command.

cs_tools tools searchable metadata --syncer "databricks://server_hostname=dbc-abc1234-efgh.cloud.databricks.com&http_path=/sql/protocolv1/o/1234567890123456/0123-456789-abcdef01&access_token=dapi0123456789abcdef0123456789abcdef&catalog=thoughtspot" --config dogfood

* when declaring multiple parameters inline, you should wrap the entire value in quotes.
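
If you build CS Tools invocations from a script, the inline definition is simply key=value pairs joined with & after the databricks:// prefix. Below is a small sketch (a hypothetical snippet, not part of CS Tools) that assembles the example string above:

    # Hypothetical helper -- not part of CS Tools -- that assembles the inline
    # syncer definition shown above from a dict of parameters.
    params = {
        "server_hostname": "dbc-abc1234-efgh.cloud.databricks.com",
        "http_path": "/sql/protocolv1/o/1234567890123456/0123-456789-abcdef01",
        "access_token": "dapi0123456789abcdef0123456789abcdef",
        "catalog": "thoughtspot",
    }

    definition = "databricks://" + "&".join(f"{key}={value}" for key, value in params.items())
    print(definition)  # pass this (wrapped in quotes) to --syncer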

  1. Create a file with the .toml extension.

    syncer-overwrite.toml

    [configuration]
    server_hostname = "dbc-abc1234-efgh.cloud.databricks.com"
    http_path = "/sql/protocolv1/o/1234567890123456/0123-456789-abcdef01"
    access_token = "dapi0123456789abcdef0123456789abcdef"
    catalog = "thoughtspot"
    schema = "cs_tools"
    port = 443
    load_strategy = "TRUNCATE"
    
    * this is a complete example, not all parameters are required.

  2. Write the filename in your command in place of the parameters.

    cs_tools tools searchable metadata --syncer databricks://syncer-overwrite.toml --config dogfood