Databricks

Databricks is a cloud-based data platform that helps companies manage and analyze large amounts of data from various sources.

Databricks was originally created as a way to easily run Apache Spark, a powerful open-source data processing engine, without having to worry about the underlying infrastructure. It provided a user-friendly "notebook" interface where you could write code and run it on a scalable, distributed computing cluster in the cloud.

Databricks parameters

Required parameters are in red and Optional parameters are in blue.


  • server_hostname, your SQL Warehouse's host name
    this can be found on the Connection Details tab

  • http_path, your SQL Warehouse's path
    this can be found on the Connection Details tab

  • access_token, a personal access token for your SQL Warehouse
    this can be generated on the Connection Details tab

  • catalog, the catalog to write new data to
    if tables do not exist in the catalog.schema location already, we'll auto-create them

  • schema, the schema to write new data to
    if tables do not exist in the catalog.schema location already, we'll auto-create them

  • port, the port number on which your Databricks instance is exposed
    default: 443

  • load_strategy, how to write new data into existing tables
    default: APPEND (allowed: APPEND, TRUNCATE, UPSERT)
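
The three load_strategy values are not defined further on this page. As a rough illustration only, the sketch below models a table as a list of rows and applies the conventional meaning of each strategy. The `load` function and the `key` column are hypothetical names used for the example, and matching UPSERT rows on a single key column is an assumption, not a description of how the syncer is implemented.

```python
from enum import Enum


class LoadStrategy(Enum):
    APPEND = "APPEND"
    TRUNCATE = "TRUNCATE"
    UPSERT = "UPSERT"


def load(existing, incoming, strategy=LoadStrategy.APPEND, key="id"):
    """Return the table contents after writing `incoming` into `existing`.

    `key` is a hypothetical primary-key column, used only by UPSERT.
    """
    if strategy is LoadStrategy.APPEND:
        # Keep every existing row and add the new rows after them.
        return list(existing) + list(incoming)
    if strategy is LoadStrategy.TRUNCATE:
        # Discard the existing rows, then write only the new rows.
        return list(incoming)
    if strategy is LoadStrategy.UPSERT:
        # Overwrite rows whose key already exists, insert the rest.
        merged = {row[key]: row for row in existing}
        merged.update({row[key]: row for row in incoming})
        return list(merged.values())
    raise ValueError(f"unknown load strategy: {strategy!r}")


existing = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]

appended = load(existing, incoming, LoadStrategy.APPEND)
truncated = load(existing, incoming, LoadStrategy.TRUNCATE)
upserted = load(existing, incoming, LoadStrategy.UPSERT)
```

Under these assumptions, APPEND leaves four rows (including two with id 2), TRUNCATE leaves only the two incoming rows, and UPSERT leaves three rows with id 2 updated in place.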

How do I use the Databricks syncer in commands?

cs_tools tools searchable bi-server --syncer "databricks://server_hostname=...&http_path=...&access_token=...&catalog=..."

- or -

cs_tools tools searchable bi-server --syncer databricks://definition.toml
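
If the inline form is assembled by a script, the string can be built from the parameters rather than typed by hand. This is a minimal sketch that follows the `name=value&name=value` shape shown in the first command; `build_syncer_definition` is a hypothetical helper, not part of cs_tools, and it assumes values are passed through as-is with no escaping.

```python
def build_syncer_definition(dialect, **params):
    """Join parameters into the `dialect://name=value&name=value` form.

    Assumption: values are written verbatim, without URL-encoding.
    """
    pairs = "&".join(f"{name}={value}" for name, value in params.items())
    return f"{dialect}://{pairs}"


definition = build_syncer_definition(
    "databricks",
    server_hostname="...",
    http_path="...",
    access_token="...",
    catalog="...",
)
print(definition)
# prints: databricks://server_hostname=...&http_path=...&access_token=...&catalog=...
```

The resulting string is what gets passed, quoted, to `--syncer`.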

Definition TOML Example

definition.toml

[configuration]
server_hostname = "..."
http_path = "..."
access_token = "..."
catalog = "..."
schema = 'CS_TOOLS'
load_strategy = 'truncate'