ThoughtSpot Software
Release: 6.3

Data Caching

ThoughtSpot runs all analysis against data held in memory, which helps it return fast results across millions or billions of records.

ThoughtSpot caches data as relational tables in memory. The tables can come from different data sources and be joined together. ThoughtSpot offers several ways to get data into the cluster.

JDBC and ODBC Drivers

ThoughtSpot provides JDBC and ODBC drivers that you can use to write data to ThoughtSpot. This is useful if you already have an ETL process or tool and want to extend it to populate the ThoughtSpot cache.

JDBC and ODBC drivers are appropriate when you:

have an existing ETL tool, such as Informatica or SSIS
have available resources to create and manage ETL
have smaller daily loads

tsload

You can use the tsload command-line tool to bulk load delimited data with very high throughput. Separately, individual users can upload smaller (< 50 MB) spreadsheets or delimited files.

We recommend the tsload approach in the following cases:

for the initial data load
when JDBC or ODBC drivers are not available
when there are large recurring daily loads
when you need higher throughput; note that this can add I/O costs
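
As a rough illustration of preparing delimited data for a bulk load, the following Python sketch writes rows to a delimited file and checks whether the result is small enough for an individual user upload. The helper names write_delimited and fits_user_upload are hypothetical, and reading the 50 MB figure as 50 * 1024 * 1024 bytes is an assumption. The sketch does not call tsload itself.

```python
import csv
import os
import tempfile

# The 50 MB figure comes from the text above; treating it as
# 50 * 1024 * 1024 bytes is an assumption of this sketch.
UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024


def write_delimited(rows, path, delimiter=","):
    """Write an iterable of row sequences to a delimited file; return its size in bytes."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        writer.writerows(rows)
    return os.path.getsize(path)


def fits_user_upload(size_bytes, limit=UPLOAD_LIMIT_BYTES):
    """Return True when a file is below the size limit for an individual upload."""
    return size_bytes < limit


if __name__ == "__main__":
    sample = [("1", "widget", "9.99"), ("2", "gadget", "19.50")]
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "sales.csv")
        size = write_delimited(sample, target)
        print(size, fits_user_upload(size))
```

A file that exceeds the limit would be a candidate for tsload rather than a user upload.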

Choosing a Data Caching Strategy

The approach you choose depends on your environment and data needs, and each data caching option involves tradeoffs.

Many implementations use a variety of approaches. For example, a solution with a large amount of initial data and smaller daily increments might use tsload to load the initial data, and then use the JDBC driver with an ETL tool for incremental loads.
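
The guidance above can be summarized as a small decision helper. This is a minimal Python sketch, not part of ThoughtSpot: the function name, its parameters, and the row count that counts as a "large" daily load are all hypothetical and should be adjusted for your environment.

```python
def choose_loading_approach(initial_load, has_etl_tool, driver_available,
                            daily_rows, large_daily_rows=10_000_000):
    """Suggest "tsload" or "jdbc_odbc" based on the cases listed in this section.

    large_daily_rows is a hypothetical threshold; the section does not
    define what counts as a large daily load.
    """
    # tsload is recommended for the initial data load.
    if initial_load:
        return "tsload"
    # Without a usable driver or an ETL tool, the driver route does not apply.
    if not driver_available or not has_etl_tool:
        return "tsload"
    # Large recurring daily loads favor tsload for throughput.
    if daily_rows >= large_daily_rows:
        return "tsload"
    # Smaller daily loads through an existing ETL tool suit the drivers.
    return "jdbc_odbc"


if __name__ == "__main__":
    # Mirrors the example in the text: tsload first, then the JDBC driver.
    print(choose_loading_approach(True, True, True, 500_000_000))
    print(choose_loading_approach(False, True, True, 200_000))
```

The two calls mirror the mixed approach described above: tsload for the large initial load, then the JDBC driver with an ETL tool for the smaller daily increments.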