This section includes some tips and best practices for using ThoughtSpot Data Connect.
Data sources
Unless you’re trying to bring in relationships between tables with a data source, the recommendation is to create one data source load job per table. Doing this makes your data sources more flexible for if/when you need to change something (add a column, etc.)
If you want to add a foreign key, do it using the TQL command ALTER TABLE, after the load has completed. This is often easier than trying to get relationships to come in with a data load, and it works even if the relationship you want to create does not exist in the source system. If use this method, you’ll need to manually create the relationships using a TQL script, which you can run afterward the first data load.
Derived column names
When you create a derived column, further activities in ThoughtSpot Data Connect will show you the target column name, since there is no analogous column in the source.
Compact tables to save memory
If you’re doing incremental loads into existing tables, you may want to automate a script that runs the TQL command COMPACT TABLE either after each incremental load, or periodically (like weekly, for a nightly load). This is because when upserts are performed, space from the old rows is not released. The delta between each row that gets upserted is retained and takes up memory.
Renaming columns and tables
ThoughtSpot does not support renaming columns using TQL for tables that were imported with Data Connect. You can rename columns and tables after a data load, using the DDL edit step or the ThoughtSpot web application. Select Data and click on the name of the table. Then click on a column to rename it.
In older versions of ThoughtSpot Data Connect, you could not use characters other than alphanumeric and underscores. Now, you can change a column name to use those, for example you could change “percent_gross_margin” to “% gross margin”.
Remember: The use of spaces is not supported. Also, field names must contain 65 characters or less, and they cannot start with a number.
Status reporting
You can see the status of a data load job by looking at the list of data sources. The data load status gets updated every 30 seconds.
To see loading details for each table that makes up a data load, click on the name of the data load.
View log files
You can download the session logs for a data load run. Access the link to download the log on the data load status details page for the session. Check the log files for a successful or failed run, and fix the cause of the failure before resubmitting the job.
Timestamp format
All relevant source values should be in the following format: yyyy-mm-dd hh:mm:ss[.fffffffff]
. In addition, please ensure that year values are within the range of 0000-9999.
Import relationships
Relationships (primary key/foreign key) can be imported from the source database. For this to work, the primary and foreign key(s) need to be made up of only one column each. That is, they cannot be composite keys. You will also need to include all the tables whose relationships you want to bring over into ThoughtSpot within a single data source.
Edit the DDL script to choose a different database, schema, or table
You can edit the SQL DDL (data definition language) script generated by ThoughtSpot Data Connect to point to an existing table in ThoughtSpot. You can then load to that table and/or create the table in a different database/schema than what the default DDL suggests. To do this, the column names and the order of the column names must match exactly with the table you are importing into in ThoughtSpot.
No need to import a column mentioned in an expression
It is not necessary to import a column that is mentioned in an expression. For example, let’s say you have a source table “sales_fact” with a column “revenue”. You may want to use that column as part of an expression that calculates gross margin. But maybe you don’t want to include the column “revenue” in the target table. You can unselect the column “revenue” in the table selection page, and then add your gross margin expression that references the column “revenue”, even though it isn’t part of the data to be loaded. Your expression will still work.
Internet connectivity
The only internet access the needed for ThoughtSpot Data Connect to operate is
outbound access to https://app.informaticaondemand.com
and
https://app2.informaticacloud.com
(port 443). Of course, it also needs any
other internet connectivity required to access any cloud data sources, like
Salesforce. No inbound access is needed.
Troubleshooting
To view details about error conditions, you can look at the trace file that is generated when there is a red bar error.
Possible issues that can occur are:
- Cannot connect to data source.
- Authentication was rejected.
- Schema being loaded does not match schema receiving the load.
Expressions
You can validate your expressions when using ThoughtSpot Data Connect. Make sure to click Validate after entering your expression. This prevents you from entering an invalid expression that will fail at runtime.
When using the expression builder concat
, you need to use single quotes instead of double quotes around each of the strings.
Connection parameters
Informatica provides documentation on each of its connectors. This contains information on the connection properties and what they mean for each connector. You can see the connection properties in ThoughtSpot Data Connect by mousing over the key value pairs, however, you won’t be able to see their meanings. For that, you should refer to the Informatica documentation.