Learn about the fields used to create an HDFS connection with ThoughtSpot DataFlow.

Here is a list of the fields for an HDFS connection in ThoughtSpot DataFlow. You need specific information to establish a seamless and secure connection.

Connection properties

Connection name
Name your connection.
Mandatory field.
Example:
HDFSConnection
Connection type
Choose the HDFS connection type.
Mandatory field.
Example:
HDFS
User
Specify the user to connect to HDFS file system. This user must have data access privileges.
Mandatory field.
For Hive security with simple, LDAP, and SSL authentication only.
Example:
user1
Hadoop distribution
Specify the Hadoop distribution you are connecting to.
Mandatory field.
Example:
Hortonworks
Valid Values:
CDH, Hortonworks, EMR
Default:
CDH
Distribution version
Specify the version of the distribution chosen above.
Mandatory field.
Example:
2.6.5
Valid Values:
Valid distribution number of the Hadoop distribution
Default:
6.3.x
Hadoop conf path
By default, the system picks the Hadoop configuration files from the HDFS. To override, specify an alternate location. Applies only when using configuration settings that are different from global Hadoop instance settings.
Mandatory field.
Example:
/app/path
Other notes:
For example, this may be needed if HDFS is encrypted and the Hadoop configuration files contain the location of the key files and the password needed to decrypt the files.
HDFS HA configured
Enables High Availability (HA) for HDFS.
Optional field.
HDFS name service
The logical name given to the HDFS nameservice.
Mandatory field.
For HDFS HA only.
Example:
lahdfs
Other notes:
It is available in hdfs-site.xml and defined as dfs.nameservices.
HDFS name node IDs
Provide the comma-separated list of NameNode IDs. DataNodes use this property to determine all the NameNodes in the cluster. The XML property name is dfs.ha.namenodes.<nameservice>.
Mandatory field.
For HDFS HA only.
Example:
nn1,nn2
RPC address for namenode1
Specify the fully-qualified RPC address for the first listed NameNode, defined as dfs.namenode.rpc-address.<nameservice>.<name_node_ID_1>.
Mandatory field.
For HDFS HA only.
Example:
www.example1.com:1234
RPC address for namenode2
Specify the fully-qualified RPC address for the second listed NameNode, defined as dfs.namenode.rpc-address.<nameservice>.<name_node_ID_2>.
Mandatory field.
For HDFS HA only.
Example:
www.example2.com:1234
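
Taken together, the HA fields above map onto an hdfs-site.xml fragment like the following sketch. It uses the example values shown above (nameservice lahdfs, NameNode IDs nn1 and nn2, and the example RPC addresses); your actual values will differ.

```xml
<!-- hdfs-site.xml: HDFS HA settings built from the example values above -->
<property>
  <name>dfs.nameservices</name>
  <value>lahdfs</value>
</property>
<property>
  <name>dfs.ha.namenodes.lahdfs</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.lahdfs.nn1</name>
  <value>www.example1.com:1234</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.lahdfs.nn2</name>
  <value>www.example2.com:1234</value>
</property>
```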
DFS host
Specify the DFS hostname or IP address.
Mandatory field.
For when not using HDFS HA.
DFS port
Specify the associated DFS port.
Mandatory field.
For when not using HDFS HA.
Default HDFS location
Specify the default source/target location.
Mandatory field.
Example:
/tmp
Temp HDFS location
Specify the location for creating the temp directory.
Mandatory field.
Example:
/tmp
HDFS security authentication
Select the type of security being enabled.
Mandatory field.
Example:
Kerberos
Valid Values:
Simple, Kerberos
Default:
simple
Hadoop RPC protection
Hadoop cluster administrators control the quality of protection using the configuration parameter hadoop.rpc.protection.
Mandatory field.
For DFS security authentication with Kerberos only.
Example:
none
Valid Values:
None, authentication, integrity, privacy
Default:
authentication
Other notes:
It is available in core-site.xml.
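
As a sketch, the corresponding core-site.xml entry (using the default value above) looks like:

```xml
<!-- core-site.xml: quality of protection for Hadoop RPC -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>authentication</value>
</property>
```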
Hive principal
The principal for authenticating Hive services.
Mandatory field.
Example:
hive/host@name.example.com
Other notes:
It is available in hive-site.xml.
User principal
Specify the user principal associated with the keytab. To authenticate via a keytab, you need a supporting keytab file generated by the Kerberos admin, along with the user principal associated with it (configured while enabling Kerberos).
Mandatory field.
Example:
labuser@name.example.com
User keytab
Specify the path to the keytab file generated by the Kerberos admin. Authentication via a keytab also requires the user principal associated with it (configured while enabling Kerberos).
Mandatory field.
Example:
/app/keytabs/labuser.keytab
KDC host
Specify the KDC hostname. The KDC (Kerberos Key Distribution Center) is a service that runs on a domain controller server role (configured in the Kerberos configuration file, /etc/krb5.conf).
Mandatory field.
Example:
kdc_host@example.com
Default realm
A Kerberos realm is the domain over which a Kerberos authentication server has the authority to authenticate a user, host, or service (configured in the Kerberos configuration file, /etc/krb5.conf).
Mandatory field.
Example:
name.example.com
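
Before configuring the connection, you can sanity-check the Kerberos settings above by obtaining a ticket manually with the standard Kerberos client tools. The principal and keytab path below are the illustrative values from the examples above, not real credentials.

```shell
# Request a ticket-granting ticket using the keytab and user principal
# (hypothetical example values; substitute your own).
kinit -kt /app/keytabs/labuser.keytab labuser@name.example.com

# List cached tickets to confirm authentication against the realm succeeded.
klist
```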

Sync properties

Column delimiter
Specify the column delimiter character.
Mandatory field.
Example:
1
Valid Values:
Any ASCII character
Default:
ASCII 01 (SOH)
Enable archive on success
Specify whether to archive the data after a successful load.
Optional field.
Example:
No
Valid Values:
Yes
Default:
No
Delete on success
Specify whether to delete the data after successful execution.
Optional field.
Example:
No
Valid Values:
Yes
Default:
No
Compression
Specify whether the file is compressed, and the type of compression.
Mandatory field.
Example:
gzip
Valid Values:
None, gzip
Default:
None
Enclosing character
Specify whether the text columns in the source data need to be enclosed in quotes.
Optional field.
Example:
Single
Valid Values:
Single, Double, Empty
Default:
Double
Escape character
Specify the escape character if using a text qualifier in the source data.
Optional field.
Example:
\\
Valid Values:
Any ASCII character
Default:
Empty
Null value
Specify the string literal that represents NULL values in data. During the data load, the column value that matches this string loads as NULL into ThoughtSpot.
Optional field.
Example:
NULL
Valid Values:
NULL
Default:
NULL
Date style
Specifies how to interpret the date format.
Optional field.
Example:
YMD
Valid Values:
YMD, MDY, DMY, DMONY, MONDY, Y2MD, MDY2, DMY2, DMONY2, MONDY2
Default:
YMD
Date delimiter
Specifies the separator used in the date format (only the default delimiter is supported).
Optional field.
Example:
-
Valid Values:
Any printable ASCII character
Default:
-
Time style
Specifies the format of the time portion in the data.
Optional field.
Example:
24HOUR
Valid Values:
12HOUR, 24HOUR
Time delimiter
Specifies the character used to separate the time components (only the default delimiter is supported).
Optional field.
Example:
:
Valid Values:
Any printable ASCII character
Default:
:
TS load options
Specify additional parameters passed with the tsload command. The format for these parameters is:
--<param_1_name> <optional_param_1_value>
Optional field.
Example:
--max_ignored_rows 0
Valid Values:

--null_value ""
--escape_character ""
--max_ignored_rows 0
Default:
--max_ignored_rows 0
Reference:
tsload flag reference

Dataflow tips