Learn about the fields used to create an HDFS connection with ThoughtSpot DataFlow.
Here is a list of the fields for an HDFS connection in ThoughtSpot DataFlow. You need specific information to establish a seamless and secure connection.
Connection properties
- Connection name
- Name your connection.
- Mandatory field.
- Example:
HDFSConnection
- Connection type
- Choose the HDFS connection type.
- Mandatory field.
- Example:
HDFS
- User
- Specify the user that connects to the HDFS file system. This user must have data access privileges.
- Mandatory field.
For Hive security with simple, LDAP, and SSL authentication only.
- Example:
user1
- Hadoop distribution
- Provide the distribution of Hadoop being connected to.
- Mandatory field.
- Example:
Hortonworks
- Valid Values:
CDH, Hortonworks, EMR
- Default:
CDH
- Distribution version
- Provide the version of the distribution chosen above.
- Mandatory field.
- Example:
2.6.5
- Valid Values:
Any valid version number of the chosen Hadoop distribution
- Default:
6.3.x
- Hadoop conf path
- By default, the system picks the Hadoop configuration files from the HDFS. To override, specify an alternate location. Applies only when using configuration settings that are different from global Hadoop instance settings.
- Mandatory field.
- Example:
/app/path
- Other notes:
One case where this may be needed: HDFS is encrypted, and the location of the key files and the password to decrypt the files are available in the Hadoop configuration files.
- HDFS HA configured
- Enables High Availability for HDFS.
- Optional field.
- HDFS name service
- The logical name given to the HDFS nameservice.
- Mandatory field.
For HDFS HA only.
- Example:
lahdfs
- Other notes:
It is available in hdfs-site.xml and defined as dfs.nameservices.
- HDFS name node IDs
- Provide the comma-separated list of NameNode IDs. DataNodes use this property to determine all the NameNodes in the cluster. The XML property name is dfs.ha.namenodes.<nameservice>, where <nameservice> is the value of dfs.nameservices.
- Mandatory field.
For HDFS HA only.
- Example:
nn1,nn2
- RPC address for namenode1
- Specify the fully-qualified RPC address of the first listed NameNode. It is defined as dfs.namenode.rpc-address.<nameservice>.<name_node_ID_1>, where <nameservice> is the value of dfs.nameservices.
- Mandatory field.
For HDFS HA only.
- Example:
www.example1.com:1234
- RPC address for namenode2
- Specify the fully-qualified RPC address of the second listed NameNode. It is defined as dfs.namenode.rpc-address.<nameservice>.<name_node_ID_2>, where <nameservice> is the value of dfs.nameservices.
- Mandatory field.
For HDFS HA only.
- Example:
www.example2.com:1234
- DFS host
- Specify the DFS hostname or IP address.
- Mandatory field.
Only when not using HDFS HA.
- DFS port
- Specify the associated DFS port.
- Mandatory field.
Only when not using HDFS HA.
- Default HDFS location
- Specify the default source/target location.
- Mandatory field.
- Example:
/tmp
- Temp HDFS location
- Specify the location for creating the temporary directory.
- Mandatory field.
- Example:
/tmp
- HDFS security authentication
- Select the type of security being enabled.
- Mandatory field.
- Example:
Kerberos
- Valid Values:
Simple, Kerberos
- Default:
simple
- Hadoop RPC protection
- Hadoop cluster administrators control the quality of protection using the configuration parameter hadoop.rpc.protection.
- Mandatory field.
For DFS security authentication with Kerberos only.
- Example:
none
- Valid Values:
None, authentication, integrity, privacy
- Default:
authentication
- Other notes:
It is available in core-site.xml.
- Hive principal
- Principal for authenticating Hive services.
- Mandatory field.
- Example:
hive/host@name.example.com
- Other notes:
It is available in hive-site.xml.
- User principal
- To authenticate with a keytab, you must have a keytab file generated by the Kerberos administrator, and the user principal associated with that keytab (configured while enabling Kerberos).
- Mandatory field.
- Example:
labuser@name.example.com
- User keytab
- To authenticate with a keytab, you must have a keytab file generated by the Kerberos administrator, and the user principal associated with that keytab (configured while enabling Kerberos).
- Mandatory field.
- Example:
/app/keytabs/labuser.keytab
- KDC host
- Specify the KDC host name. The KDC (Kerberos Key Distribution Center) is a service that runs on a domain controller server (configured in the Kerberos configuration file, /etc/krb5.conf).
- Mandatory field.
- Example:
kdc_host@example.com
- Default realm
- A Kerberos realm is the domain over which a Kerberos authentication server has the authority to authenticate a user, host, or service (configured in the Kerberos configuration file, /etc/krb5.conf).
- Mandatory field.
- Example:
name.example.com
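The HDFS HA fields above take their values from hdfs-site.xml. As a minimal sketch, assuming a single nameservice and a hypothetical file whose values mirror the examples in this list, the following Python snippet shows one way to look up the name service, NameNode IDs, and RPC addresses. The helper names read_properties and ha_fields are illustrative and are not part of DataFlow.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content; values mirror the examples in the list above.
HDFS_SITE_XML = """<configuration>
  <property><name>dfs.nameservices</name><value>lahdfs</value></property>
  <property><name>dfs.ha.namenodes.lahdfs</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.lahdfs.nn1</name><value>www.example1.com:1234</value></property>
  <property><name>dfs.namenode.rpc-address.lahdfs.nn2</name><value>www.example2.com:1234</value></property>
</configuration>"""


def read_properties(xml_text):
    """Return a dict of property name -> value from Hadoop-style XML."""
    root = ET.fromstring(xml_text)
    return {
        (prop.findtext("name") or "").strip(): (prop.findtext("value") or "").strip()
        for prop in root.iter("property")
    }


def ha_fields(xml_text):
    """Collect the values needed for the HDFS HA connection fields.

    Assumes dfs.nameservices holds exactly one nameservice.
    """
    props = read_properties(xml_text)
    nameservice = props["dfs.nameservices"]
    node_ids = [n.strip() for n in props[f"dfs.ha.namenodes.{nameservice}"].split(",")]
    rpc = {n: props[f"dfs.namenode.rpc-address.{nameservice}.{n}"] for n in node_ids}
    return {"name_service": nameservice, "name_node_ids": node_ids, "rpc_addresses": rpc}


fields = ha_fields(HDFS_SITE_XML)
print(fields)
```

In practice you would read the XML text from the cluster's own hdfs-site.xml rather than from an inline string, and copy the resulting values into the corresponding connection fields.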
Sync properties
- Column delimiter
- Specify the column delimiter character.
- Mandatory field.
- Example:
1
- Valid Values:
Any ASCII character
- Default:
ASCII 01 (SOH)
- Enable archive on success
- Specify whether the data must be archived after a successful run.
- Optional field.
- Example:
No
- Valid Values:
Yes, No
- Default:
No
- Delete on success
- Specify whether the data must be deleted after a successful run.
- Optional field.
- Example:
No
- Valid Values:
Yes, No
- Default:
No
- Compression
- Specify whether the file is compressed and, if so, the compression type.
- Mandatory field.
- Example:
gzip
- Valid Values:
None, gzip
- Default:
None
- Enclosing character
- Specify whether the text columns in the source data need to be enclosed in quotes.
- Optional field.
- Example:
Single
- Valid Values:
Single, Double, Empty
- Default:
Double
- Escape character
- Specify the escape character if using a text qualifier in the source data.
- Optional field.
- Example:
\\
- Valid Values:
Any ASCII character
- Default:
Empty
- Null value
- Specify the string literal that represents NULL values in data. During the data load, the column value that matches this string loads as NULL into ThoughtSpot.
- Optional field.
- Example:
NULL
- Valid Values:
NULL
- Default:
NULL
- Date style
- Specifies how to interpret the date format.
- Optional field.
- Example:
YMD
- Valid Values:
YMD, MDY, DMY, DMONY, MONDY, Y2MD, MDY2, DMY2, DMONY2, MONDY2
- Default:
YMD
- Date delimiter
- Specifies the separator used in the date format (only the default delimiter is supported).
- Optional field.
- Example:
-
- Valid Values:
Any printable ASCII character
- Default:
-
- Time style
- Specifies the format of the time portion in the data.
- Optional field.
- Example:
24HOUR
- Valid Values:
12 HOUR
- Time delimiter
- Specifies the character used to separate the time components (only the default delimiter is supported).
- Optional field.
- Example:
:
- Valid Values:
Any printable ASCII character
- Default:
:
- TS load options
- Specify additional parameters passed with the tsload command. The format for these parameters is: --<param_1_name> <optional_param_1_value>
- Optional field.
- Example:
--max_ignored_rows 0
- Valid Values:
--null_value ""
--escape_character ""
--max_ignored_rows 0
- Default:
--max_ignored_rows 0
- Reference:
tsload flag reference
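To illustrate how the column delimiter, enclosing character, escape character, and null value settings above fit together, here is a minimal Python sketch that reads a small sample using the standard csv module. It is not DataFlow's parser; the sample data and the read_rows helper are hypothetical, and the settings are assumptions chosen to match the defaults and examples in this list.

```python
import csv
import io

# Settings mirroring the sync properties above (illustrative values).
COLUMN_DELIMITER = "\x01"  # ASCII 01 (SOH), the documented default
ENCLOSING_CHAR = '"'       # corresponds to the "Double" enclosing character
ESCAPE_CHAR = "\\"         # a single backslash used as the escape character
NULL_VALUE = "NULL"        # string literal that represents NULL

# Hypothetical two-row sample that uses those settings.
SAMPLE = 'id\x01name\x01city\n1\x01"Smith, \\"Al\\""\x01NULL\n'


def read_rows(text):
    """Parse delimited text and map the null literal to None."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=COLUMN_DELIMITER,
        quotechar=ENCLOSING_CHAR,
        escapechar=ESCAPE_CHAR,
        doublequote=False,  # quotes inside a field are escaped, not doubled
    )
    return [[None if cell == NULL_VALUE else cell for cell in row] for row in reader]


rows = read_rows(SAMPLE)
print(rows)
```

Here the escaped quotes inside the enclosed name column are kept as literal quote characters, and the city value that matches the null literal becomes None, which parallels how a matching column value loads as NULL.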