Learn about the fields used to create a Hive connection with ThoughtSpot DataFlow.
Here is a list of the fields for a Hive connection in ThoughtSpot DataFlow. You need specific information to establish a seamless and secure connection.
Connection properties
- Connection name
  - Name your connection.
  - Mandatory field.
  - Example: HiveConnection
- Connection type
  - Choose the Hive connection type.
  - Mandatory field.
  - Example: Hive
- HiveServer2 HA configured
  - Specify this option if using HiveServer2 High Availability (HA).
  - Mandatory field.
- HiveServer2 zookeeper namespace
  - Specify the ZooKeeper namespace as hiveserver2. This is the default value.
  - Mandatory field. Only when using HiveServer2 HA.
  - Example: hiveserver2
  - Other notes: If the value is different, find it in hive-site.xml under the property hive.server2.zookeeper.namespace.
- Host
  - Specify the hostname or the IP address of the Hadoop system.
  - Mandatory field. Only when not using HiveServer2 HA.
  - Example: myEmail@example.com
- Port
  - Specify the port.
  - Mandatory field. Only when not using HiveServer2 HA.
  - Example: 1234
- Hive security authentication
  - Specifies the type of security protocol used to connect to the instance. Based on the security type, select the authentication type and provide the required details.
  - Mandatory field.
  - Example: Kerberos
  - Valid Values: Simple, Kerberos, LDAP, SSL, Kerberos & SSL, LDAP & SSL
  - Default: Simple
  - Other notes: The authentication type set up for the instance is in hive-site.xml, under the property hive.server2.authentication.
- User
  - Specify the user to connect to Hive. This user must have data access privileges.
  - Mandatory field. For Simple and LDAP authentication only.
  - Example: userdi
  - Default: simple
- Password
  - Specify the password.
  - Optional field. For Simple and LDAP authentication only.
  - Example: pswrd234%!
- Trust store
  - Specify the trust store name for authentication.
  - Mandatory field. For SSL and Kerberos & SSL authentication only.
  - Example: trust store
  - Default: SSL
- Trust store password
  - Specify the password for the trust store.
  - Mandatory field. For SSL and Kerberos & SSL authentication only.
  - Example: password
  - Default: SSL
- Hive transport mode
  - Applicable only for the Hive process engine. Specifies the network protocol that clients use to communicate with HiveServer2.
  - Mandatory field.
  - Example: binary
  - Valid Values: Binary, HTTP
  - Default: binary
  - Other notes: The Hive transport mode is in hive-site.xml, under the property hive.server2.transport.mode.
- HTTP path
  - Specify the HTTP path when HTTP transport mode is selected.
  - Mandatory field. For HTTP transport mode only.
  - Example: cliservice
  - Valid Values: cliservice
  - Default: cliservice
  - Other notes: The HTTP path value is in hive-site.xml, under the property hive.server2.thrift.http.path.
- Hadoop distribution
  - Specify the Hadoop distribution you are connecting to.
  - Mandatory field.
  - Example: Hortonworks
  - Valid Values: CDH, Hortonworks, EMR
  - Default: CDH
- Distribution version
  - Specify the version of the distribution selected above.
  - Mandatory field.
  - Example: 2.6.5
  - Valid Values: Any numeric value
  - Default: 6.3.x
- Hadoop conf path
  - By default, the system picks up the Hadoop configuration files from HDFS. To override this, specify an alternate location. This applies only when using configuration settings that differ from the global Hadoop instance settings.
  - Mandatory field.
  - Example: $DI_HOME/app/path
  - Other notes: You may need this, for example, if HDFS is encrypted and the location of the key files and the password to decrypt the files are available in the Hadoop configuration files.
- DFS HA configured
  - Specify whether High Availability (HA) is configured for DFS.
  - Optional field. For Hadoop Extract only.
  - Example: Checked
- DFS name service
  - Specify the logical name of the HDFS nameservice.
  - Mandatory field. For DFS HA and Hadoop Extract only.
  - Example: lahdfs
  - Other notes: It is available in hdfs-site.xml, defined as dfs.nameservices.
- DFS name node IDs
  - Specify a comma-separated list of NameNode IDs. The system uses this property to determine all NameNodes in the cluster. The XML property name is dfs.ha.namenodes.<nameservice>, where <nameservice> is the value of dfs.nameservices.
  - Mandatory field. For DFS HA and Hadoop Extract only.
  - Example: nn1, nn2
- RPC address for namenode1
  - Specify the fully-qualified RPC address of the first listed NameNode. It is defined as dfs.namenode.rpc-address.<nameservice>.<NameNode ID 1>.
  - Mandatory field. For DFS HA and Hadoop Extract only.
  - Example: lclabh.example.com:5678
- RPC address for namenode2
  - Specify the fully-qualified RPC address of the second listed NameNode. It is defined as dfs.namenode.rpc-address.<nameservice>.<NameNode ID 2>.
  - Mandatory field. For DFS HA and Hadoop Extract only.
  - Example: lvclabh.example.com:9876
- DFS host
  - Specify the DFS hostname or IP address.
  - Mandatory field. For Hadoop Extract only, when not using DFS HA.
  - Example: myemail@example.com
- DFS port
  - Specify the associated DFS port.
  - Mandatory field. For Hadoop Extract only, when not using DFS HA.
  - Example: 1234
- Default DFS location
  - Specify the default source/target location.
  - Mandatory field. For Hadoop Extract only.
  - Example: /tmp
- Temp DFS location
  - Specify the location for creating the temp directory.
  - Mandatory field. For Hadoop Extract only.
  - Example: /tmp
- DFS security authentication
  - Select the type of security enabled.
  - Mandatory field. For Hadoop Extract only.
  - Example: Kerberos
  - Valid Values: Simple, Kerberos
  - Default: simple
- Hadoop RPC protection
  - Hadoop cluster administrators control the quality of protection using the configuration parameter hadoop.rpc.protection.
  - Mandatory field. When using Kerberos DFS security authentication and Hadoop Extract.
  - Example: none
  - Valid Values: None, authentication, integrity, privacy
  - Default: authentication
  - Other notes: It is available in core-site.xml.
- Hive principal
  - Specify the principal for authenticating Hive services.
  - Mandatory field.
  - Example: hive/host@lab.example.com
  - Other notes: It is available in hive-site.xml.
- User principal
  - Specify the user principal associated with the keytab (configured while enabling Kerberos). To authenticate with a keytab, you need both the keytab file generated by the Kerberos admin and this user principal.
  - Mandatory field.
  - Example: labuser@labdp.example.com
- User keytab
  - Specify the path to the keytab file generated by the Kerberos admin. To authenticate with a keytab, you also need the user principal associated with it (configured while enabling Kerberos).
  - Mandatory field.
  - Example: /app/keytabs/labuser.keytab
- KDC host
  - Specify the hostname of the KDC (Kerberos Key Distribution Center), the service that runs on a domain controller server role. It is configured in the Kerberos configuration file, /etc/krb5.conf.
  - Mandatory field.
  - Example: example.example.com
- Default realm
  - Specify the default Kerberos realm. A Kerberos realm is the domain over which a Kerberos authentication server has the authority to authenticate a user, host, or service. It is configured in the Kerberos configuration file, /etc/krb5.conf.
  - Mandatory field.
  - Example: labhdp.example.com
- Queue name
  - Specify the YARN queue name, as listed in the comma-separated value of yarn.scheduler.capacity.root.queues.
  - Mandatory field. For Hadoop Extract only.
  - Example: default
  - Other notes: It is available in capacity-scheduler.xml.
- YARN web UI port
  - Specify the port of the YARN ResourceManager web UI. By default, this is 8088.
  - Mandatory field. For Hadoop Extract only.
  - Example: 8088
- Zookeeper quorum host
  - Specify the value of hadoop.registry.zk.quorum from yarn-site.xml.
  - Mandatory field. Only when not using HiveServer2 HA.
  - Example: lvclhdp1.example.com:21,lvclabhdp12.example.com:81,lvclabhdp12.example.com:2093
- Yarn timeline webapp host
  - Specify the IP address of the YARN timeline service web application.
  - Mandatory field.
  - Example: 8188
- Yarn timeline webapp port
  - Specify the port associated with the YARN timeline service web application.
  - Mandatory field.
  - Example: 8190
- Yarn timeline webapp version
  - Specify the version associated with the YARN timeline service web application.
  - Mandatory field.
  - Example: v1
- JDBC options
  - Specify the options associated with the JDBC URL.
  - Optional field.
  - Example: jdbc:sqlserver://[serverName[\instanceName][:portNumber]]
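Several fields above point to a property in hive-site.xml, hdfs-site.xml, core-site.xml, or yarn-site.xml. These Hadoop files share one layout: a configuration element that contains property entries, each with name and value children. A short Python sketch can therefore look up a value before you fill in the form. The helper read_site_property and the sample file contents are hypothetical and are not part of DataFlow.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_site_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the <property> whose <name> matches, or None.

    Assumes the usual Hadoop *-site.xml layout:
    <configuration><property><name/><value/></property>...</configuration>
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None  # property not set in this file


# Hypothetical hive-site.xml excerpt, for illustration only.
SAMPLE_HIVE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.server2.zookeeper.namespace</name>
    <value>hiveserver2</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>
  <property>
    <name>hive.server2.transport.mode</name>
    <value>binary</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for key in (
        "hive.server2.zookeeper.namespace",
        "hive.server2.authentication",
        "hive.server2.transport.mode",
        "hive.server2.thrift.http.path",  # absent in the sample, so prints None
    ):
        print(key, "=", read_site_property(SAMPLE_HIVE_SITE, key))
```

A property that a file does not set comes back as None. In that case, Hive uses its built-in default for that property.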
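The HiveServer2 HA, transport mode, SSL trust store, and Hive principal fields correspond to parameters of the standard Apache Hive JDBC URL (jdbc:hive2://...). The following sketch composes such a URL from those values, which may help when you test the same settings with a JDBC client such as Beeline. It assumes the session parameter names from the Apache Hive HiveServer2 client documentation. The function hive_jdbc_url, its arguments, the "default" database, and port 10000 are illustrative assumptions, and DataFlow may build its URL differently.

```python
from typing import Dict, Optional


def hive_jdbc_url(
    *,
    host: Optional[str] = None,
    port: Optional[int] = None,
    zk_quorum: Optional[str] = None,
    zk_namespace: str = "hiveserver2",
    database: str = "default",
    transport_mode: str = "binary",
    http_path: str = "cliservice",
    ssl: bool = False,
    trust_store: Optional[str] = None,
    trust_store_password: Optional[str] = None,
    hive_principal: Optional[str] = None,
) -> str:
    """Compose a HiveServer2 JDBC URL from connection-field values (illustrative)."""
    params: Dict[str, str] = {}
    if zk_quorum:
        # HiveServer2 HA: the client discovers a live server through ZooKeeper.
        authority = zk_quorum
        params["serviceDiscoveryMode"] = "zooKeeper"
        params["zooKeeperNamespace"] = zk_namespace
    else:
        if not host or port is None:
            raise ValueError("host and port are required without HiveServer2 HA")
        authority = f"{host}:{port}"
    if transport_mode.lower() == "http":
        params["transportMode"] = "http"
        params["httpPath"] = http_path
    if ssl:
        params["ssl"] = "true"
        if trust_store:
            params["sslTrustStore"] = trust_store
        if trust_store_password:
            params["trustStorePassword"] = trust_store_password
    if hive_principal:
        # Kerberos: the HiveServer2 service principal, e.g. hive/host@REALM.
        params["principal"] = hive_principal
    suffix = "".join(f";{key}={value}" for key, value in params.items())
    return f"jdbc:hive2://{authority}/{database}{suffix}"


if __name__ == "__main__":
    print(hive_jdbc_url(host="hive.example.com", port=10000, transport_mode="http"))
```

The usage line prints jdbc:hive2://hive.example.com:10000/default;transportMode=http;httpPath=cliservice. The sketch leaves the user name and password out of the URL because JDBC clients usually pass them as separate connection arguments.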
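The DFS HA fields follow the standard HDFS HA naming pattern. The NameNode IDs are listed under dfs.ha.namenodes.<nameservice>, and each ID has its own dfs.namenode.rpc-address.<nameservice>.<ID> entry. The following minimal Python sketch resolves the RPC addresses from a dictionary of hdfs-site.xml properties. The function name is hypothetical, and the sample values reuse the examples from this page.

```python
from typing import Dict, List


def namenode_rpc_addresses(hdfs_site: Dict[str, str]) -> Dict[str, str]:
    """Map each NameNode ID to its RPC address using the HDFS HA properties.

    If dfs.nameservices lists several nameservices, only the first is used.
    """
    nameservice = hdfs_site["dfs.nameservices"].split(",")[0].strip()
    # e.g. "nn1, nn2" -> ["nn1", "nn2"]; spaces after commas are tolerated.
    ids: List[str] = [
        nn.strip()
        for nn in hdfs_site[f"dfs.ha.namenodes.{nameservice}"].split(",")
        if nn.strip()
    ]
    return {nn: hdfs_site[f"dfs.namenode.rpc-address.{nameservice}.{nn}"] for nn in ids}


# Hypothetical hdfs-site.xml values, reusing this page's example values.
SAMPLE_HDFS_SITE = {
    "dfs.nameservices": "lahdfs",
    "dfs.ha.namenodes.lahdfs": "nn1, nn2",
    "dfs.namenode.rpc-address.lahdfs.nn1": "lclabh.example.com:5678",
    "dfs.namenode.rpc-address.lahdfs.nn2": "lvclabh.example.com:9876",
}

if __name__ == "__main__":
    print(namenode_rpc_addresses(SAMPLE_HDFS_SITE))
```

The two returned addresses are the values for the RPC address for namenode1 and RPC address for namenode2 fields.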
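The KDC host and Default realm values come from /etc/krb5.conf. The following sketch reads default_realm from the [libdefaults] section and the kdc entries from each realm block in the [realms] section. It is a simplified reader written for illustration. It assumes a conventionally formatted file with one entry per line and ignores include directives. The function name and sample file are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple


def parse_krb5_conf(text: str) -> Tuple[Optional[str], Dict[str, List[str]]]:
    """Return (default_realm, {realm: [kdc, ...]}) from krb5.conf text.

    Simplified: handles [libdefaults] and [realms] only, one entry per line,
    and skips full-line comments that start with '#' or ';'.
    """
    section: Optional[str] = None
    realm: Optional[str] = None
    default_realm: Optional[str] = None
    kdcs: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            section, realm = header.group(1), None
            continue
        key, _, value = (part.strip() for part in line.partition("="))
        if section == "libdefaults" and key == "default_realm":
            default_realm = value
        elif section == "realms":
            if value == "{":  # start of a block such as "EXAMPLE.COM = {"
                realm = key
                kdcs.setdefault(realm, [])
            elif line == "}":  # end of the current realm block
                realm = None
            elif realm is not None and key == "kdc":
                kdcs[realm].append(value)
    return default_realm, kdcs


# Hypothetical krb5.conf, for illustration only.
SAMPLE_KRB5_CONF = """
[libdefaults]
  default_realm = LABHDP.EXAMPLE.COM
  dns_lookup_kdc = false

[realms]
  LABHDP.EXAMPLE.COM = {
    kdc = example.example.com
    admin_server = example.example.com
  }
"""

if __name__ == "__main__":
    print(parse_krb5_conf(SAMPLE_KRB5_CONF))
```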
Sync properties
- Data extraction mode
  - Specify the extraction type.
  - Mandatory field.
  - Example: Hadoop Extract
  - Valid Values: Hadoop Extract, JDBC
  - Default: Hadoop Extract
- Null value
  - Specifies the string literal that indicates a null value in the extracted data. During the data load, column values that match this string are loaded as null in the target.
  - Mandatory field. For Hadoop Extract only.
  - Example: NULL
  - Valid Values: NULL
  - Default: NULL
- Enclosing character
  - Specify whether the text columns in the source data need to be enclosed in quotes.
  - Mandatory field.
  - Example: DOUBLE
  - Valid Values: SINGLE, DOUBLE
  - Default: DOUBLE
- Escape character
  - Specify the escape character if using a text qualifier in the source data.
  - Mandatory field.
  - Example: \"
  - Valid Values: \\, Any ASCII character
  - Default: \"
- TS load options
  - Specifies the parameters passed with the tsload command, in addition to those the application already includes. The format for these parameters is:
    --<param_1_name> <optional_param_1_value>
    --<param_2_name> <optional_param_2_value>
  - Optional field.
  - Example: --max_ignored_rows 0
  - Valid Values:
    --null_value "
    --escape_character "
    --max_ignored_rows 0
  - Default: --max_ignored_rows 0
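TS load options is a free-form string of --name value pairs, so you may find it easier to generate it from structured settings. The following minimal Python sketch uses shlex.join so that values such as a lone double quote stay intact. It assumes the string is later split the way a POSIX shell would split it; this page does not document how DataFlow parses it. The helper name is hypothetical.

```python
import shlex
from typing import Dict, Optional


def tsload_options(options: Dict[str, Optional[str]]) -> str:
    """Render {"max_ignored_rows": "0"} as '--max_ignored_rows 0'.

    A value of None emits the flag alone, since parameter values are optional.
    """
    parts = []
    for name, value in options.items():
        parts.append(f"--{name}")
        if value is not None:
            parts.append(value)
    # shlex.join quotes any argument that a shell would otherwise split or alter.
    return shlex.join(parts)


if __name__ == "__main__":
    print(tsload_options({"max_ignored_rows": "0"}))
    print(tsload_options({"escape_character": '"'}))
```

The first call prints --max_ignored_rows 0, the default shown above. The second call wraps the double quote in single quotes so that a shell passes it through as one literal character.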
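The following sketch illustrates how the Null value, Enclosing character, and Escape character settings work together. It parses one delimited line with Python's csv module, using a double quote as the enclosing character (DOUBLE), a backslash as the escape character, and NULL as the null marker. This is an analogy built on Python's csv options, not DataFlow's loader. The comma delimiter and the function name are assumptions.

```python
import csv
import io
from typing import List, Optional


def parse_extracted_line(
    line: str,
    null_value: str = "NULL",
    enclosing_char: str = '"',  # DOUBLE; pass "'" for SINGLE
    escape_char: str = "\\",
    delimiter: str = ",",  # assumption: comma-delimited extract
) -> List[Optional[str]]:
    """Split one delimited line, then map the null marker to None."""
    reader = csv.reader(
        io.StringIO(line),
        delimiter=delimiter,
        quotechar=enclosing_char,
        escapechar=escape_char,
        doublequote=False,  # rely on the escape character, not "" doubling
    )
    row = next(reader)
    # Fields equal to the null marker become None ("loaded as null in the target").
    return [None if field == null_value else field for field in row]


if __name__ == "__main__":
    print(parse_extracted_line('1,"say \\"hi\\"",NULL'))
```

In this sketch, a quoted "NULL" also becomes None, because csv removes the enclosing quotes before the comparison. A loader that needs to tell a literal NULL string apart from a real null would have to compare before unquoting.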