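This reference lists the alert messages that can appear in System Health > Overview > Critical Alerts and in the Alerts dashboard. Each entry gives the alert code, the message text (values in {{...}}, such as {{.Machine}}, are filled in when the alert is raised), the alert type, and the condition that raises it.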
Informational alerts
TASK_TERMINATED
Msg: Task {{.Service}}.{{.Task}} terminated on machine {{.Machine}}
Type: INFO
This alert is raised when a task terminates.
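The Msg placeholders such as {{.Service}} look like Go text/template actions. That is an assumption; this reference does not name the templating engine. The sketch below renders the TASK_TERMINATED message with hypothetical sample values to show roughly how a finished alert reads. It passes the values in a map rather than a struct, because keys such as `_actual_num_occurrences` (used by the flapping alerts) cannot be exported Go struct fields.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// taskTerminatedMsg is the TASK_TERMINATED message text from this reference.
const taskTerminatedMsg = "Task {{.Service}}.{{.Task}} terminated on machine {{.Machine}}"

// renderAlert fills an alert message template with field values.
// Assumption: the placeholders follow Go text/template syntax.
// missingkey=error makes a missing field an error instead of "<no value>".
func renderAlert(msg string, fields map[string]any) (string, error) {
	tmpl, err := template.New("alert").Option("missingkey=error").Parse(msg)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, fields); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	// Hypothetical sample values, for illustration only.
	out, err := renderAlert(taskTerminatedMsg, map[string]any{
		"Service": "example_service",
		"Task":    "example_task",
		"Machine": "node-2",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```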
DISK_ERROR
Msg: Machine {{.Machine}} has disk errors
Type: INFO
Raised when a machine has disk errors.
ZK_AVG_LATENCY
Msg: Average Zookeeper latency is more than {{.Num}} msec
Type: INFO
Raised when average Zookeeper latency is above a threshold.
ZK_MAX_LATENCY
Msg: Max Zookeeper latency is more than {{.Num}} msec
Type: INFO
Raised when max Zookeeper latency is above a threshold.
ZK_MIN_LATENCY
Msg: Min Zookeeper latency is more than {{.Num}} msec
Type: INFO
Raised when min Zookeeper latency is above a threshold.
ZK_OUTSTANDING_REQUESTS
Msg: Number of outstanding Zookeeper requests exceeds {{.Num}}
Type: INFO
Raised when there are too many outstanding Zookeeper requests.
ZK_NUM_WATCHERS
Msg: Number of Zookeeper watchers exceeds {{.Num}}
Type: INFO
Raised when there are too many Zookeeper watchers.
MASTER_ELECTION
Msg: {{.Machine}} elected as Orion Master
Type: INFO
Raised when a new Orion Master is elected.
PERIODIC_BACKUP
Msg: {{.Process}} periodic backup for policy {{.Name}} failed.
Type: INFO
Raised when a periodic backup fails.
PERIODIC_SNAPSHOT
Msg: {{.Process}} periodic snapshot {{.Name}} failed.
Type: INFO
Raised when a periodic snapshot fails.
HDFS_CORRUPTION
Msg: HDFS root directory is in a corrupted state.
Type: INFO
Raised when the HDFS root directory is corrupted.
APPLICATION_INVALID_STATE
Msg: {{.Service}}.{{.Task}} on {{.Machine}} at location {{.Location}}
Type: INFO
Raised when an application reports an invalid state.
UPDATE_START
Msg: Starting update of ThoughtSpot cluster {{.Cluster}}
Type: INFO
Raised when a cluster update starts.
UPDATE_END
Msg: Finished update of ThoughtSpot cluster {{.Cluster}} to release {{.Release}}
Type: INFO
Raised when a cluster update completes.
Errors
TIMELY_JOB_RUN_ERROR
Msg: Job run {{.Message}}
Type: ERROR
Raised when a job run fails.
TIMELY_ERROR
Msg: Job manager {{.Message}}
Type: ERROR
Raised when a job manager runs into an inconsistent state.
Warnings
DISK_SPACE
Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free
Type: WARNING
Raised when a machine is low on available disk space. Applies only to ThoughtSpot version 3.2.
ROOT_DISK_SPACE
Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free on root partition
Type: WARNING
Raised when a machine is low on available disk space on the root partition.
BOOT_DISK_SPACE
Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free on boot partition
Type: WARNING
Raised when a machine is low on available disk space on the boot partition.
UPDATE_DISK_SPACE
Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free on update partition
Type: WARNING
Raised when a machine is low on available disk space on the update partition.
EXPORT_DISK_SPACE
Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free on export partition
Type: WARNING
Raised when a machine is low on available disk space on the export partition.
HDFS_NAMENODE_DISK_SPACE
Msg: Machine {{.Machine}} has less than {{.Perc}}% disk space free on HDFS namenode drive
Type: WARNING
Raised when a machine is low on available disk space on the HDFS namenode drive.
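Each partition-specific disk-space warning above compares a partition's free space, as a percentage of its capacity, against {{.Perc}}. A minimal sketch of that comparison follows. The partition labels, byte counts, and the 10% threshold are hypothetical; this reference does not describe how ThoughtSpot samples partitions.

```go
package main

import "fmt"

// partitionAlert maps a partition label to the warning raised for it.
// The labels are hypothetical; the alert codes come from this reference.
var partitionAlert = map[string]string{
	"root":          "ROOT_DISK_SPACE",
	"boot":          "BOOT_DISK_SPACE",
	"update":        "UPDATE_DISK_SPACE",
	"export":        "EXPORT_DISK_SPACE",
	"hdfs_namenode": "HDFS_NAMENODE_DISK_SPACE",
}

// percentFree returns free space as a percentage of capacity.
// Callers must ensure totalBytes > 0.
func percentFree(freeBytes, totalBytes uint64) float64 {
	return float64(freeBytes) / float64(totalBytes) * 100
}

// diskSpaceAlert returns the alert code and true when the partition has
// less than perc percent of its capacity free.
func diskSpaceAlert(partition string, freeBytes, totalBytes uint64, perc float64) (string, bool) {
	code, known := partitionAlert[partition]
	if !known || totalBytes == 0 {
		return "", false // unknown partition or no capacity data
	}
	if percentFree(freeBytes, totalBytes) < perc {
		return code, true
	}
	return "", false
}

func main() {
	const gib = 1 << 30
	// 4 GiB free out of 50 GiB is 8% free, below a hypothetical 10% threshold.
	code, fire := diskSpaceAlert("root", 4*gib, 50*gib, 10)
	fmt.Println(code, fire)
}
```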
MEMORY
Msg: Machine {{.Machine}} has less than {{.Perc}}% memory free
Type: WARNING
Raised when a machine is low on free memory.
OS_USERS
Msg: Machine {{.Machine}} has more than {{.Num}} logged in users
Type: WARNING
Raised when a machine has too many users logged in.
OS_PROCS
Msg: Machine {{.Machine}} has more than {{.Num}} processes
Type: WARNING
Raised when a machine has too many processes.
SSH
Msg: Machine {{.Machine}} doesn't have an active SSH server
Type: WARNING
Raised when a machine does not have an active SSH server.
DISK_ERROR_EXTERNAL
Msg: Machine {{.Machine}} has disk errors
Type: WARNING
Raised when more than 2 disk errors occur on a machine within one day.
ZK_FD_COUNT
Msg: Zookeeper has more than {{.Num}} open file descriptors
Type: WARNING
Raised when Zookeeper has too many open file descriptors.
ZK_EPHEMERAL_COUNT
Msg: Zookeeper has more than {{.Num}} ephemeral files
Type: WARNING
Raised when there are too many Zookeeper ephemeral files.
HOST_DOWN
Msg: {{.Machine}} is down
Type: WARNING
Raised when a host is down.
TASK_UNREACHABLE
Msg: {{.ServiceDesc}} on {{.Machine}} is unreachable over HTTP
Type: WARNING
Raised when a task is unreachable over HTTP.
TASK_NOT_RUNNING
Msg: {{.ServiceDesc}} is not running
Type: WARNING
Raised when a service task is not running on any machine in the cluster.
Critical alerts
TASK_FLAPPING
Msg: Task {{.Service}}.{{.Task}} terminated {{._actual_num_occurrences}} times in last {{._earliest_duration_str}}
Type: CRITICAL
This alert is raised when a task crashes repeatedly. Terminations are counted across the whole cluster, so if a service crashes 5 times in a day across all nodes combined, this alert is generated.
OREO_TERMINATED
Msg: Oreo terminated on machine {{.Machine}}
Type: CRITICAL
This alert is raised when the Oreo daemon on a machine terminates due to an error. This is typically caused by an error accessing Zookeeper or HDFS, or by a hardware issue.
HDFS_DISK_SPACE
Msg: HDFS has less than {{.Perc}}% space free
Type: CRITICAL
Raised when the HDFS cluster is low on total available disk space.
ZK_INACCESSIBLE
Msg: Zookeeper is not accessible
Type: CRITICAL
Raised when Zookeeper is inaccessible.
PERIODIC_BACKUP_FLAPPING
Msg: Periodic backup failed {{._actual_num_occurrences}} times in last {{._earliest_duration_str}}
Type: CRITICAL
This alert is raised when a periodic backup fails repeatedly.
PERIODIC_SNAPSHOT_FLAPPING
Msg: Periodic snapshot failed {{._actual_num_occurrences}} times in last {{._earliest_duration_str}}
Type: CRITICAL
This alert is raised when a periodic snapshot fails repeatedly.
APPLICATION_INVALID_STATE_EXTERNAL
Msg: {{.Service}}.{{.Task}} on {{.Machine}} at location {{.Location}}
Type: CRITICAL
Raised when an application reports an invalid state.