Cluster Monitoring Application

Overview

The cluster monitor application is a configurable tool that can be used to monitor the performance of StreamBase applications. It consists of a StreamBase monitoring fragment and a Live Datamart server fragment. It discovers services, polls for statistics, and publishes them to dynamically created tables in the server.

LiveView Web support is automatically enabled.

Usage

To install the cluster monitor, choose a name for the node that will host it. This example uses monitor.sb10:

$ epadmin install node --nodename=monitor.sb10 \
  --substitutions='NODE_NAME=monitor.sb10' \
  --application=<product-installation>/distrib/tibco/sb/applications/cluster-monitor.zip
$ epadmin --servicename=monitor.sb10 start node

Installation Notes

  • The value for the install --nodename=N option must match the value for the --substitutions='NODE_NAME=N' option.
  • The cluster monitor must be started in its own cluster.
  • The install node --discoveryport=N option may be used to set the discovery port to the appropriate value for the cluster(s) to be monitored.
  • After the node has been started, wait for the LiveView Datamart server to complete initialization. This can be done by watching for the following line to appear in its logfile, monitor.sb10/logs/liveview-server.log:
      *** All tables have been loaded. LiveView is ready to accept client connections.

    The epadmin tail logging command can be used to watch the log file:

      $ epadmin --servicename=monitor.sb10 tail logging --enginename=liveview-server

    See the administration guide for details on the epadmin logging target.
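
The wait described above can also be scripted. The following is a minimal sketch, not part of the product: wait_for_liveview is a hypothetical helper name, and it assumes the log file path shown in the example and polls it once per second until the ready line appears or a timeout expires.

```shell
# Sketch: block until the LiveView "ready" line appears in a log file, or time out.
# wait_for_liveview is a hypothetical helper, not an epadmin feature.
READY_LINE='All tables have been loaded. LiveView is ready to accept client connections.'

wait_for_liveview() {
    log_file="$1"
    timeout_seconds="${2:-300}"
    waited=0
    # grep -F matches the pattern as a fixed string; -q suppresses output.
    until grep -Fq "$READY_LINE" "$log_file" 2>/dev/null; do
        if [ "$waited" -ge "$timeout_seconds" ]; then
            echo "timed out waiting for LiveView in $log_file" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, wait_for_liveview monitor.sb10/logs/liveview-server.log 600 would wait up to ten minutes before giving up with a non-zero exit status.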

LiveView Web

With the default configuration, once the LiveView Datamart server has finished starting, the LiveView Web interface is available on the server's listener port (LIVEVIEW_PORT, 11080 by default). It contains a pre-configured default set of cards:

  • Services - Discovered Services
  • Percentage of Shared Memory In Use - Amount of shared memory in use, per node.
  • Host Machine CPU Utilization - CPU utilization, per node host machine. Note that multiple nodes running on a single host will report the same information.
  • Node Transaction Rate and Average Latency - The transaction rate and average latency, per node.
  • EventFlow/LiveView Amount of Heap Memory In Use - The amount of Java heap memory in use, per EventFlow/LiveView engine.
  • EventFlow/LiveView Total Current Queue Depth - The total queued tuple depth per EventFlow/LiveView JVM.

Configuration

The following aspects of monitoring behavior are configurable:

  • Service discovery.
  • Credentials for statistics collection.
  • Administration commands.
  • Table naming.
  • Table aging.
  • LiveView Datamart listener port.

Configuration changes may be made:

  • At node/application installation time by replacing the default node deploy configuration using the nodedeploy parameter.
  • While the cluster monitor is running by loading and activating a discovery adapter configuration and/or a cluster monitor configuration.

    After activating the new configuration(s) the cluster monitor event flow must be restarted with the following command:

    $ epadmin --servicename=MONITOR_NODE_NAME restart container --engine=cluster-monitor

    Replace MONITOR_NODE_NAME with the name of the node where the cluster monitor application was installed.
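
The load, activate, and restart sequence for a running monitor might be scripted as follows. This is a sketch under assumptions: apply_monitor_config is a hypothetical helper, and the load configuration --source and activate configuration --type/--name/--version forms are assumed from the epadmin configuration target and should be checked against the administration guide. Setting EPADMIN="echo epadmin" prints the commands instead of running them.

```shell
# Sketch: apply a new configuration to a running cluster monitor node.
# EPADMIN defaults to the real tool; set EPADMIN="echo epadmin" for a dry run.
EPADMIN="${EPADMIN:-epadmin}"

apply_monitor_config() {
    node="$1"        # node where the cluster monitor is installed
    conf_file="$2"   # HOCON file holding the new configuration
    conf_type="$3"   # value of "type" in that file
    conf_name="$4"   # value of "name" in that file
    conf_version="$5" # value of "version" in that file
    # Each step runs only if the previous one succeeded.
    $EPADMIN --servicename="$node" load configuration --source="$conf_file" &&
    $EPADMIN --servicename="$node" activate configuration \
        --type="$conf_type" --name="$conf_name" --version="$conf_version" &&
    $EPADMIN --servicename="$node" restart container --engine=cluster-monitor
}
```

A call such as apply_monitor_config monitor.sb10 new-monitor.conf com.tibco.ep.streambase.configuration.clustermonitor clustermonitor 1.0.1 illustrates the idea; the file name and version here are placeholders.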

Tables

The following statistics tables are configured by default:

  • Services - The monitor always contains a Services table showing services that have been discovered and their current state.
    Field Name         Type       Notes
    serviceState       string
    serviceName        string     Primary key
    serviceAddress     string
    serviceChangeTime  timestamp
  • StreamBaseInfo - EventFlow/LiveView server Java heap memory use, and aggregate total number of queued tuples.
    Field Name       Type       Notes
    service          string
    id               long       Primary key. Per row unique identifier
    time             timestamp
    usedMemory       long       Bytes
    totalQueueDepth  long
  • NodeInfo - Per node shared memory usage, transaction rate, and deadlock counter, and CPU usage for the machine where the node is running.
    Field Name                           Type       Notes
    service                              string
    id                                   long       Primary key. Per row unique identifier
    time                                 timestamp
    c_Version                            long       history command output version
    c_Shared_Memory_Total                long       Bytes
    c_Shared_Memory_Used                 long       Percentage
    c_CPU_Percent_User                   long       Percentage
    c_CPU_Percent_System                 long       Percentage
    c_CPU_Percent_Idle                   long       Percentage
    c_Transactions                       long       Number of per node transactions during the last sample
    c_Average_Transaction_Latency_usecs  long       Average transaction latency, in microseconds
    c_Transaction_Deadlocks              long

Administration Command Tables

Any administration target command that generates output may be used. For example, the pre-configured NodeInfo table is equivalent to the following administration command:

$ epadmin --servicename=monitor.sb10 display history --seconds=1

By default, the generated table name for an administration command is t_COMMAND_TARGET. For the command above, this would be t_display_history. The name may be changed via configuration.

The first three columns (service, id, and time) are common to both the administration command tables and the StreamBaseInfo table.

The remaining columns consist of the administration command output. Their names are discovered and converted to meet LiveView Datamart column name requirements:

  • A leading c_ prefix is inserted.
  • Non alpha-numeric characters are converted to underscores.
  • Multiple underscore sequences are converted to single underscores.

For example, the column name: Shared Memory Size (bytes) is converted to: c_Shared_Memory_Size_bytes
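
The naming rules above can be approximated with a short shell sketch. The helper names to_column_name and default_table_name are hypothetical, and this is not the monitor's actual implementation. The sketch also strips a trailing underscore; that step is not among the listed rules but is inferred from the documented example, which would otherwise end in an underscore.

```shell
# Sketch of the documented naming rules (hypothetical helpers).

# Default table name for an administration command: t_COMMAND_TARGET.
default_table_name() {
    printf 't_%s_%s\n' "$1" "$2"
}

# Column name conversion:
#   1. insert a leading c_ prefix
#   2. convert non-alphanumeric characters to underscores
#   3. collapse runs of underscores into one
#   4. drop a trailing underscore (assumption inferred from the example)
to_column_name() {
    printf 'c_%s\n' "$1" |
        LC_ALL=C sed -e 's/[^A-Za-z0-9]/_/g' -e 's/__*/_/g' -e 's/_$//'
}
```

With these definitions, to_column_name 'Shared Memory Size (bytes)' yields c_Shared_Memory_Size_bytes, and default_table_name display history yields t_display_history.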

Table Aging

Two configurable parameters control the automatic removal of rows from the monitor's tables.

  • rowAgeLimitSeconds - Rows older than this limit will be periodically removed.
  • rowCheckIntervalSeconds - How often tables are checked for row removal.
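
To illustrate how the two parameters interact, the sketch below models a single aging pass. It is an assumption-laden illustration, not the monitor's implementation: age_rows is a hypothetical helper, rows are represented as lines starting with an epoch timestamp in seconds, and "older than" is taken to mean strictly greater than the limit.

```shell
# Sketch of one aging pass over rows of the form "<epoch-seconds> <payload>".
# Rows whose age exceeds the limit are dropped; the rest pass through unchanged.
age_rows() {
    now="$1"                          # current time, epoch seconds
    row_age_limit_seconds="${2:-60}"  # mirrors the rowAgeLimitSeconds default
    awk -v now="$now" -v limit="$row_age_limit_seconds" '(now - $1) <= limit'
}
```

Under this model a pass runs once every rowCheckIntervalSeconds, so a row may remain in its table for up to roughly rowAgeLimitSeconds + rowCheckIntervalSeconds (90 seconds with the defaults of 60 and 30) before a pass removes it.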

Statistics Collection Authentication

The cluster monitor will attempt to connect to each discovered service, authenticating using the configured credentials.

Administration commands use the administrationAuthentication section of the ClusterMonitor configuration.

EventFlow and LiveView services use the eventFlowAuthentication section of the ClusterMonitor configuration.

NOTE: by default, no credentials are configured. The cluster monitor will only be able to interrogate services running on the local node, started by the same user who installed the cluster monitor.

The configuration supports a single set of credentials for administration commands, and a single set of credentials for EventFlow and LiveView services. It is recommended that a common login credential be configured throughout the target cluster, for use by cluster monitor.

Default Configuration

//
// Default configuration for the StreamBase Cluster Monitor.
//
// To change this configuration, make a copy of this entire file
// and use the nodedeploy option to epadmin install node.
//
// HOCON substitution variables used in this configuration:
//
//      NODE_NAME       Required. The name of the node that the monitor application is
//                      installed to (the nodename parameter to epadmin install node).
//
//      LIVEVIEW_PORT   Defaults to 11080
//
// These variables may be overridden via the substitutions or substitutionfile
// parameters to epadmin install node.
//
name = "ClusterMonitor"
version = "1.0"
type = "com.tibco.ep.dtm.configuration.node"
configuration =
{
    NodeDeploy =
    {
        nodes  =
        {
            "${NODE_NAME}" =
            {
                description = "Node for running the ClusterMonitor"

                engines =
                {
                    "cluster-monitor" =
                    {
                        fragmentIdentifier = "com.tibco.ep.cluster.eventflow"

                        configuration =
                        [
                            """
                            name = "cluster-monitor-service-discovery"
                            version = "1.0.0"
                            type = "com.tibco.ep.streambase.configuration.servicediscoveryadapter"
                            configuration =
                            {
                                ServiceDiscovery =
                                {
                                    associatedWithEngines = [ "cluster-monitor" ]

                                    //
                                    // The port to use for discovery requests.
                                    // Optional. Defaults to the port being used by the node where
                                    // this configuration is loaded.
                                    //
                                    // discoveryPort = 54321

                                    //
                                    // A list of host names that specify which network interfaces
                                    // to use when sending discovery requests.
                                    // Optional. Defaults to the system's host name.
                                    discoveryHosts = [ ]

                                    //
                                    // Service names that will be discovered by the cluster monitor.
                                    // Optional. Defaults to all service names found.
                                    // Uncomment to restrict.
                                    //
                                    // serviceNames = [ "cluster1", "cluster2", "nodeA.cluster3" ]

                                    //
                                    // Service types that will be discovered by cluster monitor.
                                    // Optional. Empty defaults to all service types, but the cluster
                                    // monitor only knows how to monitor node, eventflow and
                                    // liveview services.
                                    //
                                    serviceTypes = [ "node", "eventflow", "liveview" ]

                                    //
                                    // The number of seconds between discovery requests.
                                    // Optional. Defaults to 1.
                                    //
                                    // discoveryBrowseIntervalSeconds = 1

                                    //
                                    // This causes the cluster-monitor event flow application
                                    // to wait for the LiveView server to become ready before
                                    // starting.  Do not change.
                                    //
                                    autostart = false

                                    //
                                    // Whether or not services running within the monitor's node
                                    // are discovered and monitored.
                                    // Defaults to false.
                                    //
                                    includeLocalServices = ${INCLUDE_LOCAL_SERVICES:-false}
                                }
                            }
                            """,

                            """
                            name = "clustermonitor"
                            version = "1.0.0"
                            type = "com.tibco.ep.streambase.configuration.clustermonitor"
                            configuration =
                            {
                                ClusterMonitor =
                                {
                                    associatedWithEngines = [ "cluster-monitor" ]

                                    //
                                    // A list of administration commands to run against discovered
                                    // node services, every collection interval, and then publish
                                    // to LiveView server tables.
                                    //
                                    commands =
                                    [
                                        {
                                            //
                                            // Administration command name. Required.
                                            //
                                            commandName = "display"

                                            //
                                            // Administration target name. Required.
                                            //
                                            targetName = "history"

                                            //
                                            // A list of command parameters. Optional.
                                            //
                                            parameters =
                                            {
                                               "seconds" = "1"
                                            }

                                            //
                                            // The name of the LiveView table for the command results.
                                            // This name must be a legal LiveView table name, starting
                                            // with a letter, followed by a combination of letters, digits,
                                            // and underscores.  Whitespace is not allowed.
                                            //
                                            // Optional. Will default to "t_commandName_targetName"
                                            //
                                            tableName = "NodeInfo"

                                            //
                                            // Number of seconds between command invocations.
                                            // Optional. Defaults to 1.
                                            //
                                            collectionIntervalSeconds = 1
                                        },

                                        // Additional commands may be added here.
                                    ]

                                    //
                                    // Age limit for data rows in the monitor tables.
                                    // Optional.  Defaults to 60.
                                    //
                                    // rowAgeLimitSeconds = 60

                                    //
                                    // The interval at which tables are checked for removing old rows.
                                    // Optional.  Defaults to 30.
                                    //
                                    // rowCheckIntervalSeconds = 30

                                    //
                                    // Optional credentials for running the administration commands.
                                    //
                                    // administrationAuthentication =
                                    // {
                                    //    //
                                    //    // The user name to use when connecting to the node.
                                    //    //
                                    //    userName = "administrator"
                                    //    //
                                    //    // The password to use when connecting to the node.
                                    //    //
                                    //    password = "This is a plain text password"
                                    // }

                                    //
                                    // Optional credentials for collecting StreamBase statistics.
                                    //
                                    // eventFlowAuthentication =
                                    // {
                                    //    //
                                    //    // The user name to use when connecting to the
                                    //    // EventFlow or LiveView StreamBase listener.
                                    //    //
                                    //    userName = "sbAdministrator"
                                    //    //
                                    //    // The password to use when connecting to the
                                    //    // EventFlow or LiveView StreamBase listener.
                                    //    // A value beginning with '#!' is enciphered
                                    //    // with the sbcipher tool.
                                    //    //
                                    //    password = "#!xyzzy"
                                    // }
                                }
                            }
                            """,

                            """
                            name = "cluster-monitor-operator-parameters"
                            version = "1.0.0"
                            type = "com.tibco.ep.streambase.configuration.sbengine"
                            configuration =
                            {
                                StreamBaseEngine =
                                {
                                    streamBase =
                                    {
                                        operatorParameters =
                                        {
                                               LIVEVIEW_PORT = ""${LIVEVIEW_PORT:-11080}
                                        }
                                    }
                                }
                            }
                            """
                        ]
                    }

                    "liveview-server" =
                    {
                        fragmentIdentifier = "com.tibco.ep.cluster.liveview-server"

                        configuration =
                        [
                            """
                            name = "liveview-server-listener"
                            type = "com.tibco.ep.ldm.configuration.ldmclientapilistener"
                            version = "1.0.0"

                            configuration =
                            {
                                ClientAPIListener =
                                {
                                    associatedWithEngines = [ "liveview-server" ]
                                    portNumber = ${LIVEVIEW_PORT:-11080}
                                }
                            }
                            """,

                            """
                            name = "liveview-sb-listener"
                            type = "com.tibco.ep.streambase.configuration.sbclientapilistener"
                            version = "1.0.0"
                            configuration =
                            {
                                ClientAPIListener =
                                {
                                    associatedWithEngines = [ "liveview-server" ]
                                    apiListenerAddress =
                                    {
                                         portNumber = 0
                                    }
                                }
                            }
                            """,

                            """
                            name = "liveview-server-engine"
                            version = "1.0.0"
                            type = "com.tibco.ep.ldm.configuration.ldmengine"

                            configuration = {
                                LDMEngine = {
                                    //
                                    // Recommended JVM 1.8 flags for LiveView
                                    //
                                    jvmArgs =
                                    [
                                        "-Xmx4g"
                                        "-Xms1g"
                                        "-XX:+UseG1GC"
                                        "-XX:MaxGCPauseMillis=500"
                                        "-XX:ConcGCThreads=1"
                                    ]
                                    ldm = {
                                    }
                                }
                            }
                            """
                        ]
                    }
                }
            }
        }
    }
}