See Monitoring for an overview of monitoring an Ultra Messaging network.
It is important to the health and stability of a UM network to monitor the operation of DROs (if any). This monitoring should include real-time automated detection of problems that will produce a timely alert to operations staff.
Three types of data should be monitored: log files, UM library stats, and daemon stats.
For UM library stats and daemon stats, the monitoring messages contain an "application ID". For UM applications, this is a user-specified name intended to identify the individual component/instance, and is supplied by the option monitor_appid (context).
However, in the DRO, the application ID is NOT controlled by the monitor_appid (context) option. Instead, it identifies not only the specific DRO, but also the portal within the DRO that is supplying the stats.
In the case of the DRO's daemon stats, the application ID is set to the Router Element "<name>" located within the Router Element "<daemon>". For example, a DRO configured with:
<tnw-gateway version="1.0">
<daemon>
<name>dro1</name>
...
The daemon stats will have the application ID "dro1".
In the case of UM library stats (context, transport, event queue), the application ID is constructed as follows:
Gateway_Portal_portalname_portalcontext
Where portalname is set to the Router Element "<name>" located within the Router Element "<endpoint>", and portalcontext is set to either "rcv_ctx" or "src_ctx". For example, a DRO configured with:
...
<portals>
<endpoint>
<name>TRD1</name>
...
The UM library stats will have the application IDs "Gateway_Portal_TRD1_rcv_ctx" and "Gateway_Portal_TRD1_src_ctx".
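The naming rules above can be sketched as a small classifier for a monitoring collector. This is a minimal illustration, not part of UM: the names `parse_dro_app_id` and `DroAppId` are hypothetical, and the function assumes the application ID is already known to come from a DRO (any ID that does not match the portal pattern is treated as a daemon name).

```python
from typing import NamedTuple, Optional

PREFIX = "Gateway_Portal_"
CONTEXT_SUFFIXES = ("rcv_ctx", "src_ctx")


class DroAppId(NamedTuple):
    kind: str                # "portal" (UM library stats) or "daemon" (daemon stats)
    name: str                # portal name or daemon name
    context: Optional[str]   # "rcv_ctx" / "src_ctx" for portals, None for daemons


def parse_dro_app_id(app_id: str) -> DroAppId:
    """Split a DRO application ID into its parts (hypothetical helper)."""
    if app_id.startswith(PREFIX):
        rest = app_id[len(PREFIX):]
        # Strip the known context suffix so portal names may contain underscores.
        for suffix in CONTEXT_SUFFIXES:
            tail = "_" + suffix
            if rest.endswith(tail) and len(rest) > len(tail):
                return DroAppId("portal", rest[:-len(tail)], suffix)
    # Anything else is assumed to be the <name> from the <daemon> element.
    return DroAppId("daemon", app_id, None)
```

For example, `parse_dro_app_id("Gateway_Portal_TRD1_rcv_ctx")` yields the portal name "TRD1" with context "rcv_ctx", while `parse_dro_app_id("dro1")` is classified as a daemon ID.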
DRO Monitoring: Logs
Ideally, log file monitoring would support the following:
- Archive all log messages for all DROs for at least a week, preferably a month.
- Provide rapid access to operations staff to view the latest log messages from a DRO.
- Periodic scans of the log file to detect errors and raise alerts to operations staff.
Regarding log file scanning, messages in the DRO's log file contain a severity indicator in square brackets. For example:
[2022-11-01 13:28:51.720796] [information] Gwd-9574-01: RecalcTrigger:LINK CAME UP:Version = 1:NodeId = 1
Informatica recommends alerting operations staff for messages of severity [WARNING], [ERROR], [CRITICAL], [ALERT], and [EMERGENCY].
It would also be useful to have a set of exceptions for specific messages you wish to ignore.
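As a rough illustration of severity-based scanning with an exception list, the sketch below parses the bracketed timestamp and severity shown in the example line. It is not an Informatica tool: `scan_log`, `ALERT_SEVERITIES`, and the "Example-000x" message IDs used in the usage note are hypothetical. Severities are compared case-insensitively because the sample line above shows the severity in lower case.

```python
import re
from typing import Iterable, Iterator, Tuple

# Severities that should alert operations staff, per the recommendation above.
ALERT_SEVERITIES = {"WARNING", "ERROR", "CRITICAL", "ALERT", "EMERGENCY"}

# "[timestamp] [severity] message text"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+\[(?P<sev>[^\]]+)\]\s+(?P<msg>.*)$")


def scan_log(lines: Iterable[str],
             ignore_patterns: Iterable[str] = ()) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, severity, message) for lines that should raise an alert."""
    ignores = [re.compile(p) for p in ignore_patterns]
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # not in the expected format; skipped in this sketch
        severity = m.group("sev").upper()
        if severity not in ALERT_SEVERITIES:
            continue
        message = m.group("msg")
        if any(p.search(message) for p in ignores):
            continue  # explicitly excepted message
        yield m.group("ts"), severity, message
```

A caller might run `scan_log(open(path), ignore_patterns=[r"^Example-0002:"])` periodically and forward each yielded tuple to an alerting system. A production scanner would also need to remember its file position between scans and handle log rolling.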
There are many third party real-time log file analysis tools available. A discussion of possible tools is beyond the scope of UM documentation.
DRO Monitoring: UM Library Stats
The DRO communicates with applications using Ultra Messaging protocols, and therefore makes use of the UM library. It is just as important to monitor the UM library statistics for the DRO as it is for applications.
There are two data formats for UM library stats:
- Protobufs - recommended.
- CSV - deprecated. Informatica recommends migrating to protobufs.
For example, here is an excerpt from a sample DRO configuration file that shows how automatic monitoring is enabled:
<?xml version="1.0" encoding="UTF-8" ?>
<tnw-gateway version="1.0">
<daemon>
<name>dro1</name>
...
<monitor interval="600">
<transport-module module="lbm"/>
<format-module module="pb"/>
</monitor>
<xml-config>um.xml</xml-config>
...
Here is an excerpt from a sample "um.xml":
<?xml version="1.0" encoding="UTF-8" ?>
<um-configuration version="1.0">
<templates>
...
<template name="mon_ctx">
<options type="context">
<option name="resolver_unicast_daemon" default-value="10.29.3.101:12801"/>
<option name="default_interface" default-value="10.29.3.0/24"/>
<option name="mim_incoming_address" default-value="0.0.0.0"/>
...
</options>
<options type="source">
<option name="transport" default-value="tcp"/>
</options>
</template>
...
</templates>
<applications>
...
<application name="tnwgd"> <!-- DRO -->
<contexts>
<context name="TRD1" template="um_common,res_trd1">
...
</context>
<context name="TRD2" template="um_common,res_trd2">
...
</context>
<context name="29west_statistics_context" template="mon_ctx">
<sources/>
</context>
</contexts>
</application>
....
Notes:
- The Router Element "<format-module>" value "pb" selects the protobuf format and is available for the DRO in UM version 6.14 and beyond. Selecting this format implicitly enables the inclusion of the DRO's daemon stats (see below).
- For a list of possible protobuf messages for the DRO, see the "dro_mon.proto" file at Example dro_mon.proto.
- The Router Element "<monitor>" enables automatic monitoring and defines the statistics sampling period. In the above example, 600 seconds (10 minutes) is chosen somewhat arbitrarily. Shorter times produce more data, but not much additional benefit. However, UM networks with many thousands of applications may need a longer interval (perhaps 30 or 60 minutes) to maintain a reasonable load on the network and monitoring data storage.
- When automatic monitoring is enabled, it creates a context named "29west_statistics_context". It is configured with the "mon_ctx" template, which sets options for the monitoring data TRD. (Alternatively, you can configure the monitoring context using monitor_transport_opts (context).) When possible, Informatica recommends directing monitoring data to an administrative network, separate from the application data network. This prevents monitoring data from interfering with application data latency or throughput. In this example, the monitoring context is configured to use an interface matching 10.29.3.0/24.
- In this example, the monitoring data TRD uses Unicast UDP Topic Resolution. The lbmrd daemon is running on host 10.29.3.101, port 12801.
- The monitoring data is sent out via UM using the TCP transport.
- These settings were chosen to conform to the recommendations in Automatic Monitoring.
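The trade-off between sampling interval and data volume noted above can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope aid only. It assumes, as a simplification, that each monitored context produces one set of samples per interval; the real message count also depends on how many transports and event queues each context has. The function name `sample_sets_per_day` is hypothetical.

```python
SECONDS_PER_DAY = 86400


def sample_sets_per_day(num_contexts: int, interval_sec: int) -> int:
    """Rough count of monitoring sample sets produced per day.

    Simplifying assumption: one sample set per context per interval.
    """
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    return num_contexts * (SECONDS_PER_DAY // interval_sec)
```

Under that assumption, 5000 contexts sampled every 600 seconds produce 720,000 sample sets per day, while a 3600-second interval reduces that to 120,000.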
For a full demonstration of monitoring, see: https://github.com/UltraMessaging/mcs_demo
DRO Monitoring: Daemon Stats
The daemon statistics for the DRO represent a superset of the information presented on the DRO Web Monitor.
There are two data formats for the DRO to send its daemon stats:
- Protobufs - recommended.
- Binary - deprecated. Informatica recommends migrating to protobufs. For information on the deprecated binary formatted daemon stats, see DRO Binary Daemon Statistics.
The recommended way to enable DRO daemon stats is by enabling UM library stats using the DRO's <monitor> element with <format-module module="pb">. For example, here's an excerpt from a DRO configuration file from https://github.com/UltraMessaging/mcs_demo file um.xml:
<monitor interval="600">
<transport-module module="lbm"/>
<format-module module="pb"/>
</monitor>
The protobufs format is accepted by the Monitoring Collector Service (MCS) and the "lbmmon" example applications: Example lbmmon.c and Example lbmmon.java.
For a list of possible protobuf messages for the DRO, see the "dro_mon.proto" file at Example dro_mon.proto.
For a full demonstration of monitoring, including DRO daemon stats, see: https://github.com/UltraMessaging/mcs_demo
See also DRO Monitoring: UM Library Stats.
DRO Web Monitor
- Note
- The DRO web monitor functionality is deprecated in favor of MCS. We do not plan to remove existing web monitor functionality, and will continue to support it in its current state. But we do not plan to enhance the web monitor in the future.
The built-in web monitor (configured in the tnwgd XML configuration file; see DRO Configuration Reference) provides valuable statistics about the DRO and its portals, which it separates into receive statistics and send statistics. The Web Monitor provides a page for each endpoint and peer portal.
- Warning
- The DRO's web monitor is not designed to be a highly-secure feature. Anybody with access to the network can access the web monitor pages.
Users are expected to prevent unauthorized access to the web monitor through normal firewalling methods. Users who are unable to limit access to a level consistent with their overall security needs should disable the DRO web monitor (using <web-monitor>). See Webmon Security for more information.
Main Page
This page displays general information about the DRO, and also provides the following links to more detailed statistical and configuration information.
- UM Router Configuration
- Displays the DRO XML configuration file used by this DRO.
- Portals
- Displays portal statistics and information, one portal per page. The Portals page allows you to link to any of the Peer or Endpoint portals configured for the DRO.
- Topology Info
- This links to a page that displays DRO network connectivity information from the perspective of this DRO.
- Path Info
- This lets you query and display a hop path that messages will take between any two TRDs.
On some platforms, the Main page may include a link (GNU malloc info) to a memory allocation display page that displays the following:
- arena
- Non-mmapped space allocated (bytes)
- ordblks
- Number of free chunks
- hblks
- Number of mmapped regions
- hblkhd
- Space allocated in mmapped regions (bytes)
- uordblks
- Total allocated space (bytes)
- fordblks
- Total free space (bytes)
Endpoint Portal Page
The Endpoint Portal Page displays Receive and Send statistics for the selected endpoint portal. Receive statistics pertain to messages entering the portal from its connected TRD. Send statistics pertain to messages sent out to the TRD.
Click on any of the links at the top of the page to review configuration option values for the portal's UM topic resolution domain. The two columns provide different units of measure for a given statistic type, where the first column is typically in fragments or messages (depending on the statistic type), and the second column is in bytes.
Endpoint Portal name
- Domain ID
- The ID for the Topic Resolution Domain (TRD) to which this portal is connected.
- Portal Cost
- The cost value assigned to this portal.
- Local Interest
- Totals (listed below) for topics and patterns in this portal's interest list that originated from receivers in the immediately adjacent TRD.
- Topics
- Of the local interest total, the number of topics.
- PCRE patterns
- Of the local interest total, the number of wildcard patterns, using PCRE pattern matching.
- REGEX patterns
- Of the local interest total, the number of wildcard patterns, using REGEX pattern matching.
- Remote Interest
- Totals (listed below) for topics and patterns in this portal's interest list that originated from receivers beyond and downstream from the immediately adjacent TRD.
- Topics
- Of the remote interest total, the number of topics.
- PCRE patterns
- Of the remote interest total, the number of wildcard patterns, using PCRE pattern matching.
- REGEX patterns
- Of the remote interest total, the number of wildcard patterns, using REGEX pattern matching.
- Proxy Receivers
- The number of proxy receivers active in this portal.
- Receiver Topics
- The number of topics in which the other portals in the DRO have detected current interest and summarily propagated to this portal.
- Receiver PCRE patterns
- The number of wildcard patterns, using PCRE pattern matching, in which the other portals in the DRO have detected current interest and summarily propagated to this portal.
- Receiver REGEX patterns
- The number of wildcard patterns, using REGEX pattern matching, in which the other portals in the DRO have detected current interest and summarily propagated to this portal.
- Proxy Sources
- The number of proxy sources active in this portal.
Endpoint Receive Statistics
- Transport topic fragments/bytes received
- The total transport-based topic-related traffic of messages containing user data received by this portal from a TRD. The first column counts the number of fragments (or whole messages for messages that were not fragmented).
- Transport topic request fragments/bytes received
- Topic messages received that are request messages, i.e., messages sent via lbm_send_request*() rather than lbm_src_send*().
- Transport topic control msgs/bytes received
- The total transport-based topic-related traffic received by this portal from a TRD. These are supervisory messages, which include TSNIs, SRIs, etc. The first column counts the number of messages.
- Immediate topic fragments/bytes received
- The total number of Multicast Immediate Messaging (MIM) messages or message fragments, and bytes (second column), that have a topic, received at this portal.
- Immediate topic request fragments/bytes received
- Of the MIM topic messages received, this is the amount of those that are requests.
- Immediate topicless fragments/bytes received
- The total number of MIM messages or message fragments, and bytes (second column), with null topics, received by this portal.
- Immediate topicless request fragments/bytes received
- Of the MIM topicless messages received, this is the amount of those that are requests.
- Unicast data messages/bytes received
- The total number of Unicast Immediate Messaging (UIM) messages (and bytes, second column) containing user data, received by this portal.
- Duplicate unicast data messages/bytes dropped
- UIM data messages discarded because they were duplicates of messages already received.
- Unicast data messages/bytes received with no stream info
- UIM data messages discarded because they were from an earlier, incompatible version of UM. This counter should stay at 0; otherwise, contact Informatica Support.
- Unicast data messages/bytes received with no route to destination
- UIM data messages that are on a wrong path, possibly due to a route recalculation. This counter should stay at 0, though it may increment a few messages at the time of a topology change.
- Unicast control messages/bytes received
- The total number of Unicast Immediate Messaging (UIM) supervisory (non-data) messages (and bytes, second column) received by this portal.
- Duplicate unicast control messages/bytes dropped
- Supervisory UIMs dropped because they were duplicates of messages already received.
- Unicast control messages/bytes received with no stream info
- Supervisory UIMs dropped because they were from an earlier, incompatible version of UM. This counter should stay at 0; otherwise, contact Informatica Support.
- Unicast control messages/bytes received with no route to destination
- Supervisory UIM messages that are on a wrong path, possibly due to a route recalculation. This counter should stay at 0, though it may increment a few messages at the time of a topology change.
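Several of the receive counters above are expected to stay at 0. One way to watch them is to compare two snapshots and report any growth. The sketch below assumes the counter values have already been collected into dictionaries keyed by the labels shown on this page; how they are collected (for example, from MCS data) is outside this sketch. The names `EXPECTED_ZERO` and `nonzero_growth` are hypothetical.

```python
from typing import Dict, Iterable

# Counters described above as "should stay at 0".
EXPECTED_ZERO = (
    "Unicast data messages/bytes received with no stream info",
    "Unicast data messages/bytes received with no route to destination",
    "Unicast control messages/bytes received with no stream info",
    "Unicast control messages/bytes received with no route to destination",
)


def nonzero_growth(previous: Dict[str, int],
                   current: Dict[str, int],
                   watched: Iterable[str] = EXPECTED_ZERO) -> Dict[str, int]:
    """Return {label: increase} for watched counters that grew between snapshots."""
    growth = {}
    for label in watched:
        delta = current.get(label, 0) - previous.get(label, 0)
        if delta > 0:
            growth[label] = delta
    return growth
```

Because the "no route to destination" counters may increment by a few messages at the time of a topology change, an alerting rule built on this would probably tolerate small, isolated increases for those two labels.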
Endpoint Send Statistics
- Transport topic fragments/bytes forwarded
- The total transport-based topic-related traffic forwarded to this portal from other portals in this DRO. This could include user messages, TSNIs, SRIs, etc. The first column counts the number of fragments (or whole messages for messages that were not fragmented).
- Transport topic fragments/bytes sent
- Of the transport topic traffic forwarded, this is the amount of traffic sent out to the TRD.
- Transport topic request fragments/bytes sent
- Of the messages sent, this is the amount of those that are requests.
- Duplicate transport topic fragments/bytes dropped
- Of the messages forwarded to this portal, this is the total of those that were discarded because they were duplicates of messages already received.
- Transport topic fragments/bytes dropped due to blocking
- Of the messages forwarded to this portal, this is the amount of those that were discarded because they were blocked from sending, and were unable to be buffered. Message rates on other portals probably exceeded the rate controller limit on this portal.
- Transport topic fragments/bytes dropped due to error
- Of the messages forwarded to this portal, this is the total of those that were discarded due to an application or network connection failure.
- Transport topic fragments/bytes dropped due to fragment size error
- Of the messages forwarded to this portal, this is the total of those that were discarded possibly because of a configuration error. If this count is not at or near 0, verify that maximum datagram size for all transports is the same throughout the network.
- Immediate topic fragments/bytes forwarded
- The total number of Multicast Immediate Messaging (MIM) messages or message fragments, and bytes (second column), forwarded to this portal from other portals in this DRO.
- Immediate topic fragments/bytes sent
- Of the MIM topic messages forwarded to this portal, this is the amount of traffic sent out to the TRD.
- Immediate topic request fragments sent
- Of the MIM topic messages sent, this is the amount of those that are requests.
- Immediate topic fragments/bytes dropped due to blocking
- Of the MIM topic messages forwarded to this portal, this is the amount of those that were discarded because they were blocked from sending, and were unable to be buffered. Message rates on other portals probably exceeded the rate controller limit on this portal.
- Immediate topic fragments/bytes dropped due to error
- Of the MIM topic messages forwarded to this portal, those that were discarded due to an application or network connection failure.
- Immediate topic fragments/bytes dropped due to fragment size error
- Of the MIM topic messages forwarded to this portal, those that were dropped possibly because of a configuration error. If this count is not at or near 0, verify that maximum datagram size for all transports is the same throughout the network.
- Immediate topicless fragments/bytes forwarded
- The total number of Multicast Immediate Messaging (MIM) messages or message fragments, and bytes (second column), with null topics, forwarded to this portal from other portals in this DRO.
- Immediate topicless fragments/bytes sent
- Of the MIM topicless messages forwarded to this portal, this is the amount of traffic sent out to the TRD.
- Immediate topicless request fragments sent
- Of the MIM topicless messages sent, this is the amount of those that are requests.
- Immediate topicless fragments/bytes dropped due to blocking
- Of the MIM topicless messages forwarded to this portal, this is the amount of those that were discarded because they were blocked from sending, and were unable to be buffered. Message rates on other portals probably exceeded the rate controller limit on this portal.
- Immediate topicless fragments/bytes dropped due to error
- Of the MIM topicless messages forwarded to this portal, those that were discarded due to an application or network connection failure.
- Immediate topicless fragments/bytes dropped due to fragment size error
- Of the MIM topicless messages forwarded to this portal, those that were dropped possibly because of a configuration error. If this count is not at or near 0, verify that maximum datagram size for all transports is the same throughout the network.
- Unicast messages/bytes forwarded
- The total number of Unicast Immediate Messaging (UIM) messages (and bytes, second column), both control and containing user data, forwarded to this portal.
- Unicast messages/bytes sent
- Of the UIM data messages forwarded to this portal, this is the amount of traffic sent out to the TRD.
- Unicast messages/bytes dropped due to error
- Of the UIM data messages forwarded to this portal, those that were discarded due to an application or network connection failure.
- Current/maximum data bytes enqueued (limit: n)
- For bytes in this portal's send buffer (due to a blocking send), the first column is a snapshot of the current amount, and the second column is a high-water mark. The displayed limit (n) is the configuration value for option <max-queue>.
Peer Portal Page
This page allows you to see Receive and Send statistics for the selected peer portal. Click on any of the links at the top of the page to review configuration option values for the portal's UM topic resolution domain.
The peer portal page displays the following statistics:
Peer Portal name
- Portal Cost
- The cost value assigned to this portal.
- Interest
- Totals (listed below) for topics and patterns in this portal's interest list that originated from receivers beyond and downstream from the immediately adjacent DRO.
- Topics
- Of the interest total, the number of topics.
- PCRE patterns
- Of the interest total, the number of wildcard patterns, using PCRE pattern matching.
- REGEX patterns
- Of the interest total, the number of wildcard patterns, using REGEX pattern matching.
- Proxy Receivers
- The number of proxy receivers active in this portal.
- Receiver topics
- All topics in which the other portals in the DRO have detected current interest and summarily propagated to this portal.
- Receiver PCRE patterns
- All wildcard patterns, using PCRE pattern matching, in which the other portals in the DRO have detected current interest and summarily propagated to this portal.
- Receiver REGEX patterns
- All wildcard patterns, using REGEX pattern matching, in which the other portals in the DRO have detected current interest and summarily propagated to this portal.
- Proxy Sources
- The number of proxy sources active in this portal.
Peer Receive Statistics
- Data messages/bytes received
- The total of messages containing data received at this portal. The first column counts the number of fragments (or whole messages for messages that were not fragmented).
- Transport topic fragment data messages/bytes received
- The total of user-data messages received on any topic resolved through this portal. The first column counts the number of fragments (or whole messages for messages that were not fragmented).
- Transport topic fragment data messages/bytes received with unknown source
- Topic messages received whose source this DRO has not seen before.
- Transport topic request fragment data messages/bytes received
- These are topic messages received that are request messages, i.e., messages sent via lbm_send_request*() rather than lbm_src_send*().
- Transport topic request fragment data messages/bytes received with unknown source
- Of the request messages received, the topic messages received whose source this DRO has not seen before.
- Immediate topic fragments/bytes received
- The total number of Multicast Immediate Messaging (MIM) messages or message fragments, and bytes (second column), that have a topic, received by all proxy receivers at this portal.
- Immediate topic request fragments/bytes received
- Of the MIM topic messages received, this is the total of those that are requests.
- Immediate topicless fragments/bytes received
- The total number of MIM messages or message fragments, and bytes (second column), with null topics, received by all proxy receivers at this portal.
- Immediate topicless request fragments/bytes received
- Of the MIM topicless messages received, this is the total of those that are requests.
- Unicast data messages/bytes received
- The total number of Unicast Immediate Messaging (UIM) messages (and bytes, second column) containing user data, received by this portal.
- Unicast data messages/bytes received with no stream information
- UIM data messages discarded because they were from an earlier, incompatible version of UM. This counter should stay at 0; otherwise, contact Informatica Support.
- Unicast data messages/bytes received with no route to destination
- UIM data messages that are on a wrong path, possibly due to a route recalculation. This counter should stay at 0, though it may increment a few messages at the time of a topology change.
- Control messages/bytes received
- The total of supervisory messages (containing no data) received at this portal.
- Transport topic control messages/bytes received
- Of the control messages received, those that are transport/topic based (such as TSNIs, SRIs, etc.).
- Transport topic control messages/bytes received with unknown source
- Of the transport/topic control messages received, those whose source this DRO has not seen before.
- Unicast control messages/bytes received
- The total number of Unicast Immediate Messaging (UIM) supervisory (non-data) messages (and bytes, second column) received by this portal.
- Retransmission requests/bytes received
- Supervisory UIMs that are requests for retransmission of lost (or Late Join) messages.
- Control messages/bytes received with no stream info
- Supervisory UIMs discarded because they were from an earlier, incompatible version of UM. This counter should stay at 0; otherwise, contact Informatica Support.
- Control messages/bytes received with no route to destination
- Supervisory UIM messages that are on a wrong path, possibly due to a route recalculation.
- Gateway control messages/bytes received
- The total of DRO-only, peer-to-peer supervisory messages received at this portal.
- Unhandled control messages/bytes received
- Supervisory UIMs discarded because, though they are well-formed, they have no valid action request. This counter should stay at 0; otherwise, contact Informatica Support.
Peer Send Statistics
- Transport topic fragments/bytes forwarded
- The total transport-based topic-related traffic forwarded to this portal from other portals in this DRO. This could include user messages, TSNIs, SRIs, etc. The first column counts the number of fragments (or whole messages for messages that were not fragmented).
- Transport topic fragments/bytes sent
- Of transport topic messages forwarded to this portal, the amount of traffic sent to the adjacent DRO.
- Transport topic request fragments/bytes sent
- Of transport topic messages sent, those that were request messages.
- Transport topic fragments/bytes dropped (duplicate)
- Of transport topic messages forwarded to this portal, messages discarded because they were duplicates of messages already received.
- Transport topic fragments/bytes dropped (blocking)
- Of transport topic messages forwarded to this portal, this is the amount of those that were discarded because they were blocked from sending, probably due to TCP flow control, and were unable to be buffered. The DRO's XML configuration file may need to be adjusted.
- Transport topic fragments/bytes dropped (not operational)
- Of transport topic messages forwarded to this portal, messages discarded because the peer link is down.
- Transport topic fragments/bytes dropped (queue failure)
- Of transport topic messages forwarded to this portal, messages discarded due to a memory allocation failure.
- Unicast messages/bytes forwarded
- The total number of Unicast Immediate Messaging (UIM) messages (and bytes, second column) forwarded to this portal from other portals in this DRO. These messages can be either control (supervisory) messages or messages containing user data.
- Unicast messages/bytes sent
- Of the UIMs forwarded to this portal, the amount of traffic sent to the adjacent DRO.
- Unicast messages/bytes dropped (blocking)
- Of the UIMs forwarded to this portal, this is the amount of those that were discarded because they were blocked from sending, probably due to TCP flow control, and were unable to be buffered. The DRO's XML configuration file may need to be adjusted.
- Unicast messages/bytes dropped (not operational)
- Of the UIMs forwarded to this portal, messages discarded because the peer link is down.
- Unicast messages/bytes dropped (queue failure)
- Of the UIMs forwarded to this portal, messages discarded due to a memory allocation failure.
- Gateway control messages/bytes sent
- The total number of DRO supervisory messages (and bytes, second column), generated at this portal.
- Gateway control messages/bytes sent
- Of the DRO supervisory messages generated, the number sent to the adjacent DRO.
- Gateway control messages/bytes dropped (blocking)
- The amount of DRO supervisory messages that were discarded because they were blocked from sending, probably due to TCP flow control, and were unable to be buffered. The DRO's XML configuration file may need to be adjusted.
- Gateway control messages/bytes dropped (not operational)
- The amount of DRO supervisory messages that were discarded because the peer link was down.
- Gateway control messages/bytes dropped (queue failure)
- The amount of DRO supervisory messages that were discarded due to a memory allocation failure.
- Batches
- The number of times messages were batched.
- Minimum messages/bytes per batch
- The lowest recorded number of messages in a batch, and the number of bytes in that batch.
- Average messages/bytes per batch
- The average number of messages in a batch, and the number of bytes in that average batch.
- Maximum messages/bytes per batch
- The highest recorded number of messages in a batch, and the number of bytes in that batch.
- Current/maximum data bytes enqueued
- For bytes in this portal's send buffer (due to a blocking send), the first column is a snapshot of the current amount, and the second column is a high-water mark. The displayed limit is the configuration value for option <max-queue>.
- Keepalive/RTT samples
- The number of keepalive messages that have been sent to the other DRO's portal and responded to.
- Minimum RTT (microseconds)
- Of the keepalives sent and responded to, the lowest recorded round-trip time.
- Mean RTT (microseconds)
- Of the keepalives sent and responded to, the mean recorded round-trip time.
- Maximum RTT (microseconds)
- Of the keepalives sent and responded to, the highest recorded round-trip time.
- Last keepalive responded to
- The send timestamp (date and time) of the last sent keepalive message that was responded to.
Topology Info Page
This page allows you to see DRO network connectivity information from the perspective of this DRO. The Other DROs section (below) provides information in the same format as is used for the local DRO.
- Local UM Router Name
- The DRO name as assigned via configuration.
- Local UM Router ID
- A unique value that the DRO assigns to itself automatically.
- Self Version
- A configuration version for this DRO, as seen collectively by the DRO network.
- Topology Signature
- An identifier for the "map" of this DRO network's routes. This value should be the same for all DROs.
- Last recalc duration
- The amount of time in seconds that it took this DRO to perform its most recent route recalculation.
- Graph Version
- The number of times this DRO has updated its view of the topology.
- UM Router Count
- The number of DROs in this DRO network.
- Topic Resolution Domain Count
- The number of TRDs in this DRO network.
Portal (endpoint or peer)
This display is repeated for each portal of this DRO.
- Portal Name
- The portal's name as assigned via configuration.
- Adjacent Domain/UM Router ID
- For an endpoint portal, this is the configured <domain-id> for the connected TRD. For a peer portal, this is an automatically assigned unique identifier for the connected DRO.
- Cost
- This portal's configured cost.
- Last interest recalc duration
- The amount of time in seconds that it took this DRO to perform a recalculation that resulted in an update to the interest status for this portal.
- Last proxy receiver recalc duration
- The amount of time in seconds that it took this DRO to perform a recalculation that resulted in an update to the status of proxy receivers (create, maintain, or destroy) for this portal.
Other DROs
This display is repeated for each other DRO in this DRO's network.
- UM Router Name
- The DRO name as assigned via configuration.
- UM Router ID
- A unique value that the DRO assigns to itself automatically.
- Version
- A configuration version for the DRO, as seen collectively by the DRO network.
- Topology Signature
- An identifier for the "map" of this DRO network's routes. This value should be the same for all DROs.
- Last Activity n seconds ago
- How long since the last time this local DRO received a route info packet from the designated "other" DRO.
- Adjacent Domain ID
- The configured ID of one of this "other" DRO's connected TRD, plus the cost assigned to the associate endpoint portal. If there are more than one endpoint portals in the DRO, this line is repeated for each.
- Adjacent UM Router ID
- The automatically assigned ID of one of this "other" DRO's connected DROs, plus the cost assigned to the associated peer portal. If that DRO has more than one peer portal, this line is repeated for each.
Path Info <-
The Path Info page lets you query and display a hop path that messages will take between any two TRDs that you enter into the Domain ID 1 and Domain ID 2 text boxes. Fill in the boxes and click the Calculate Shortest Path button, and you see the following fields:
- Hop Count
- The number of hops from one node to the next along the displayed route, where a node can be either a DRO or a TRD.
- Aggregate Cost
- A sum of the cost values of all portals along the displayed path.
- Path
- A display of the DRO and TRD hops listed in route order from the starting TRD to the ending TRD.
DRO Log Messages <-
The DRO daemon generates log messages that are used to monitor its health and operation. You can direct these messages to "console" (standard output), "syslog", or a specified log "file" via the <log> configuration element. Normally "console" is only used during testing, as a persistent log file is preferred for production use. The DRO does not overwrite log files on startup, but instead appends to them.
DRO Rolling Logs <-
To prevent unbounded disk file growth, the DRO supports rolling log files. When the log file rolls, the file is renamed according to the model:
CONFIGUREDNAME_PID.DATE.SEQNUM
where:
-
CONFIGUREDNAME - Root name of log file, as configured by user.
-
PID - Process ID of the DRO daemon process.
-
DATE - Date that the log file was rolled, in YYYY-MM-DD format.
-
SEQNUM - Sequence number, starting at 1 when the process starts, and incrementing each time the log file rolls.
For example: umrouterlog_9867.2017-08-20.2
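As an illustration of the naming model above, the following Python sketch splits a rolled log file name into its four parts. The function and type names (parse_rolled_log_name, RolledLogName) are hypothetical and not part of UM; the pattern assumes that PID and SEQNUM are plain decimal digits and that the configured root name may itself contain underscores or dots.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class RolledLogName(NamedTuple):
    """Components of a rolled log name (hypothetical helper type)."""
    configured_name: str
    pid: int
    rolled_on: date
    seqnum: int


# CONFIGUREDNAME_PID.DATE.SEQNUM, with DATE in YYYY-MM-DD format.
# Assumption: PID and SEQNUM are decimal digits only.
_ROLLED_RE = re.compile(
    r"^(?P<name>.+)_(?P<pid>\d+)\.(?P<date>\d{4}-\d{2}-\d{2})\.(?P<seq>\d+)$"
)


def parse_rolled_log_name(filename: str) -> Optional[RolledLogName]:
    """Return the parsed components, or None if the name does not fit the model."""
    match = _ROLLED_RE.match(filename)
    if match is None:
        return None
    try:
        rolled_on = date.fromisoformat(match.group("date"))
    except ValueError:
        # Digits in the right shape but not a real calendar date.
        return None
    return RolledLogName(
        configured_name=match.group("name"),
        pid=int(match.group("pid")),
        rolled_on=rolled_on,
        seqnum=int(match.group("seq")),
    )
```

A helper like this can be used to group rolled files by process ID or to sort them by date and sequence number before archiving.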
The user can configure when the log file is eligible to roll over by either or both of two criteria: size and frequency. The size criterion is in millions of bytes. The frequency criterion can be daily or hourly. Once one or both criteria are met, the next message written to the log will trigger a roll operation. These criteria are supplied as attributes to the <log> configuration element.
If both criteria are supplied, then the first one to be reached will trigger a roll. For example, consider the setting:
<log type="file" size="23" frequency="daily">dro.log</log>
Let's say that the log file starts empty at midnight and grows at 1 million bytes per hour. At 11:00 pm, the log file will reach 23 million bytes, and will roll. Then, at 12:00 midnight, the log file will roll again, even though it is only 1 million bytes in size.
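The arithmetic of this example can be checked with a small simulation. This is a simplified model rather than the DRO's actual implementation: it assumes the file starts empty at midnight, that one message is written at the end of each hour, that the size criterion is met when the file reaches the configured limit, and that the "daily" criterion is met when the calendar day changes. The function name simulate_rolls is hypothetical.

```python
def simulate_rolls(size_limit_mb, growth_mb_per_hour, hours):
    """Return a list of (hour, size_at_roll_mb, reason) tuples.

    Hour 0 is midnight at the start of the first day. Sizes are in
    millions of bytes, matching the unit of the size attribute.
    """
    rolls = []
    size = 0
    current_day = 0
    for hour in range(1, hours + 1):
        size += growth_mb_per_hour  # the message written during this hour
        day = hour // 24
        if day != current_day:
            reason = "frequency"    # daily criterion: the date has changed
        elif size >= size_limit_mb:
            reason = "size"         # size criterion: limit reached
        else:
            continue
        rolls.append((hour, size, reason))
        size = 0                    # a fresh log file starts after the roll
        current_day = day
    return rolls


if __name__ == "__main__":
    for hour, size, reason in simulate_rolls(23, 1, 24):
        print(f"hour {hour}: rolled at {size} million bytes ({reason})")
```

With a limit of 23, growth of 1 per hour, and a 24-hour window, the model produces a size-triggered roll at hour 23 (11:00 pm) and a frequency-triggered roll at hour 24 (midnight), matching the description above.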
- Note
- The rolling logs cannot be configured to automatically overwrite old logs. Thus, the amount of disk space consumed by log files will grow without bound. Users must implement their own process for archiving or deleting older log files.
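One possible cleanup process is sketched below in Python. It is only an example under stated assumptions: rolled files are recognized by the CONFIGUREDNAME_PID.DATE.SEQNUM naming model, age is judged by file modification time, and the retention period is chosen by the user. The function names (find_expired_logs, delete_expired_logs) are hypothetical. The active log file does not match the rolled-name pattern and is therefore never selected.

```python
import re
import time
from pathlib import Path

# Matches only rolled files: CONFIGUREDNAME_PID.YYYY-MM-DD.SEQNUM
ROLLED_RE = re.compile(r"^.+_\d+\.\d{4}-\d{2}-\d{2}\.\d+$")

SECONDS_PER_DAY = 86400


def find_expired_logs(log_dir, max_age_days, now=None):
    """Return rolled log files in log_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = []
    for path in Path(log_dir).iterdir():
        if (
            path.is_file()
            and ROLLED_RE.match(path.name)
            and path.stat().st_mtime < cutoff
        ):
            expired.append(path)
    return sorted(expired)


def delete_expired_logs(log_dir, max_age_days, dry_run=True):
    """Delete expired rolled logs; with dry_run=True, only report them."""
    expired = find_expired_logs(log_dir, max_age_days)
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired
```

Running with dry_run=True first lets operations staff review the list of files before anything is removed. A site that archives rather than deletes could replace the unlink call with a move or compression step.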
Important DRO Log Messages <-
Connection Failure Messages
peer portal [name] failed to connect to peer at [IP:port] via [interface] [err]: reason
peer portal [name] failed to accept connection (accept) [err]: reason
Lost Connection Messages
peer portal [name] lost connection to peer at [IP:port] via [interface]
peer portal [name] connection destroyed due to socket failure
peer portal [name] detected dropped inbound connection (read) [err]: reason
peer portal [name] detected dropped inbound connection (zero-len read)
Endpoint Messages
If a UMP store is adjacent to the DRO, and the DRO has been restarted, you typically see messages of the form:
endpoint portal [name] has no forwarding entry for destination ctxinst [string], dropping msg (lbmc cntl ume)
These messages are normal, and cease when the DRO has established the forwarding information for the given context.
Peer Messages
Acceptor: peer portal [name] received connection from [IP:port]
Initiator: peer portal [name] connected to [IP:port]
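A log scanner can recognize the messages listed above by their fixed phrases. The Python sketch below is a minimal illustration, not a complete monitoring tool. The category labels and function names are hypothetical, and the patterns assume that the fixed wording appears in the log line exactly as shown above, with the bracketed fields replaced by actual values.

```python
import re

# (category, pattern) pairs, checked in order. Categories are illustrative labels.
PATTERNS = [
    ("connection_failure", re.compile(
        r"peer portal .+ failed to (connect to peer|accept connection)")),
    ("lost_connection", re.compile(
        r"peer portal .+ (lost connection to peer"
        r"|connection destroyed due to socket failure"
        r"|detected dropped inbound connection)")),
    # Normal after a DRO restart next to a UMP store; usually informational.
    ("no_forwarding_entry", re.compile(
        r"endpoint portal .+ has no forwarding entry for destination ctxinst")),
    ("peer_connected", re.compile(
        r"peer portal .+ (received connection from|connected to) ")),
]


def classify(line):
    """Return the category label for a log line, or None if it is not recognized."""
    for category, pattern in PATTERNS:
        if pattern.search(line):
            return category
    return None


def count_categories(lines):
    """Count how many lines fall into each recognized category."""
    counts = {}
    for line in lines:
        category = classify(line)
        if category is not None:
            counts[category] = counts.get(category, 0) + 1
    return counts
```

An alerting script might raise an alert on any "connection_failure" or "lost_connection" line, and treat "no_forwarding_entry" as a concern only if it persists well after a DRO restart.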
DRO Transport Stats <-
Using the <monitor> element in a DRO's XML configuration file and the UMS Monitoring feature, you can monitor the transport activity between the DRO and its Topic Resolution Domain. The configuration also provides Context and Event Queue statistics. The statistics output identifies individual portals by name.