13. Ultra Messaging® Web Monitor

The built-in web monitor (configured in the umestored XML configuration file) is a rich source of information about the health of UM stores and queues. This section is a page-by-page guide to reading and interpreting the web monitor's output, using a small example: two sources and one receiver on a single store. This section discusses the following topics.

13.1. Ultra Messaging Web Monitor Index Page

The web monitor's index page tells what build of UM is running.

Figure 24. UM Web Monitor Index Page

Click on the link, Stores, to see the Persistent Stores Page.

13.2. Persistent Stores Page

Figure 25. Persistent Stores Page

This page shows all the stores configured under the umestored process. If you had 5 stores configured, they would be numbered Store 0 through Store 4. Our example has only one store configured, ume-test-store. Click on the link, ume-test-store, to see the Store Page.

13.3. Store Page

Figure 26. Store Page

This page shows the following information about the store.

Item Description
Interface This store is listening on all interfaces (0.0.0.0) on port 41394
Cache Dir Pathname for disk store message cache directory. This would be configured as a store attribute in the store's XML configuration file. <option type="store" name="disk-cache-directory" value="cache/" />
State Dir Pathname for disk store state directory. This would be configured as a store attribute in the store's XML configuration file. <option type="store" name="disk-state-directory" value="state/" />
Configured Retransmission Request Processing Rate Current value for the store's retransmission-request-processing-rate setting.
Retransmission Request Received Rate Number of retransmission requests received per second.
Retransmission Request Service Rate Number of retransmission requests serviced per second.
Retransmission Request Drop Rate Number of retransmission requests dropped per second. Requests are dropped if the rate of retransmission requests exceeds the configured retransmission-request-processing-rate.
Retransmission Request Total Dropped The number of retransmission requests dropped since the store was started.
Patterns Specifies the wildcard pattern used to select topics for which a store will provide persistence services. This would be configured as a topic attribute in the store's XML configuration file. <topic pattern="test*" type="PCRE">
Topics Displays the topic name and Registration IDs of the two sources publishing on the topic, 2369562861 and 3131255877.

You can review information about a source publishing on the topic by clicking its Registration ID. The Source Page appears.
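Because the Patterns item is a regular expression rather than a shell-style glob, `test*` means "tes" followed by zero or more "t" characters. The following minimal sketch illustrates this with Python's re module, which behaves the same as PCRE for a pattern this simple. Whether the store anchors the pattern to the whole topic name is not stated here, so both unanchored and whole-name matching are shown. The topic names are hypothetical.

```python
import re

# Pattern copied from the <topic pattern="test*" type="PCRE"> example.
PATTERN = re.compile(r"test*")

# Hypothetical topic names, chosen only to illustrate the semantics.
topics = ["test", "testing", "tes", "latest", "orders"]

for topic in topics:
    unanchored = PATTERN.search(topic) is not None   # match anywhere in the name
    whole = PATTERN.fullmatch(topic) is not None     # match the entire name
    print(f"{topic:8} unanchored={unanchored} whole={whole}")
```

If the intent is "every topic that starts with test", a pattern such as `^test.*` expresses that more precisely under either interpretation.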

13.4. Source Page

Figure 27. UM Web Monitor Source Page

The following table explains the information found in the title of the Source Page.

Source Page Title Description
2504558780 The source's registration ID.
10.29.3.42.14392 The source's IP address and the port set by its UM configuration option, request_tcp_port.
3958260924 The source's transport session index.
1161732811 The source's topic index within the transport session, 3958260924.

The transport session and topic indices are useful for debugging purposes when combined with a Wireshark capture, but are otherwise not relevant here. The following table provides descriptions of the items in the source page.

Source Page Item Description
Topic test is the source's topic string.
Last Activity 09:19:39.501350 is the timestamp when the store last heard from the source, including keepalives sent by UM.
Repository disk is the type of repository.
Receiver Paced Persistence Setting for Receiver-paced Persistence (RPP), a repository option that both the repository and the source must enable. A value of 0 means RPP is not enabled and the repository is using the default Source-paced Persistence. A value of 1 means RPP is enabled.
Message Map: 104 104 is the total number of message fragments the store has for this source, both on disk and in memory. These are UM-level fragments, not IP-level fragments. UM messages are fragmented into roughly 8 kilobyte chunks for UDP-based protocols (LBT-RM and LBT-RU) and into roughly 64 kilobyte chunks for LBT-TCP. Most application messages are well under the fragment boundaries, so the value after "Message Map" can serve as a rough estimate of the number of messages in the store from this particular source. At minimum, it is an upper bound on that number.
Window: [0, 0, 67] The first 0 is the trailing sequence number, which is the oldest sequence number in the store for this source. In most cases, this starts at 0 and stays there for a while, but especially with UME 2.0 where stores can come and go, that may not be the case. It would also move if you, for example, hit a disk file size limit and had to throw out some old messages.
The second 0 is the trailing sequence number for messages in memory, so it is the oldest sequence number still in memory. Typically, you might have more sequence numbers on disk than you do in memory, or possibly the same number.
The third number, 67, is the leading sequence number, which is the highest sequence number in the store.
NOTE: For a memory store, the first and second values would always be the same (the oldest sequence number in memory is the oldest in the store), so only two values are displayed: the trailing sequence number and the leading sequence number. These are sequence numbers of message fragments; there is usually just one fragment per message, but there could be more than one.
Memory: 7176 / 52428800 / 104857600 The first number, 7176, is the number of bytes of messages (including headers and a bit of store overhead) in memory.
The second number, 52428800, is the repository-size-threshold topic option found in the store's XML configuration file.
The third number, 104857600, is the repository-size-limit setting.
You would expect the number of bytes in memory to be under the threshold most of the time, but it could spike above it before going back down if the store is really busy momentarily. It should never go above the limit.
Age Threshold: 0 0 is the repository-age-threshold setting.
Sync: [c2f, c2f, c2f] Pertains to disk or reduced-fd repositories only. Sync format is: sync_complete_sqn, sync_sqn, contig_sqn
sync_complete_sqn, c2f Most recent sequence number that the Operating System has confirmed persisting to disk.
sync_sqn, c2f Most recent sequence number for which the store has initiated persisting to disk, but the Operating System has not confirmed completion of persistence.
contig_sqn, c2f Most recent sequence number that, together with trail_sqn, bounds a range of sequence numbers with no gaps. For example, if trail_sqn = 0 and the store has persisted all eleven messages with sequence numbers 0 through 10, contig_sqn would equal 10. contig_sqn would also be 10 if a receiver declared message sequence number 7 unrecoverably lost. contig_sqn would be 6 if message sequence number 7 was not persisted but also not declared lost.
In progress: 0 / 0 Pertains to disk or reduced-fd repositories only. In progress format is: num_ios_pending / num_read_ios_pending
num_ios_pending, 0 Number of disk writes the store has submitted to the Operating System. A disk write refers to the store persisting a message to disk.
num_read_ios_pending, 0 Number of disk reads that the store has submitted to the Operating System. A disk read results, for example, from an application retransmission request.
Offsets: 0 / 190320 / 4294967296 Pertains to disk or reduced-fd repositories only. Offsets format is: start_offset / offset / max_offset
start_offset, 0 The relative location on disk of the first message, trail_sqn. start_offset is 0 for a reduced-fd repository.
offset, 190320 The relative location where the message with sequence number contig_sqn plus one will be written. For a reduced-fd repository, offset represents the size of the repository on disk.
max_offset, 4294967296 The maximum size of the cache file. max_offset is the maximum repository size on disk for a reduced-fd repository.
Active ULBs: 0 high 0 ULB stands for Unrecoverable Loss Burst. A little extra work is required to keep cache files consistent when the store gets an unrecoverable loss burst, because unrecoverable loss bursts are delivered all at once for lots of messages, rather than one at a time like normal unrecoverable loss messages.
Active ULBs is the number of unrecoverable loss burst events the store is dealing with at the moment. It goes to zero after the ULB has been resolved.
The high number (0) is the highest sequence number reported among any unrecoverable loss burst event, and is not reset after the ULB is handled; it increments throughout the process life of the store.
WARNING: If you see any number other than 0 here, the store is losing large numbers of messages, and they are likely not being persisted.
Loss: 0 ULBs 0 These values are counters for number of unrecoverable loss messages (Loss) and for number of unrecoverable burst loss messages (ULB). These start at 0 when the store starts up and aren't reset until the store exits. They don't include any loss events that were persisted to disk from a previous run, only new loss events since the store started. There are cases with UME 2.0 where one individual store could legitimately report some unrecoverable loss, or maybe even unrecoverable loss bursts.
WARNING: If you see any number other than 0 for either of these counters, you should investigate.
Drops: 0 / 0 If the store is nearing the repository-size-limit and gets another message, the store will intentionally drop a message. A drop requires a bit of work on the store's part.
The first 0 is the number of active drops, which are drops that are currently being worked on.
The second 0 is the total number of drops that have happened for this store since it was started. Some people want a low repository-size-limit and therefore lots of intentional drops can occur. Some don't want to drop any message the whole day - so the interpretation of the values is up to you.
LBM Stats These represent transport-level statistics for the underlying receivers in the store for the source. The example shown is for a TCP source, so not too many stats are available (stats for a TCP source are less important from a monitoring perspective).
Statistics for an LBT-RM or LBT-RU source, however, show number of NAKs sent, which is important. Ideally, the number of NAKs sent should be 0. A few NAKs from a store throughout the day is not an emergency. It can be, however, an early warning sign of more severe problems, and should be taken seriously.
If you see a non-zero number of NAKs here, take a look at the overall network load the store's machine is attempting to handle, particularly in very busy periods and spikes; it may be too much.
Receivers Registration IDs for the receivers listening on the source's topic. You can review information about a receiver by clicking its Registration ID. The Receiver Page appears.
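As a worked illustration of the Window, Memory, and Sync descriptions in the table above, the following sketch interprets the example values. It assumes that Window sequence numbers are displayed in hexadecimal, like the Sync values. Under that assumption the window [0, 0, 67] spans 0x67 + 1 = 104 sequence numbers, which agrees with the Message Map value of 104. The function names are hypothetical and are not part of any UM API.

```python
def window_fragment_count(trail_hex: str, lead_hex: str) -> int:
    """Number of sequence numbers from trail through lead, inclusive."""
    return int(lead_hex, 16) - int(trail_hex, 16) + 1


def memory_status(used: int, threshold: int, limit: int) -> str:
    """Classify in-memory bytes against the size threshold and the size limit."""
    if used > limit:
        return "over limit (should never happen)"
    if used > threshold:
        return "above threshold (expected only during brief spikes)"
    return "under threshold"


def contig_sqn(trail: int, lead: int, persisted: set, lost: set) -> int:
    """Highest sequence number reachable from trail with no gaps.

    A sequence number declared unrecoverably lost does not count as a gap.
    By this sketch's own convention, returns trail - 1 if even the trailing
    sequence number is missing.
    """
    sqn = trail
    while sqn <= lead and (sqn in persisted or sqn in lost):
        sqn += 1
    return sqn - 1


# Window: [0, 0, 67] compared with Message Map: 104
print(window_fragment_count("0", "67"))              # 104

# Memory: 7176 / 52428800 / 104857600
print(memory_status(7176, 52428800, 104857600))      # under threshold

# contig_sqn examples from the table, with trail_sqn = 0
all_eleven = set(range(11))
print(contig_sqn(0, 10, all_eleven, set()))          # 10
print(contig_sqn(0, 10, all_eleven - {7}, {7}))      # 10 (7 declared lost)
print(contig_sqn(0, 10, all_eleven - {7}, set()))    # 6  (7 missing, not lost)
```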

13.5. Receiver Page

Figure 28. UM Web Monitor Receiver Page

The following table explains the information found in the title of the Receiver Page.

Receiver Page Title Description
2504558781 The receiver's registration ID.
10.29.3.42.14393 The receiver's IP address and the port set by its UM configuration option, request_tcp_port.
1510613393 The receiver's transport session index.
1161732811 The source's topic index within the transport session, 1510613393.

The receiver page shows the following information.

Receiver Page Item Description
Topic The topic that the receiver is listening on.
Last Activity 09:09:35.981110 is the timestamp of when the store last heard from the receiver, including keepalives sent by UM.
Source RegID Registration ID of the source publishing on the topic. Click on the Registration ID link to display the Source Page.
Source Session ID The Session ID of the Source sending messages on the topic.
ACK c93 is the last message sequence number the receiver acknowledged.
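One use for the ACK value is to estimate how far a receiver trails its source. The sketch below assumes that sequence numbers on both pages are displayed in hexadecimal, as the letters in c93 suggest, and compares the ACK with a source's leading sequence number. The leading value used here is hypothetical, and the function name is not part of any UM API.

```python
def ack_lag(source_lead_hex: str, receiver_ack_hex: str) -> int:
    """Message fragments in the store beyond the receiver's last acknowledgment."""
    return int(source_lead_hex, 16) - int(receiver_ack_hex, 16)


ack = "c93"    # ACK value from the Receiver Page example
lead = "c9a"   # hypothetical leading sequence number read from the Source Page

print(int(ack, 16))        # 3219
print(ack_lag(lead, ack))  # 7
```

A lag that keeps growing between page refreshes would suggest a receiver that is not keeping up. This sketch ignores sequence number wraparound.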

13.6. Queue Page

Figure 29. Queue Page

This page shows the following information about Queue 0, which is named Queue 1. The queue name is an attribute of the Queue Element in the Queue's XML configuration file. <queue name="Queue 1" port="4567" group-index="0">

Item Description
Interface This queue is listening on all interfaces (0.0.0.0) on port 4567
Queue ID Identification given to this queue by UMQ.
Sending Threads Number of threads configured for this queue to send control and data messages. This would be configured as a Queue Element option in the queue's XML configuration file. <option type="queue" name="sending-threads" value="1" />
Sending Threads(s) Queue Size The number of messages waiting to be sent to receivers in the pool of sending threads. This could be data and control messages for Parallel Queue Dissemination (PQD), data messages only for Serial Queue Dissemination (SQD) or control messages only for Source Dissemination (SD).
Registered Contexts Number of application contexts registered with this queue.
Retransmit Requests Message Requests: Number of requests for data message retransmission.
Queue RCR Requests: Number of requests for the retransmission of control information.
Dropped Requests: Number of retransmission requests dropped by Queue 1.
Patterns Specifies the wildcard patterns used to select topics for which a queue will accept data messages. This would be configured as a topic attribute in the queue's XML configuration file. <topic pattern="." type="PCRE">
Topics Displays the RCR Index (697157a5) and topic name (a.b) of the topic(s) configured for this queue.

You can review information about the Queue's topics and application sets by clicking on the topic's RCR Index (697157a5). The Queue Topic Page appears.

13.7. Queue Topic Page

Figure 30. Queue Topic Page

This page shows the following information about the queue topic, a.b. This topic's RCR Index is 697157a5.

Item Description
Queue The Queue Name for which this topic is configured. It is also a link back to the Queue Page for this queue.
Application Sets Queue 1 has 2 Application Sets configured.
Consumed Messages Total number of messages consumed by all Application Sets.
Reassignments Number of messages that have been reassigned.
Topic RCR Requests Number of requests for the retransmission of control information regarding this topic.
Saved RCRs Number of Receiver Control Records saved due to retransmissions. You configure how long the queue saves RCRs as a Queue Element attribute in the queue's XML configuration file. <option type="queue" name="rcr-save-timeout" value="30000"/>.
Application Set Set 2 is the name of this Application Set.
Enqueued Messages: Number of messages currently held in Queue 1 for Set 2.
Currently Assigned: Number of messages currently assigned to a receiver in this Application Set.
Currently Reassigning: Number of messages waiting to be reassigned to receivers.
Consumed Messages: Number of messages consumed by receivers in this Application Set.
Reassignments: Number of messages that have been reassigned to another receiver in this Application Set.
Discarded Messages: Number of messages assigned to receivers in this Application Set that have been discarded.
Receivers Number of receivers (1) configured for this Application Set. Specific information for each receiver appears in the table below this item.
ID: The Assignment ID given to this receiver by the queue.
Address: The address and port of the receiver.
Portion: This receiver's portion size that you configure as a Receiver Type attribute in the queue's XML configuration file. <option type="queue" name="portion" value="1"/>.
Priority: This receiver's priority that you configure as a Receiver Type attribute in the queue's XML configuration file. <option type="queue" name="priority" value="1"/>.
Outstanding: Number of assigned messages for which the queue has not yet received Consumption Reports.
Last Active: A timestamp indicating the last activity for the receiver.
Consumed: Total messages consumed by this receiver. The total for this column should match the Consumed Messages value for the Application Set.
Assigned The number of messages currently assigned to all receivers for the Application Set. For each assigned message, the Message ID and the Assignment ID of the receiver to which it is assigned appear. In addition, the Reassign and Discard links allow you to reassign or discard the individual message.
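The consistency rule noted for the Consumed column can be checked mechanically. The following sketch assumes the per-receiver Consumed values and the Application Set's Consumed Messages value have already been copied from the page. All numbers and names are hypothetical.

```python
def consumed_totals_match(per_receiver_consumed: list, app_set_consumed: int) -> bool:
    """True if the receivers' Consumed column sums to the set's Consumed Messages."""
    return sum(per_receiver_consumed) == app_set_consumed


# Hypothetical values: two receivers in one Application Set.
print(consumed_totals_match([120, 80], 200))   # True
print(consumed_totals_match([120, 80], 205))   # False
```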

Copyright 2007 - 2014 Informatica Corporation.