Guide for Persistence
Store Web Monitor
Note
The Store web monitor functionality is deprecated in favor of MCS. We do not plan to remove existing web monitor functionality, and will continue to support it in its current state. But we do not plan to enhance the web monitor in the future.

The built-in web monitor (configured in the Store configuration file) is a rich source of information about the health of a Store. This section contains a page-by-page guide to reading and interpreting the output of a UM web monitor, using an example with a couple of sources and one receiver on a single Store.

Warning
The Store's web monitor is not designed to be a highly-secure feature. Anybody with access to the network can access the web monitor pages.

Users are expected to prevent unauthorized access to the web monitor through normal firewalling methods. Users who are unable to limit access to a level consistent with their overall security needs should disable the Store web monitor (using <web-monitor>). See Webmon Security for more information.


Store Web Monitor Index Page

Here is an image of the Web Monitor's Index (main) page:

webmon_index.png

The web monitor's index page tells what build of UM is running.

The "Stores" link displays the Store Web Monitor Stores Page.


Store Web Monitor Stores Page

Here is an image of the Web Monitor's Stores page:

webmon_stores.png

This page shows all the Stores configured under the umestored process. If you had 5 Stores configured, they would be numbered Store 0 through Store 4. Our example has only one Store configured, "ume-test-store".

Each Store name is a clickable link, which displays the Store Web Monitor Store Page for that Store.


Store Web Monitor Store Page

Here is an image of the Web Monitor's Store page:

webmon_store.png

This page shows the following information about the Store.

Item

Description

Interface

This Store is listening on all interfaces (0.0.0.0) on port 38401.

Cache Dir

Pathname for disk Store message cache directory. This would be configured as a Store option in the Store configuration file. For example:
<option type="store" name="disk-cache-directory" value="cache/" />

State Dir

Pathname for disk Store state directory. This would be configured as a Store option in the Store configuration file. For example:
<option type="store" name="disk-state-directory" value="state/" />

Configured Retransmission Request Processing Rate

Current value for the Store's retransmission-request-processing-rate option setting.

Total Seconds Used for Rate Calculations

Accumulating counter that displays the number of seconds since the last rate reset. The Web Monitor divides the Retransmission Request Received, Retransmission Request Service, and Retransmission Request Drop totals by the Total Seconds to calculate the rates displayed. If you click the Reset Rate Stats link, the Web Monitor resets this value to zero.

Retransmission Request Received Rate

Number of retransmission requests received per second.

Retransmission Request Service Rate

Number of retransmission requests serviced per second.

Retransmission Request Drop Rate

Number of retransmission requests dropped per second. Requests are dropped if the rate of retransmission requests exceeds the configured retransmission request processing rate.

Retransmission Request Total Dropped

The number of retransmission requests dropped since the Store was started.

Patterns

Specifies the wildcard pattern used to select topics for which a Store will provide persistence services. This would be configured as a topic option in the Store configuration file. For example: <topic pattern="test.*" type="PCRE">

Topics

Displays the topic names and Registration ID (Session ID) for any sources publishing on the topic. The screen examples display one topic, test1 - 2504558780(39307788). Each Registration ID (Session ID) is a clickable link, which displays the Store Web Monitor Source Page for that source.

Reset Rate Stats

Click the Reset Rate Stats link to reset the retransmission rates. After you click the link, the Web Monitor resets Total Seconds Used for Rate Calculations to zero and displays a page with the Store number and the message, 'Rate Statistics have been reset'.
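The rate calculation described above (a running total divided by Total Seconds Used for Rate Calculations) can be sketched as follows. This is an illustration only, not the Store's implementation: the function name and the sample totals are hypothetical, and returning 0.0 immediately after a reset is an assumption.

```python
def per_second_rate(total_count: int, total_seconds: float) -> float:
    """Average number of events per second since the last rate reset.

    total_count   -- running total (received, serviced, or dropped requests)
    total_seconds -- Total Seconds Used for Rate Calculations
    """
    if total_seconds <= 0:
        # Assumption: right after a reset there is no elapsed time yet,
        # so report a rate of zero rather than dividing by zero.
        return 0.0
    return total_count / total_seconds


# Hypothetical totals accumulated over 60 seconds.
received_rate = per_second_rate(1200, 60)  # 20.0 requests per second
dropped_rate = per_second_rate(30, 60)     # 0.5 requests per second
```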


Store Web Monitor Source Page

Here is an image of the Web Monitor's Source page:

webmon_source.png

The first line of the page is interpreted as follows:

2504558780

The source's registration ID.

10.29.3.42.14392

The source's IP address and the port set by its LBM configuration option, request_tcp_port (context).

3958260924

The source's transport session index.

1161732811

The source's topic index within the transport session, 3958260924.

The remaining fields are described in the following table:

Source Page Item

Description

Topic

test is the source's topic string.

Session ID

39307788 is the source's Session ID.

Last Activity

09:19:39.501350 is the timestamp of when the Store last heard from the source, including keepalives sent by UM.

Repository

disk is the type of repository. Possible values are "memory" or "disk".

Receiver Paced Persistence

Setting for Receiver-paced Persistence (RPP), which is a repository option both the repository and source must enable. A value of 0 means RPP is not enabled and the repository is using the default Source-paced persistence. A value of 1 means RPP is enabled.

Message Map: 3120

The total number of message fragments the Store has for this source, both on disk and in memory. These are UM-level fragments, not IP-level fragments. UM messages are fragmented into roughly 8 kilobyte chunks for UDP-based protocols (LBT-RM and LBT-RU) and into roughly 64 kilobyte chunks for LBT-TCP. The majority of application messages tend to be well under the fragment boundaries, so the value after "Message Map" could be used as a rough estimate of the number of messages in the Store from this particular source. It's at least a strict upper bound.

Window: [0, 9d5, c2f]

Window format is: trail_sqn, mem_trail_sqn, lead_sqn

  • trail_sqn, 0, is the trailing sequence number, which is the oldest sequence number in the Store for this source. In most cases, this starts at 0 and stays there for a while. The trailing sequence number changes if the Store reaches a disk file size limit and then deletes the oldest messages.

  • mem_trail_sqn, 9d5, is the trailing sequence number for messages in memory. It is the oldest sequence number still in memory. Typically, you might have more sequence numbers on disk than you do in memory, or possibly the same number.

  • lead_sqn, c2f, is the leading sequence number, which is the newest sequence number in the Store.

    Note: For a memory Store, the first and second values would always be the same, because the oldest sequence number in memory is also the oldest in the Store. Only two values are therefore displayed: the trailing sequence number and the leading sequence number.
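As a reading aid, the Window value can be decoded with a short script. This is a sketch under assumptions: the fields are treated as hexadecimal (suggested by values such as 9d5 and c2f), sequence numbers are assumed not to have wrapped, and the derived counts assume there are no gaps in the stored range. The function name parse_window is hypothetical.

```python
def parse_window(text: str) -> dict:
    """Decode a Window value such as '[0, 9d5, c2f]'.

    Assumes the fields are hexadecimal sequence numbers. Accepts three
    fields (disk Store) or two fields (memory Store).
    """
    fields = [int(part.strip(), 16) for part in text.strip().strip("[]").split(",")]
    if len(fields) == 3:
        trail, mem_trail, lead = fields
    elif len(fields) == 2:
        # Memory Store: the oldest message in memory is the oldest overall.
        trail, lead = fields
        mem_trail = trail
    else:
        raise ValueError(f"unexpected Window value: {text!r}")
    return {
        "trail_sqn": trail,
        "mem_trail_sqn": mem_trail,
        "lead_sqn": lead,
        # Derived counts, assuming a gap-free range of sequence numbers.
        "stored": lead - trail + 1,
        "in_memory": lead - mem_trail + 1,
    }


window = parse_window("[0, 9d5, c2f]")
print(window["stored"], window["in_memory"])  # 3120 603
```

For the example page, the stored count of 3120 agrees with the Message Map value shown above.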

Memory: 55986 / 65000 / 50331648

Memory format is: repository memory size / repository size threshold / repository size limit

  • repository memory size, 55986, is the number of bytes of messages in memory, which includes headers and Store overhead.

  • repository size threshold, 65000, is the repository-size-threshold topic option found in the Store configuration file.

  • repository size limit, 50331648, is the Store's repository-size-limit topic option found in the Store configuration file.

    You would expect the number of bytes in memory to be under the threshold most of the time, but it could spike above it before going back down if the Store is really busy momentarily. It should never go above the limit.
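The expectation above (usually under the threshold, never over the limit) can be turned into a simple check. This is a minimal sketch, assuming the three fields are decimal byte counts separated by slashes as in the example; the function name and the status strings are hypothetical.

```python
def check_memory(text: str) -> str:
    """Classify a Memory value such as '55986 / 65000 / 50331648'.

    Fields: repository memory size / size threshold / size limit (bytes).
    """
    size, threshold, limit = (int(part) for part in text.split("/"))
    if size > limit:
        # According to the guide, this should never happen.
        return "above-limit"
    if size > threshold:
        # Can occur briefly when the Store is momentarily very busy.
        return "above-threshold"
    return "ok"


print(check_memory("55986 / 65000 / 50331648"))  # ok
```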

Age Threshold: 0

Age Threshold, 0, is the Store's repository-age-threshold topic option found in the Store configuration file.

Sync: [c2f, c2f, c2f]

Pertains to disk repositories only. Sync format is: sync_complete_sqn, sync_sqn, contig_sqn

  • sync_complete_sqn, c2f, Most recent sequence number that the Operating System has confirmed persisting to disk.

  • sync_sqn, c2f, Most recent sequence number for which the Store has initiated persisting to disk, but the Operating System has not confirmed completion of persistence.

  • contig_sqn, c2f, Most recent sequence number that, together with trail_sqn, bounds a range of sequence numbers with no gaps. For example, if trail_sqn = 0 and the Store has persisted all eleven messages with sequence numbers 0 through 10, contig_sqn would equal 10. contig_sqn would also be 10 if a receiver declared message sequence number 7 unrecoverably lost. contig_sqn would be 6 if message sequence number 7 was not persisted but also not declared lost.
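The contig_sqn rule in the last bullet can be illustrated with a small model. This is not the Store's code; it is a sketch that assumes a sequence number counts as accounted for when it is either persisted or declared unrecoverably lost, which is how the examples above read. The function name and the None return for an empty range are hypothetical.

```python
def contig_sqn(trail_sqn, persisted, declared_lost=()):
    """Highest sequence number reachable from trail_sqn without a gap.

    A sequence number is treated as accounted for if it was persisted or
    declared unrecoverably lost. Returns None if trail_sqn itself is not
    accounted for (an assumption made for this sketch).
    """
    accounted = set(persisted) | set(declared_lost)
    if trail_sqn not in accounted:
        return None
    sqn = trail_sqn
    # Walk forward until the next sequence number is missing.
    while sqn + 1 in accounted:
        sqn += 1
    return sqn


all_eleven = range(11)                  # sequence numbers 0 through 10
missing_7 = set(all_eleven) - {7}

print(contig_sqn(0, all_eleven))        # 10
print(contig_sqn(0, missing_7, {7}))    # 10 (7 was declared lost)
print(contig_sqn(0, missing_7))         # 6  (7 is an unresolved gap)
```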

In progress: 0 / 0

Pertains to disk repositories only. In progress format is: num_ios_pending / num_read_ios_pending

  • num_ios_pending, 0, Number of disk writes the Store has submitted to the Operating System. A disk write refers to the Store persisting a message to disk.

  • num_read_ios_pending, 0, Number of disk reads that the Store has submitted to the Operating System. A disk read, for example, results from an application retransmission request.

Offsets: 0 / 190320 / 4294967296

Pertains to disk repositories only. Offsets format is: start_offset / offset / max_offset

  • start_offset, 0, The relative location on disk of the first message, trail_sqn.

  • offset, 190320, The relative location where the next message, contig_sqn plus one, will be written.

  • max_offset, 4294967296, The maximum size of the cache file.

Active ULBs: 0 high 0

ULB stands for Unrecoverable Loss Burst. A little extra work is required to keep cache files consistent when the Store gets an unrecoverable loss burst, because unrecoverable loss bursts are delivered all at once for lots of messages, rather than one at a time like normal unrecoverable loss messages.

Active ULB is the number of unrecoverable loss burst events the Store is dealing with at the moment. It'll go to zero after the ULB has been resolved.

The high number (0) is the highest sequence number reported among any unrecoverable loss burst event, and is not reset after the ULB is handled; it increments throughout the process life of the Store.

WARNING: If you see any number other than 0 here, the Store is losing large numbers of messages, and they are likely not being persisted.

Loss: 0 ULBs 0

These values are counters for number of unrecoverable loss messages (Loss) and for number of unrecoverable burst loss messages (ULB). These start at 0 when the Store starts up and aren't reset until the Store exits. They don't include any loss events that were persisted to disk from a previous run, only new loss events since the Store started. There are cases with UME 2.0 where one individual Store could legitimately report some unrecoverable loss, or maybe even unrecoverable loss bursts.

WARNING: If you see any number other than 0 for either of these counters, you should investigate.

Drops: 0 / 0

If the Store is nearing the repository-size-limit and gets another message, the Store will intentionally drop a message. A drop requires a bit of work on the Store's part.

The first 0 is the number of active drops, which are drops that are currently being worked on.

The second 0 is the total number of drops that have happened for this Store since it was started. Some people want a low repository-size-limit and therefore lots of intentional drops can occur. Some don't want to drop any message the whole day - so the interpretation of the values is up to you.

LBM Stats

These represent transport-level statistics for the underlying receivers in the Store for the source. The example shown is for a TCP source, so not too many stats are available (stats for a TCP source are less important from a monitoring perspective).

Statistics for an LBT-RM or LBT-RU source, however, show number of NAKs sent, which is important. Ideally, the number of NAKs sent should be 0. A few NAKs from a Store throughout the day is not an emergency. It can be, however, an early warning sign of more severe problems, and should be taken seriously.

If you see a non-zero number of NAKs here, take a look at the overall network load the Store's machine is attempting to handle, particularly in very busy periods and spikes; it may be too much.

Receivers

Registration IDs and accompanying Session IDs for the receivers listening on the source's topic. Click a receiver's Registration ID (Session ID) to display the Store Web Monitor Receiver Page, which shows information about that receiver for the persisted topic.


Store Web Monitor Receiver Page

Here is an image of the Web Monitor's Receiver page:

webmon_receiver.png

The first line of the page is interpreted as follows:

2504558781

The receiver's registration ID.

10.29.3.42.14393

The receiver's IP address and the port set by its LBM configuration option, request_tcp_port (context).

1510613393

The receiver's transport session index.

1161732811

The source's topic index within the transport session, 1510613393.

The remaining fields are described in the following table:

Receiver Page Item

Description

Topic

The topic that the receiver is listening on.

Last Activity

09:09:35.981110 is the timestamp of when the Store last heard from the receiver, including keepalives sent by UM.

Source RegID

Registration ID of the source publishing on the topic. Click on the Registration ID link to display the Store Web Monitor Source Page.

Source Session ID

The Session ID of the Source sending messages on the topic.

ACK

c93 is the last message sequence number the receiver acknowledged.