Guide for Persistence
|
Sources, receivers, and stores interact in very controlled ways. This section illustrates the flow of network traffic between the components during three modes of operation and also provides a reference of persistence events.
UM sources heavily influence the persistence registration process. Sources send out registration information to enable receivers to register with stores and also monitor store liveness. If stores become unresponsive, or if communication among sources, stores and receivers becomes impaired, the source directs re-registration.
The following outlines the major events in the source registration process with the store:
The following diagram illustrates network flow during the registration process.
Sources can find the correct store(s) to register with from the values configured for it in ume_store (source) or ume_store_name (source). The configuration option ume_store (source) contains the IP address, TCP port, registration ID, and group index for the store(s) to be used by the source. The configuration option ume_store_name (source) contains the names of the stores to be used by the source. ume_store_name (source) requires that the store name is configured with the context-name option in the store's XML configuration file. See Identifying Persistent Stores and the <store>.
Sources unicast registrations to the store. The store unicasts responses back to the source. Registrations are on a per topic per source basis. Stores use RegIDs to identify sources and receivers. After registration sources may send data.
After the source successfully registers with all the stores for which it is configured, the source issues a Registration Complete event and sends a Source Registration Information (SRI) record over the configured UM transport session.
For multiple stores, the source determines when to issue a Registration Complete event based on the settings for the ume_retention_intragroup_stability_behavior (source) and ume_retention_intergroup_stability_behavior (source) options.
The source sends the SRI at the rate set by ume_sri_inter_sri_interval (source) until it reaches the maximum number of SRIs set by ume_sri_max_number_of_sri_per_update (source).
An SRI is a control message sent over the UM transport by a source that contains store information that a receiver needs to register with the store.
An SRI contains the following store information.
The SRI contains one overall version number and a separate version number for each store. If stores become unresponsive and the source must re-register when the store returns, the source increases the SRI version number and the version numbers for the stores it re-registered with. The highest SRI version number indicates the most current registration information. If a receiver gets an SRI with a higher version number than the version number it has, the receiver examines the individual store version numbers and re-registers with the those stores that have higher individual version numbers.
Receivers register with a store or stores after receiving a SRI packet from the source sending on the receiver's topic.
Receiver must receive an SRI before they can register with the store or stores. The following lists the major events in the receiver registration process.
The following diagram illustrates network flow during the registration process.
Any receivers who have resolved their topic and joined the transport session when the source sends out SRIs can register with the store. Any receivers joining the transport session when the source is not sending SRIs can request an SRI from the source if they find that the persistence flag is set in the source's TIR during topic resolution. The source responds with a SRI record.
Receivers unicast registrations to the store. The store unicasts responses back to the receivers. Stores use RegIDs to identify sources and receivers. After registration, receivers may handle recovery and send acknowledgements.
Note: If a persistent receiver's initial registration fails, it does not become an Ultra Messaging receiver.
The following diagram illustrates the normal operation of data reception and acknowledgement and also shows how UM attains Parallel Persistence. The source sends message data to receivers and stores in parallel.
During normal persistence operation:
Normal operation and recovery can proceed at the same time. In addition, as a receiver consumes retransmitted messages, the receiver sends normal acknowledgements for consumption and confirmed delivery (if requested by the source).
UM supports a flight size mechanism that tracks messages in flight from a persistent source and responds when a send would exceed the configured flight size (ume_flight_size (source) and/or ume_flight_size_bytes (source)). You can configure ume_flight_size_behavior (source) to either:
UM considers a sent message in flight until the following two conditions are met:
If configuring both ume_flight_size (source) and ume_flight_size_behavior (source), UM uses the smaller of the two flight sizes on a per send basis.
ume_flight_size (source) | ume_flight_size_bytes (source) | Result |
---|---|---|
Exceeded | Exceeded | ume_flight_size_behavior (source) executes |
Exceeded | Not Exceeded | ume_flight_size_behavior (source) executes |
Not Exceeded | Exceeded | ume_flight_size_behavior (source) executes |
Not Exceeded | Not Exceeded | No flight size sending restriction |
When using stores in a Quorum/Consensus configuration, intragroup and intergroup stability settings affect whether UM considers a messages in flight. Consider a case with three stores in a single QC group, and two receivers. Given the default configuration, until a source receives a stability notification from two of the three stores, UM considers a given message in-flight. In addition, if you set ume_retention_unique_confirmations (source) to 2, that same message would be considered in flight until the source receives two stability notifications AND two delivery confirmation notifications. See also Sources Using Quorum/Consensus Store Configuration.
Blocking Message Sends That Exceed the Flight Size
By default, when a source sends a message that exceeds it's flight size, the call to send blocks. For example, suppose the flight size is set to 1. The first send completes but before the source receives a stability notification or delivery confirmation, it initiates a second call to send. If the source uses a blocking send, the send call blocks until the first message stabilizes. If the source uses a non-blocking send, the send returns an LBM_EWOULD_BLOCK.
Notification of Message Sends That Exceed the Flight Size
Alternatively, ume_flight_size_behavior (source) can be set to notify your application when a message send surpasses the flight size. A send that exceeds the configured flight size succeeds and also triggers a flight size notification, indicating that the flight size has been surpassed. Once the number of in-flight messages falls below the configured flight size, another flight size notification source event is triggered, this time, informing the application that the number of in-flight messages is below the source's flight size.
Normal loss retransmission over the UM transport operates identically in persistence as it does in streaming, according to the transport protocol. Stores do not participate in this transport-level loss retransmissions.
Persistent stores become involved in message recovery in circumstances where the transport protocol is not able to recover. For example, if an application exits (either intentionally or by failure) and then restarts some time later, the transport is not able to recover messages that were sent during the application's down time. When the receiver restarts and re-registers, the receiver discovers the lowest message sequence number it did not receive, and subsequently requests retransmissions of all messages not received, starting from this low sequence number.
For more on this process see, Persistent Receiver Recovery.
Another circumstance in which the store becomes involved in message recovery is if the transport protocol tries but is unable to recover lost messages. In this case, Off Transport Recovery (OTR) is used. Note that OTR is available in streaming, and is serviced by the source's retention buffer. But for persistent sources, the store services OTR. See Off-Transport Recovery (OTR) for more information.
For more reliable persistence operation, Informatica recommends enabling OTR, especially when using UM Routers.
The following diagram illustrates receiver recovery:
Receivers unicast retransmission requests. If the store has the message, it unicasts the retransmission to the receiver. If it does not have the message and is configured to forward the request to the source, it unicasts the retransmission request to the source. If the source has the message, it unicasts the retransmission directly to the receiver. See also Message Loss Recovery.
UM store sends retransmissions from a thread separate from the main context thread so as not to impede live message data processing. The '<store>
' configuration option, retransmission-request-processing-rate, sets the store's capacity to process retransmission requests. The retransmission thread processes requests off a retransmission queue which is set at 4 times the size of retransmission-request-processing-rate. The following UM Web Monitor statistics indicate retransmission activity (see Store Web Monitor):
The Receiver-paced Persistence mode of operation is primarily intended to prevent message loss to critical receivers, even if loss prevention requires blocking sources from sending. To achieve this, message retention in the store is different from Source-paced persistence:
In Source-paced Persistence (SPP), messages are retained in the store until the space is needed for new messages. I.e. the message repository is a circular buffer which will overwrite when it "wraps". If a slow or stopped receiver falls behind the source by more than the size of the store's repository, that receiver will experience unrecoverable loss.
Source pacing is typically chosen for applications where outgoing messages are generated by external events or processes that cannot be slowed down or stopped (e.g. market data). Receiver pacing is typically chosen for applications which are able to slow down or even halt the generation of messages (e.g. a user interface which can inhibit user entry).
RPP is enabled with UM configuration options. No special API calls are needed.
RPP differentiates between two types of receivers:
Each receiver indicates its desired blocking behavior with the ume_receiver_paced_persistence (receiver) configuration option. Both blocking and non-blocking receivers may register with the same store and subscribe to the same source.
Here are important points when using RPP:
The repository must be configured to allow RPP, and sources and receivers must be configured to request RPP behavior during registration. Assuming the store is configured to allow RPP, the source determines the pacing behavior (receiver v.s. source) when it registers. If a receiver requests a different behavior, its registration will fail.
The store tracks the number of registered blocking and non-blocking receivers for each message sent by the source. A message is normally retained in the store repository until that number of receivers have acknowledged consumption. Once all receivers acknowledge consumption of a message, that message is removed from the repository.
Sources can modify specific repository configuration options that pertain to RPP.
Due to RPP's message retention policies, late joining RPP receivers cannot recover previously sent messages.
With RPP, sources are required to configure their flight size in bytes, in addition to message count. (With SPP, only message count flight size is required.) The value set for the source's ume_flight_size_bytes (source) configuration option is checked against a maximum allowed value specified in the store's XML configuration file.
In addition, a disk write delay interval for the repository, improves performance by preventing unnecessary disk activity.
RPP introduces the capability of a source application to set the following operational options on the store:
With SPP, those parameters are set only by the store's XML configuration file alone. With RPP, the source's configuration can optionally request a different value for those operating parameters, with the store's configured value being used as a maximum allowed threshold.
A source configures its desired pacing behavior (source paced v.s. receiver paced) with ume_receiver_paced_persistence (source) and ume_receiver_paced_persistence (receiver). If set to 1, it becomes an RPP source. Assuming the store is configured to allow RPP, when an RPP source registers with the store, the store's repository for that source becomes an RPP repository. The receiver configures its desired pacing behavior with ume_receiver_paced_persistence (receiver), where 0 is source-paced and 1 or 2 are receiver-paced. The receiver's pacing must match that of the source and store, otherwise the receiver's registration will fail. In addition, the choice of 1 or 2 determines the receiver's desired blocking behavior (1=blocking, 2=non-blocking).
Note that although the configured pacing behavior must match between source and receiver, that does not mean that the numerical setting of the ume_receiver_paced_persistence (source) and ume_receiver_paced_persistence (receiver) options must be equal. If the source is 0 (source paced), then the receiver must also be 0. However, if the source is 1 (receiver paced), then the receiver must be either 1 or 2, depending on the receiver's desired blocking behavior.
As with Source-paced Persistence, RPP sources send Source Registration Information (SRI) packets to RPP receivers over the configured UM transport. RPP Receivers must wait for this information before they can initiate registration requests to the store. See Source Registration and Receiver Registration for more information.
A source registration request includes the following:
A receiver registration request includes its designation as a RPP receiver.
The repository's registration response to both a source and a receiver acknowledges RPP mode.
Late Registering Receiver
A late joining receiver that registers after the first RPP topic message has been sent cannot recover any messages sent prior to its initial registration. It is the user's responsibility to synchronize a receiver's initial registration with the start of message transmission. This restriction does not apply to an RPP receiver that initially registered at an earlier time and is now re-registering, as after a failure and restart. In that case, messages that were sent after the receiver's initial registration will be retained by the store for recovery by the receiver.
Early Exiting Receiver
Each registered receiver has associated with it an activity timeout and a state lifetime. During normal operation, the store monitors the operation of a registered receiver. If the store hears nothing from a receiver for the duration of the activity timeout, the store assumes that the receiver has halted operation. Messages will be retained by the store according to the receiver's configured blocking behavior. This gives the receiver time to restart and re-register. If an inactive receiver re-registers before the state lifetime expires, the receiver will be able to recover all messages that it missed.
However, if a receiver remains halted for the duration of the state lifetime, the store will delete the receiver state information. If the repository is retaining messages for this receiver, those messages will be implicitly acknowledged on behalf of the expired receiver, making them eligible for deletion if no other receivers' acknowledgements are pending. If the source is blocked waiting for this receiver, the store will unblock the source. Finally, if the halted receiver re-register after its state lifetime has expired, the store will treat it as an initial registration, and the messages it missed will not be available.
UM Version RPP Compatibility Matrix
The following table indicates the result of registration requests across UM versions:
Version/Object | Pre-ver. 5.3 Store | Ver. 5.3 RPP Store | Ver. 5.3 Non-RPP Store |
---|---|---|---|
Pre 5.3 Source | Granted | Rejected * | Granted * |
5.3 RPP Source | Granted - Source Error | Granted * | Rejected * |
5.3 Non-RPP Source | Granted | Rejected * | Granted * |
Pre 5.3 Receiver | Granted | Rejected | Granted |
5.3 RPP Receiver | Granted - Receiver Error | Granted | Rejected |
5.3 Non-RPP Receiver | Granted | Rejected | Granted |
Where:
At a high level, the normal sequence of operations for RPP is the same as it is for SPP:
Sources transmit messages to receivers and stores at the same time over UM transports. Sources also track stability acknowledgements from the store. A source is allowed to send messages ahead of stability acknowledgements up to the configured flight size. If the flight size of unstabilized messages is reached, the source is blocked from sending more messages pending stability acknowledgements from the store.
Receivers acknowledge consumption of received messages back to stores, and optionally to the sources.
One important way that RPP differs from SPP is in the sending of stability acknowledgements. With SPP, the store normally waits to send a stability acknowledgement until a message is "stable" on the configured storage medium, either disk or memory. With RPP, the sending of stability acknowledgements is affected by receiver consumption acknowledgements in two ways:
If a message is acknowledged by all registered receivers before the message is written to disk, then there is no need to retain the message at all. The message is deleted and a stability acknowledgement is sent to the source.
The following also affect stability acknowledgements:
Acknowledge on Reception - If the source is configured for ume_repository_ack_on_reception (source) and the store is configured for repository-allow-ack-on-reception, the store sends a stability acknowledgement to the source immediately upon reception of a message, even before any receiver acknowledgements are received, and before the message is written to disk. This setting can increase system throughput for some use cases, but also increases the risk of message loss in the event of a store failure.
For memory store repositories, the options ume_repository_ack_on_reception (source) and repository-disk-write-delay have no effect.
The normal way that RPP receivers recover messages is when they re-register within the state lifetime after a failure. However, just as with SPP, there is the possibility that the transport session of the source is unable to successfully deliver all messages to the receiver. In the event of unrecoverable loss at the transport session, the Off Transport Recovery (OTR) method is also active for RPP receivers. OTR does not require the receiver to restart to recover messages from the store. See the Off-Transport Recovery (OTR) for more information.
You can deregister either sources or receivers using deregistration APIs, (lbm_src_ume_deregister(), lbm_rcv_ume_deregister(), and lbm_wrcv_ume_deregister()). UM deletes the state of deregistered objects. If you deregister an RPP receiver, UM automatically decrements the number of receiver acknowledgements required to maintain RPP behavior. The store issues Deregistration Successful events for every source or receiver that deregisters. Note that after deregistering a source or receiver, the object will still exist, but is no longer participating in persistence. An attempt to send to a deregistered source will return an error. A deregistered receiver will continue to deliver messages on the topic, but since it is no longer participating in persistence, it will be unable to acknowledge those messages. If the application wants to re-join persistence, it must delete the source or receiver and re-create it, allowing it to re-register. See Persistence Events.
Users should be cautious using the deregistration APIs, especially for sources. Source deregistration will immediately delete from the store any messages from that source which might be retained due to lack of receiver acknowledgement. This deletion will render the receivers unable to recover those messages.
Follow the procedure below to configure Receiver-paced Persistence:
Set ume_receiver_paced_persistence (source) and ume_receiver_paced_persistence (receiver) in the UM configurations. If only certain sources or receivers in a context are RPP, use lbm_*setopt() in the source or receiver application or use UM XML configuration files.
Set repository-allow-receiver-paced-persistence = 1 for the repository in the umestored XML configuration file.
Coordinate ume_flight_size_bytes (source) between the repository and the source. Set the maximum allowable flight size with the repository option, source-flight-size-bytes-maximum. Sources can reconfigure its flight size bytes to a value less than or equal to the maximum.
Optional: coordinate the ume_repository_ack_on_reception (source) between the repository and the source. If the repository has repository-allow-ack-on-reception enabled (1), the source can choose to keep it enabled or turn it off. If the repository has repository-allow-ack-on-reception disabled (0), the source cannot turn it on.
Optional: if the repository is a disk repository (repository-type = disk or reduced-fd), set the maximum write delay with the repository option, repository-disk-write-delay. Sources can set ume_write_delay (source) to a value less than or equal to repository-disk-write-delay.
Optional: coordinate repository size options between the source and repository. If you wish to use the repository's values, you do not need to configure source configuration values. The repository sets a maximum for these three options. The source can reconfigure the repository's options with values less than or equal to the maximum configured for the repository using the following UM configuration options:
The sample configuration files shown below show how a store configuration file establishes certain RPP option values and the source can reconfigure them via a UM configuration file. Although only two files appear below, this configuration represents two, single-store quorum/consensus groups and one UM context. A second umestored configuration file would be required for the store store1rpp containing options and values identical to store0rpp.
UM Configuration File for RPP
The following example UM configuration file will work for applications which have sources and/or receivers that must be persisted using RPP. This configuration file is written assuming that the store is configured as shown in the next section.
The source configures ume_flight_size_bytes (source) to 1,000,000 bytes. For this to work, the repository must set source-flight-size-bytes-maximum to a value greater than or equal to 1,000,000.
The source uses ume_write_delay (source) to override the repository's repository-disk-write-delay setting to 1000 ms (1 second). Note that for this to work, the repository must set repository-disk-write-delay to a value greater than or equal to 1000 ms.
umestored Configuration File
In the following example store configuration file, RPP options appear in the section for the topic pattern, ABC*. This configuration file is written assuming client applications (sources and receivers) use UM configuration files similar to that shown in the preceding section.
There are actually three stores configured in Q/C. The other two's configurations should differ appropriately. For example, change each instance of "store0" to "store1" and "store2" respectively.
UM Feature | Supported | Notes |
---|---|---|
Store Proxy Sources | Yes | |
UM Router | Yes | |
UM Transports | Yes | |
Multi-Transport Threads | No | The Multi-Transport Threads does not support persistence. |
Off-Transport Recovery | Yes | |
Late Join | No | A receiver cannot recover messages sent prior to that receiver's initial registration. |
HF | Yes | |
HFX | Yes | |
Wildcard Receivers | Yes | |
Message Batching | Yes | |
Ordered Delivery | Yes | |
Request/Response | Yes | |
Multicast Immediate Messaging (MIM) | No | MIM messages are not persisted and have no impact on RPP. |
Source Side Filtering | Yes | |
Self-Describing Messaging (SDM) | Yes | |
Pre-Defined Messaging (PDM) | Yes | |
UM Spectrum | Yes | |
Monitoring/Statistics | Yes | |
Acceleration - DBL | Yes | |
Acceleration - UD | Yes | |
Implicit/Explicit Acknowledgements | Yes | |
Registration ID/Session Management | Yes | |
Fault Tolerance - Quorum Consensus | Yes | |
UM SNMP Agent | Yes | |
Ultra Messaging Manager | Yes | |
Ultra Messaging Cache | Yes | |
Ultra Messaging Desktop Services | No |
The Ultra Messaging API provides a number of events, callbacks, messages, functions, and settings. The API reference (C API, Java API or .NET API) can be used to see the true extent of the API. In order to design successful applications, though, a high level understanding of the events and callbacks is essential.
Some specific languages, such as C, Java, or C# may have specific nuances for the various events and callbacks. But, by and large, an application should plan on having access to the items listed in the following sections. For details for a particular language, consult the Ultra Messaging API documentation (C API, Java API or .NET API).
The following events and callbacks are available for source applications:
Event Name | Type | Description |
---|---|---|
Store Registration Success | Source Event | Delivered once a source has successfully registered with a single store. Event contains flags to show if the source is "old" (i.e. a re-registration) as well as the sequence number that the source should use as its initial sequence number when sending, and the store information |
Store Registration Complete | Source Event | Delivered once a source has completed registration with the required store(s). This indicates the source may send as it desires. Event contains the consensus sequence number. |
Store Registration Error | Source Event | Delivered once a source has received an error from the store indicating the requested registration was not granted. Event contains an error message to indicate what happened. |
Store Message Stable | Source Event | Delivered once a message is stable at a single store. Event contains the message sequence number and indicates if the message meets Intergroup and/or Intragroup stability requirements. Also includes the store information. |
Store Message Not Stable | Source Event | Delivered once a message's ume_message_stability_lifetime (source) has expired. The source no longer retransmits the message to the store. |
Delivery Confirmation | Source Event | Delivered once a message has been confirmed as delivered and processed by a receiving application. Event contains the message sequence number as well as indications whether the message has met the unique confirmations requirement. Also contains the receiver's Registration ID or Session ID. |
Store Unresponsive | Source Event | Delivered once a store is seen to be unresponsive due to failure or network disconnect. Event contains a message with more details suitable for logging. If a majority of a source's configured stores are unresponsive, the application will not be allowed to send messages. |
Store Message Reclaimed | Source Event | Delivered once a message has passed through retention and is about to be released from memory or disk. Event contains the message sequence number. (Reclaim refers to storage space reclamation.) |
Store Forced Reclaim | Callback | Indicates a message is being forcibly released because the memory size limit (retransmit_retention_size_limit (source)) has been exceeded or the message's ume_message_stability_lifetime (source) has expired. Event contains the message sequence number. |
Flight Size Notification | Callback | Indicates that the number of in-flight messages for a source has exceeded or fallen below the configured flight size limit for a source. The event indicates if the flight size has been exceeded (OVER) by a new message send or that a message recently stabilized has reduced the number of in flight messages to less than the flight size limit (UNDER). |
RPP Source Registration Success | Source Event | Delivered once a source has successfully registered with a single store as a RPP source. The event contains either the RegID or Session ID, the sequence number of the last message stored for the source and store information. |
RPP Source Registration Failure | Source Event | Delivered once a source has received an error from the store indicating the requested registration was not granted. Event contains an error message to indicate what happened. |
RPP Source Deregistration Success | Source Event | Delivered once a source successfully deregisters from an individual store. The event contains either the RegID or Session ID, the sequence number of the last message stored for the source and store information. |
RPP Source Deregistration Complete | Source Event | Delivered once UM receives a successful deregistration event from all stores. |
The following callbacks and messages are available for receiver applications:
Event Name | Type | Description |
---|---|---|
Store Registration Success | Message | Delivered once a receiver has successfully registered with a single store. Message contains flags to show if the receiver is "old" (i.e. Not a new registration) as well as the sequence number that the receiver should use as its low sequence number, and the store information. In addition, the event contains the source's Registration ID or Session ID and the receiver's Registration ID or Session ID. |
Store Registration Complete | Message | Delivered once a receiver has completed registration with the store(s) required. This indicates the receiver may now receive data. Message contains the consensus sequence number. |
RPP Receiver Registration Success | Message | Delivered once a receiver has successfully registered with a single store as a RPP receiver. Message contains either the RegID or Session ID, the sequence number of the last message stored for the source and store information. |
RPP Receiver Registration Failure | Message | Delivered once a receiver has received an error from the store indicating the requested registration was not granted. Event contains an error message to indicate what happened. |
RPP Receiver Deregistration Success | Message | Delivered once a receiver successfully deregisters from an individual store. The message contains either the RegID or Session ID for the receiver and the source, the sequence number of the last message stored for the source and store information. |
RPP Receiver Deregistration Complete | Message | Delivered once UM receives a successful deregistration event from all stores. |
Store Registration Error | Message | Delivered once a receiver has received an error from the store indicating the requested registration was not granted. Message contains an error message to indicate what happened. |
Store Registration Change | Message | Delivered once a change in store information is received from the source. The extent of the change is included in a message suitable for logging. |
Store Retransmission | Message | Retransmissions from recovery come in as normal messages with a flag indicating their status as a retransmission. |
Store Registration Function | Callback | Called once a receiver receives store information from a source and UM desires to know the RegID to use for the receiver. Callback passes the source RegID, the store information, and the source transport name. The return value is the RegID that UM should request to use from the store. |
Store Recovery Sequence Number Function | Callback | Called once registration is about to complete and the low sequence number must be determined. Callback passes the highest sequence number seen from the source and the consensus sequence number from the stores. |
The following events are available for the context of source and receiver applications.
Event Name | Type | Description |
Flight Size Notification | Context Event | Indicates that the number of in-flight Multicast Immediate Messages has exceeded or fallen below the configured flight size limit. The event indicates if the flight size has been exceeded (OVER) by a new message send or that a message recently stabilized has reduced the number of in flight messages to less than the flight size limit (UNDER). |