Late Join allows sources to save a predefined amount of their messaging traffic for late-joining receivers. Sources set the configuration options that determine whether they use Late Join or not, and receivers set options that determine whether they will participate in Late Join recovery if sources use Late Join.
UMP's persistent store is built on Late Join technology. In the Estimating Recovery Time discussion below, the terms "Late Join buffers" and "UMP store" are roughly equivalent.
For more, review the Late Join section in the Concepts Guide, especially Configuring Late Join for Large Numbers of Messages.
To estimate Late Join recovery time R in minutes, use the formula:
R = D / ( 1 - ( txrate / rxrate ) )
where:
D is the downtime (in minutes) across all receivers
txrate is the average transmission rate of normal messages from sources during recovery (in kmsgs/sec)
rxrate is the average recovery rate from source-side Late Join buffers during recovery (in kmsgs/sec)
For example, consider the following scenario:
D = 10 minutes
txrate = 10k messages / second
rxrate = 25k messages / second
Plugging these values into the formula gives an estimated recovery time in minutes:
R = 10 / ( 1 - ( 10 / 25 ) )
or 16.67 minutes. You can use this estimated recovery time to set the Late Join option retransmit_request_generation_interval. Set it at least as high as the longest expected recovery time, converted to milliseconds (16.67 minutes is about 1,000,000 milliseconds). Note that if this interval is too short, you may experience burst loss during recovery.
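The estimate above can be sketched as a small Python helper. This is only an illustration of the arithmetic, not part of UM: the function names are hypothetical, and the check that txrate is lower than rxrate is an added assumption (when sources transmit at least as fast as receivers recover, the denominator is zero or negative and recovery never catches up).

```python
import math


def late_join_recovery_minutes(downtime_min, txrate, rxrate):
    """Estimate recovery time R = D / (1 - txrate / rxrate), in minutes.

    txrate and rxrate must use the same unit (for example kmsgs/sec).
    Hypothetical helper for illustration; not a UM API.
    """
    if downtime_min < 0 or txrate < 0 or rxrate <= 0:
        raise ValueError("downtime and txrate must be >= 0, rxrate must be > 0")
    if txrate >= rxrate:
        # Assumption: the formula only applies while recovery outpaces live traffic.
        raise ValueError("txrate must be lower than rxrate")
    return downtime_min / (1 - txrate / rxrate)


def minutes_to_milliseconds(minutes):
    """Convert minutes to whole milliseconds, rounding up.

    The value is rounded to 3 decimals first so that floating-point
    noise does not push the result up by a spurious millisecond.
    """
    return math.ceil(round(minutes * 60_000, 3))


# Scenario from the text: D = 10 minutes, txrate = 10k msgs/sec, rxrate = 25k msgs/sec.
recovery_min = late_join_recovery_minutes(10, 10, 25)
interval_ms = minutes_to_milliseconds(recovery_min)
print(f"R = {recovery_min:.2f} minutes; interval of at least {interval_ms} ms")
```

The millisecond value is a lower bound for retransmit_request_generation_interval; in practice you would derive it from the longest recovery time you expect, not the average.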
Note that this formula assumes the following:
Recovery rate is as linear as possible with use of option response_tcp_nodelay 1
Transmit rate (txrate) from *all* relevant sources is fairly constant and equal
Recovery rate (rxrate) from Late Join buffers is fairly constant and equal, and should be measured in a live test, if possible. You can adjust the recovery rate with two Late Join configuration options:
Configure the source to enable both Late Join and Off-Transport Recovery (OTR) operation for receivers.
When a late-joining receiver detects (from the topic advertisement) that a source is enabled for Late Join but has sent no messages, this flag option lets the receiver request an initial sequence number from the source. Sources respond with a TSNI (Topic Sequence Number Information) message.
This option enables receiver caching of new messages during a recovery. The option value determines how close the current new sequence number must be to the latest retransmitted sequence number for the receiver to start caching. The receiver recovers uncached data later in the recovery process through the retransmit request mechanism. An option value greater than or equal to the default turns on caching of new data immediately. A smaller value means that caching does not begin until recovery has caught up somewhat with the source. A larger value means that caching can begin earlier during recovery. This value is meaningful only for receivers using ordered delivery of data. See Configuring Late Join for Large Numbers of Messages for additional information about this option.
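The proximity rule described above can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not UM's implementation: the function and parameter names are hypothetical, the comparison is assumed to be inclusive, and sequence-number wraparound is ignored.

```python
def should_cache_new_data(newest_sqn, latest_retransmitted_sqn, proximity):
    """Return True when live data is close enough to recovered data to cache.

    newest_sqn: sequence number of the newest message from the source.
    latest_retransmitted_sqn: sequence number of the latest retransmission received.
    proximity: the configured caching proximity value.
    Hypothetical model; the inclusive comparison is an assumption.
    """
    gap = newest_sqn - latest_retransmitted_sqn
    return gap <= proximity


# Recovery is far behind the source: new data is not cached yet.
far_behind = should_cache_new_data(newest_sqn=10_000, latest_retransmitted_sqn=2_000, proximity=1_000)

# Recovery has nearly caught up: new data is cached from here on.
nearly_caught_up = should_cache_new_data(newest_sqn=10_000, latest_retransmitted_sqn=9_500, proximity=1_000)

print(far_behind, nearly_caught_up)
```

In this model a larger proximity value makes the condition true earlier in the recovery, which matches the description of caching beginning earlier.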
The maximum interval between when a receiver first sends a retransmission request and when the receiver stops and reports loss on the remaining retransmissions (RXs) not received. See Configuring Late Join for Large Numbers of Messages for additional information about this option.
The interval between retransmission request messages to the source. See Configuring Late Join for Large Numbers of Messages for additional information about this option.
The maximum number of messages to request, counting backward from the current latest message, when late-joining a topic. Due to network timing factors, UM may transmit an additional message. For example, a value of 5 sends 5 or possibly 6 retransmit messages to the new receiver. (Hence, you cannot request only the single last message and be guaranteed to receive exactly 1; you may get 2.) A value of 0 indicates no maximum.
The maximum number of messages to request at a single time from a persistent store or a source. A value of 0 indicates no maximum. See Configuring Late Join for Large Numbers of Messages for additional information about this option.
Specifies the minimum age of messages in the retained message buffer before UM can delete them. UM cannot delete any messages younger than this value. For UMS Late Join, this option and retransmit_retention_size_threshold are the only options that affect the retention buffer size. For UMP, these two options combined with retransmit_retention_size_limit affect the retention buffer size. UM deletes a message when it meets all configured threshold criteria, that is, when the message is older than this age threshold (if set) and the size of the retention buffer exceeds retransmit_retention_size_threshold (if set). A value of 0 means the age threshold is always met, in which case deletion is determined by the other threshold criteria.
Sets a maximum limit on the size of the source's retransmit retention buffer when using a UMP store. With UMP, stability and delivery confirmation events can delay the deletion of retained messages, which can increase the size of the buffer above the retransmit_retention_size_threshold. Hence, this option provides a hard size limit. UM sets a minimum value for this option of 8K for UDP and 64K for TCP, and issues a log warning if you set a value less than the minimum.
Specifies the minimum size of the retained message buffer before UM can delete messages. The buffer must reach this size before UM can delete any messages older than retransmit_retention_age_threshold. For UMP, these two options combined with retransmit_retention_size_limit affect the retention buffer size. A value of 0 means the size threshold is always met, in which case deletion is determined by the other threshold criteria.
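The way the age and size thresholds combine can be modeled with a short Python sketch. It only restates the rule described above and is not UM's implementation: the class and function names are hypothetical, the units (seconds and bytes) and the strict comparisons are assumptions, and the UMP-only retransmit_retention_size_limit is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class RetentionThresholds:
    """Hypothetical container for the two threshold options."""
    age_threshold_sec: float = 0   # 0 means the age criterion is always met
    size_threshold_bytes: int = 0  # 0 means the size criterion is always met


def may_delete(message_age_sec, buffer_size_bytes, thresholds):
    """Return True when a retained message meets all threshold criteria."""
    age_met = (thresholds.age_threshold_sec == 0
               or message_age_sec > thresholds.age_threshold_sec)
    size_met = (thresholds.size_threshold_bytes == 0
                or buffer_size_bytes > thresholds.size_threshold_bytes)
    # Both criteria must hold; a threshold of 0 leaves the decision to the other one.
    return age_met and size_met


thresholds = RetentionThresholds(age_threshold_sec=60, size_threshold_bytes=1_000_000)

old_and_full = may_delete(120, 2_000_000, thresholds)     # both criteria met
old_but_small = may_delete(120, 500_000, thresholds)      # buffer below size threshold
young_but_full = may_delete(30, 2_000_000, thresholds)    # message younger than age threshold

print(old_and_full, old_but_small, young_but_full)
```

With both thresholds left at 0 in this model, every message is immediately eligible for deletion, so a source that must serve late joiners would set at least one of them.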
Flag indicating whether the receiver participates in Late Join operation.
Copyright (c) 2004 - 2014 Informatica Corporation. All rights reserved.