Concepts Guide
|
Except where otherwise indicated, the features described in this section are available in the UMS, UMP, and UMQ products.
As of UM version 6.11, a new receive-side object is available to the user: the Transport Services Provider Object.
The earlier feature, Multi-Transport Threads (MTT), is removed from UM in favor of XSP.
By default, a UM context combines all network data reception into a single context thread. This thread is responsible for reception and processing of application messages, topic resolution, and immediate message traffic (UIM and MIM). The context thread is also used for processing timers. This single-threaded model conserves CPU core resources, and can simplify application design. However, it can also introduce significant latency outliers (jitter) if a time-sensitive user message is waiting behind, say, a topic resolution message, or a timer callback.
Using an XSP object, an application can reassign the processing of a subscribed Transport Session to an independent thread. This allows concurrent processing of received messages with topic resolution and timers, and even allows different groups Transport Sessions to be processed concurrently with each other.
By default, when an XSP object is created, UM creates a new thread associated with the XSP. Alternatively, the XSP can be created with Sequential Mode, which gives the responsibility of thread creation to the application. Either way, the XSP uses its independent thread to read data from the sockets associated with one or more subscribed Transport Sessions. That thread then delivers received messages to the application via a normal receive application callback function.
Creation of an XSP does not by itself cause any receiver Transport Sessions to be assigned to it. Central to the use of XSPs is an application-supplied mapping callback function which tells UM which XSP to associate with subscribed Transport Sessions as they are discovered and joined. This callback allows the application to examine the newly-joined Transport Session, if desired. Then the callback returns, informing UM which XSP, if any, to assign the receiver Transport Session to.
Conceptually, an application designer might want to assign the reception and processing of received data to XSPs on a topic basis. This is not always possible. The XSP thread must process received data on a socket basis, and sockets map to Transport Sessions. As mentioned in UM Transports, a publishing application maps one or more topic-based sources to a Transport Session.
Consider the following example:
Publisher A and B are two separate application instances, both of which create a source for topic "X". A subscriber application might create two XSPs and assign one Transport Session to each. In this case, you have two independent threads delivering messages to the subscriber's receiver callback, which may not be what the developer wanted. If the developer wants topic X to be serialized, a single XSP should be created and mapped to both Transport Sessions:
Now let's introduce a second topic. The developer might want to create two XSPs so that each topic will be handled by an independent thread. However, this is not possible, given the way that the topics are mapped to Transport Sessions in the following example:
In this case, XSP 1 is delivering both topics X and Y from Publisher A, and XSP 2 is delivering topics X and Y from Publisher B. Once again, the receiver callback for topic X will be called by two independent threads, which is not desired.
The only way to achieve independent processing of topics is to design the publishers to map their topics to Transport Sessions carefully. For example:
When contexts are used single-threaded, the application programmer can assume serialization of event delivery to the application callbacks. This can greatly simplify the design of applications, at the cost of added latency outliers (jitter).
When XSPs are used to provide multi-threaded receivers, care must be taken in application design to account for potential concurrent calls to application callbacks. This is especially true if multiple subscribed Transport Sessions are assigned different XSPs, as demonstrated in XSP Handles Transport Sessions, Not Topics.
Even in the most simple case, where a single XSP is created and used for all subscribed Transport Sessions, there are still events generated by the main context thread which can be called concurrently with XSP callbacks. Reception of MIM or UIM messages, scheduled timers, and some topic resolution-related callbacks all come from the main context thread, and can all be invoked concurrently with XSP callbacks.
Threading Example: Message Timeout
Consider as an example a common timer use case: message timeout. Application A expects to receive messages for topic "X" every 5 seconds. If 10 seconds pass without a message, the application assumes that the publisher for "X" has exited, so it cleans up internal state and deletes the UM receiver object. Each time a message is received, the current timer is cancelled and re-created for 10 seconds.
Without XSPs, this can be easily coded since message reception and timer expiration events are serialized. The timer callback can clean up and delete the receiver, confident that no receiver events might get delivered while this is in progress.
However, if the Transport Session carrying topic "X" is assigned to an independent XSP thread, message reception and timer expiration events are no longer serialized. Publisher of "X" might send it's message on-time, but a temporary network outage could delay its delivery, introducing a race condition between message delivery and timer expiration. Consider the case where the timer expiration is a little ahead of the message callback. The timer callback might clean up application state which the message callback will attempt to use. This could lead to unexpected behavior, possibly including segmentation faults.
In this case, proper sequencing of operations is critical. The timer should delete the receiver first. While inside the receiver delete API, the XSP might deliver messages to the application. However, once the receiver delete API returns, it is guaranteed that the XSP is finished making receiver callbacks.
Note that in this example case, if the message receiver callback attempts to cancel the timer, the cancel API will return an error. This is because the timer has already expired and the execution of the callback has begun, and is inside the receiver delete API. The message receiver callback needs to be able to handle this sequence, presumably by not re-scheduling the timer.
This section provides simplified C code fragments that demonstrate some of the XSP-related API calls. For full examples of XSP usage, see Example lbmrcvxsp.c (for C) and Example lbmrcvxsp.java (for Java).
The common sequence of operations during application initialization is minimally shown below. In the code fragments below, error detection and handling are omitted for clarity.
Create a context attribute object and set the transport_mapping_function (context) option to point at the application's XSP mapping callback function using the structure lbm_transport_mapping_func_t.
Create the context.
Create XSPs using lbm_xsp_create(). In this example, only a single XSP is created.
Note that the application can optionally pass in a context attribute object and an XSP attribute object. The context attribute is because XSP is implemented as a sort of reduced-function sub-context, and so it is possible to modify context options for the XSP. However, this is rarely needed since the default action is for the XSP to inherit all the configuration of the main context.
Create a receiver for topic "X".
Event queues may also be used with XSP-assigned Transport Sessions.
At this point, when the main context discovers a source for topic "X", it will proceed to join the Transport Session. It will call the application's app_xsp_mapper_callback() function, which is minimally this:
This minimal callback simply returns the XSP that was created during initialization (the "clientd" can be helpful for that). By assigning all receiver Transport Sessions to the same XSP, you have effectively separated message processing from UM housekeeping tasks, like processing of topic resolution and timers. This can greatly reduce latency outliers (jitter).
As described in XSP Handles Transport Sessions, Not Topics, some users want to have multiple XSPs and assign the Transport Sessions to XSPs according to application logic. Note that the passed-in lbm_new_transport_info_t structure contains information about the Transport Session, such as the IP address of the sender. However, this structure does not contain topic information. Applications can use the resolver's source notification callback via the resolver_source_notification_function (context) option to associate topics with source strings.
As of UM 6.12, XSP supports persistent receivers.
When an XSP object is created, an XSP attribute object can be supplied to set XSP options. The XSP options are:
To create and manipulate an XSP attribute object, see:
To delete an XSP, all receivers associated with Transport Sessions handled by that XSP must first be deleted. Then the XSP can be deleted using lbm_xsp_delete().
XSP supports Registered File Descriptors.
There are some restrictions and limitations on the XSP feature.
The only transport types currently supported are LBT-RM, LBT-RU, and TCP. The XSP feature is not compatible with transport types IPC, SMX, or DBL.
An application receiver callback must not create a new XSP.
For a persistent receiver assigned to an XSP, ume_proactive_keepalive_interval (context) must be enabled.
XSP is not currently compatible with Ultra Load Balancing (ULB).
XSP is not currently compatible with Hot Failover (HF).
This section introduces the use of Ultra Messaging Late Join in default and specialized configurations. See Late Join Options for more information.
The Late Join feature enables newly created receivers to receive previously transmitted messages. Sources configured for Late Join maintain a retention buffer (not to be confused with a transport retransmission window), which holds transmitted messages for late-joining receivers.
A Late Join operation follows the following sequence:
The source's retention buffer's is not pre-allocated and occupies an increasing amount of memory as the source sends messages and adds them to the buffer. If a retention buffer grows to a size equal to the value of the source configuration option, retransmit_retention_size_threshold (source), the source deletes older messages as it adds new ones. The source configuration option retransmit_retention_age_threshold (source), controls message deletion based on message age.
UM uses control-structure overhead memory on a per-message basis for messages held in the retention buffer, in addition to the retention buffer's memory. Such memory usage can become significantly higher when retained messages are smaller in size, since more of them can then fit in the retention buffer.
With the UMP/UMQ products, late Join can be implemented in conjunction with the persistent Store, however in this configuration, it functions somewhat differently from Streaming. After a late-Join-enabled receiver has been created, resolved a topic, and become registered with a Store, it may then request older messages. The Store unicasts the retransmission messages. If the Store does not have these messages, it requests them of the source (assuming option retransmission-request-forwarding is enabled), thus initiating Late Join.
To implement Late Join with default options, set the Late Join configuration options to activate the feature on both a source and receiver in the following manner.
Create a configuration file with source and receiver Late Join activation options set to 1. For example, file cfg1.cfg containing the two lines:
source late_join 1 receiver use_late_join 1
Run an application that starts a Late-Join-enabled source. For example:
lbmsrc -c cfg1.cfg -P 1000 topicName
Wait a few seconds, then run an application that starts a Late-Join-enabled receiver. For example:
lbmrcv -c cfg1.cfg -v topicName
The output for each should closely resemble the following:
LBMSRC
$ lbmsrc -c cfg1.cfg -P 1000 topicName LOG Level 5: NOTICE: Source "topicName" has no retention settings (1 message retained max) Sending 10000000 messages of size 25 bytes to topic [topicName] Receiver connect [TCP:10.29.3.77:34200]
LBMRCV
$ lbmrcv -c cfg1.cfg -v topicName Immediate messaging target: TCP:10.29.3.77:4391 [topicName][TCP:10.29.3.76:4371][2]-RX-, 25 bytes 1.001 secs. 0.0009988 Kmsgs/sec. 0.1998 Kbps [topicName][TCP:10.29.3.76:4371][3], 25 bytes 1.002 secs. 0.0009982 Kmsgs/sec. 0.1996 Kbps [topicName][TCP:10.29.3.76:4371][4], 25 bytes 1.003 secs. 0.0009972 Kmsgs/sec. 0.1994 Kbps [topicName][TCP:10.29.3.76:4371][5], 25 bytes 1.003 secs. 0.0009972 Kmsgs/sec. 0.1994 Kbps ...
Note that the source only retained 1 Late Join message (due to default retention settings) and that this message appears as a retransmit (-RX-). Also note that it is possible to sometimes receive 2 RX messages in this scenario (see Retransmitting Only Recent Messages.)
To receive more than one or two Late Join messages, increase the source's retransmit_retention_size_threshold (source) from its default value of 0. Once the buffer exceeds this threshold, the source allows the next new message entering the retention buffer to bump out the oldest one. Note that this threshold's units are bytes (which includes a small overhead per message).
While the retention threshold endeavors to keep the buffer size close to its value, it does not set hard upper limit for retention buffer size. For this, the retransmit_retention_size_limit (source) configuration option (also in bytes) sets this boundary.
Follow the steps below to demonstrate how a source can retain about 50MB of messages, but no more than 60MB:
Create a second configuration file (cfg2.cfg) with the following options:
source late_join 1 source retransmit_retention_size_threshold 50000000 source retransmit_retention_size_limit 60000000 receiver use_late_join 1
lbmrcv -c cfg2.cfg -v topicName
. The output for each should closely resemble the following: LBMSRC
$ lbmsrc -c cfg2.cfg -P 1000 topicName Sending 10000000 messages of size 25 bytes to topic [topicName] Receiver connect [TCP:10.29.3.76:34444]
LBMRCV
$ lbmrcv -c cfg2.cfg -v topicName Immediate messaging target: TCP:10.29.3.76:4391 [topicName][TCP:10.29.3.77:4371][0]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][1]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][2]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][3]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][4]-RX-, 25 bytes 1.002 secs. 0.004991 Kmsgs/sec. 0.9981 Kbps [topicName][TCP:10.29.3.77:4371][5], 25 bytes 1.002 secs. 0.0009984 Kmsgs/sec. 0.1997 Kbps [topicName][TCP:10.29.3.77:4371][6], 25 bytes 1.002 secs. 0.0009983 Kmsgs/sec. 0.1997 Kbps [topicName][TCP:10.29.3.77:4371][7], 25 bytes ...
Note that lbmrcv received live messages with sequence numbers 7, 6, and 5, and RX messages going from 4 all the way back to Sequence Number 0.
Thus far we have worked with only source late join settings, but suppose that you want to receive only the last 10 messages. To do this, configure the receiver option retransmit_request_maximum (receiver) to set how many messages to request backwards from the latest message.
Follow the steps below to set this option to 10.
Add the following line to cfg2.cfg and rename it cfg3.cfg:
receiver retransmitrequestmaximumreceiver 10
Run:
lbmsrc -c cfg3.cfg -P 1000 topicName
lbmrcv -c cfg3.cfg -v topicName
. The output for each should closely resemble the following. LBMSRC
$ lbmsrc -c cfg3.cfg -P 1000 topicName Sending 10000000 messages of size 25 bytes to topic [topicName] Receiver connect [TCP:10.29.3.76:34448]
LBMRCV
$ lbmrcv -c cfg3.cfg -v topicName Immediate messaging target: TCP:10.29.3.76:4391 [topicName][TCP:10.29.3.77:4371][13]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][14]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][15]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][16]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][17]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][18]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][19]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][20]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][21]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][22]-RX-, 25 bytes [topicName][TCP:10.29.3.77:4371][23]-RX-, 25 bytes 1.002 secs. 0.01097 Kmsgs/sec. 2.195 Kbps [topicName][TCP:10.29.3.77:4371][24], 25 bytes 1.002 secs. 0.0009984 Kmsgs/sec. 0.1997 Kbps [topicName][TCP:10.29.3.77:4371][25], 25 bytes 1.002 secs. 0.0009984 Kmsgs/sec. 0.1997 Kbps [topicName][TCP:10.29.3.77:4371][26], 25 bytes ...
Note that 11, not 10, retransmits were actually received. This can happen because network and timing circumstances may have one RX already in transit while the specific RX amount is being processed. (Hence, it is not possible to guarantee one and only one RX message for every possible Late Join recovery.)
Suppose you have a persistent receiver that comes up at midday and must gracefully catch up on the large number of messages it has missed. The following discussion explains the relevant Late Join options and how to use them. (The discussion also applies to streaming-based late join, but since streaming sources must hold all retained messages in memory, there are typically far fewer messages available.)
Option: retransmit_request_outstanding_maximum (receiver)
When a receiver comes up and begins requesting Late Join messages, it does not simply request messages starting at Sequence Number 0 through 1000000. Rather, it requests the messages a little at a time, depending upon how option retransmit_request_outstanding_maximum (receiver) is set. For example, when set to the default of 10, the receiver sends requests the first 10 messages (Sequence Number 0 - 9). Upon receiving Sequence Number 0, it then requests the next message (10), and so on, limiting the number of outstanding unfulfilled requests to 10.
Note that higher for values retransmit_request_outstanding_maximum can increase the rate of RXs received, which can reduce the time required for receiver recovery. However, this can lead to heavy loading of the Store, potentially making it unable to sustain the incoming data rate.
Also, be aware that increasing retransmit_request_outstanding_maximum may require a corresponding increase to retransmit_request_interval (receiver). Otherwise you can have a situation where messages time out because it takes a Store longer than retransmit_request_interval to process all retransmit_request_outstanding_maximum requests. When this happens, you can see messages needlessly requested and sent many times (generates warnings to the receiver application log file).
Option: retransmit_message_caching_proximity (receiver)
When sequence number delivery order is used, long recoveries of active sources can create receiver memory cache problems due to the processing of both new and retransmitted messages. This option provides a method to control caching and cache size during recovery.
It does this by comparing the option value (default 2147483647) to the difference between the newest (live) received sequence number and the latest received RX sequence number. If the difference is less than the option's value, the receiver caches incoming live new messages. Otherwise, new messages are dropped and not cached (with the assumption that they can be requested later as retransmissions).
For example, as shown in the diagram below, a receiver may be receiving both live streaming messages (latest, #200) and catch-up retransmissions (latest, #100). The difference here is 100. If retransmit_message_caching_proximity (receiver) is 75, the receiver caches the live messages and will deliver them when it is all caught up with the retransmissions. However, if this option is 150, streamed messages are dropped and later picked up again as a retransmission.
The default value of this option is high enough to still encourage caching most of the time, and should be optimal for most receivers.
If your source streams faster than it retransmits, caching is beneficial, as it ensures new data are received only once, thus reducing recovery time. If the source retransmits faster than it streams, which is the optimal condition, you can lower the value of this option to use less memory during recovery, with little performance impact.
Off-Transport Recovery (OTR) is a lost-message-recovery feature that provides a level of hedging against the possibility of brief and incidental unrecoverable loss at the transport level or from a DRO. This section describes the OTR feature.
When a transport cannot recover lost messages, OTR engages and looks to the source for message recovery. It does this by accessing the source's retention buffer (used also by the Late Join feature) to re-request messages that no longer exist in a transport's transmission window, or other places such as a persistent Store or redundant source.
OTR functions in a manner very similar to that of Late Join, but differs mainly in that it activates in message loss situations rather than following the creation of a receiver, and shares only the late_join (source) option setting.
Upon detecting loss, a receiver initiates OTR by sending repeated, spaced, OTR requests to the source, until it recovers lost messages or a timeout period elapses.
OTR operates independently from transport-level recovery mechanisms such as NAKs for LBT-RU or LBT-RM. When you enable OTR for a receiver with use_otr (receiver), the otr_request_initial_delay (receiver) period starts as soon as the Delivery Controller detects a sequence gap. If the gap is not resolved by the end of the delay interval, OTR recovery initiates. OTR recovery can occur before, during or after transport-level recovery attempts.
When a receiver initiates OTR, the intervals between OTR requests increases twofold after each request, until the maximum interval is reached (assuming the receiver is still waiting to receive the retransmission). You use configuration options otr_request_minimum_interval (receiver) and otr_request_maximum_interval (receiver) to set the initial (minimum) and maximum intervals, respectively.
The source retransmits lost messages to the recovered receiver via unicast.
When sequence number delivery order is used and a gap of missing messages occurs, a receiver buffers the new incoming messages while it attempts to recover the earlier missing ones. Long recoveries of actively streaming sources can cause excessive receiver cache memory growth due to the processing of both new and retransmitted messages. You can control caching and cache size during recovery with options otr_message_caching_threshold (receiver) and retransmit_message_caching_proximity (receiver).
The option otr_message_caching_threshold (receiver) sets the maximum number of messages a receiver can buffer. When the number of cached messages hits this threshold, new streamed messages are dropped and not cached, with the assumption that they can be requested later as retransmissions.
The retransmit_message_caching_proximity (receiver), which is also used by Late Join (see retransmit_message_caching_proximity (receiver)), turns off this caching if there are too many messages to buffer between the last delivered message and the currently streaming messages.
Both of these option thresholds must be satisfied before caching resumes.
With the UMP/UMQ products, you can implement OTR in conjunction with the persistent Store, however in this configuration, it functions somewhat differently from Streaming. If an OTR-enabled receiver registered with a Store detects a sequence gap in the live stream and that gap is not resolved by other means within the next otr_request_initial_delay (receiver) period, the receiver requests those messages from the Store(s). If the Store does not have some of the requested messages, the receiver requests them from the source. Regardless of whether the messages are recovered from a Store or from the source, OTR delivers all recovered messages with the LBM_MSG_OTR flag, unlike Late Join, which uses the LBM_MSG_RETRANSMIT flag.
This section introduces the use of Transport Layer Security (TLS), sometimes known by its older designation Secure Sockets Layer (SSL).
The goal of the Ultra Messaging (UM) TLS feature is to provide encrypted transport of application data. TLS supports authentication (through certificates), data confidentiality (through encryption), and data integrity (ensuring data are not changed, removed, or added-to). UM can be configured to apply TLS security measures to all Streaming and/or Persisted TCP communication, including DRO peer links. Non-TCP communication is not encrypted (e.g. topic resolution).
TLS is a family of standard protocols and algorithms for securing TCP communication between a client and a server. It is sometimes referred as "SSL", which technically is the name of an older (less secure) version of the protocol. Over the years, security researchers (and hackers) have discovered flaws in SSL/TLS. However, the vast majority of the widely publicized security vulnerabilities have been flaws in the implementations of TLS, not in the recent TLS protocols or algorithms themselves. As of UM version 6.9, there are no known security weaknesses in TLS version 1.2, the version used by UM.
TLS is generally implemented by several different software packages. UM makes use of OpenSSL, a widely deployed and actively maintained open-source project.
TLS authentication uses X.509 digital certificates. Certificate creation and management is the responsibility of the user. Ultra Messaging's usage of OpenSSL expects PEM encoded certificates. There are a variety of generally available tools for converting certificates between different encodings. Since user infrastructures vary widely, the UM package does not include tools for creation, formatting, or management of certificates.
Although UM is designed as a peer-to-peer messaging system, TLS has the concept of client and server. The client initiates the TCP connection and the server accepts it. In the case of a TCP source, the receiver initiates and is therefore the client, with the source (sender of data) being the server. However, with unicast immediate messages, the sender of data is the client, and the recipient is the server. Due to the fact that unicast immediate messages are used by UM for internal control and coordination, it is typically not possible to constrain a given application to only operate as a pure client or pure server. For this reason, UM requires all applications participating in encryption to have a certificate. Server-only authentication (i.e. anonymous client, as is used by web browsers) is not supported. It is permissible for groups of processes, or even all processes, to share the same certificate.
A detailed discussion of certificate usage is beyond the scope of the Ultra Messaging documentation. However, you can find a step-by-step procedure for creating a self-signed X.509 security certificate here: https://kb.informatica.com/howto/6/Pages/18/432752.aspx
The TLS protocol was designed to allow for a high degree of backwards compatibility. During the connection establishment phase, the client and server perform a negotiation handshake in which they identify the highest common versions of various security options. For example, an old web browser might pre-date the introduction of TLS and only support the older SSL protocol. OpenSSL is often configured to allow clients and servers to "negotiate down" to those older, less-secure protocols or algorithms.
Ultra Messaging has the advantage of not needing to communicate with old versions of SSL or TLS. UM's default configuration directs OpenSSL to require both the client and the server to use protocols and algorithms which were highly regarded, as of UM's release date. If vulnerabilities are discovered in the future, the user can override UM's defaults and chose other protocols or algorithms.
When a TLS connection is initiated, a handshake takes place prior to application data encryption. Once the handshake is completed, the CPU effort required to encrypt and decrypt application data is minimal. However, the handshake phase involves the use of much less efficient algorithms.
There are two factors under the user's control, which greatly affect the handshake efficiency: the choice of cipher suite and the key length. We have seen an RSA key of 8192 bits take 4 seconds of CPU time on a 1.3GHz SparcV9 processor just to complete the handshake for a single TLS connection.
Users should make their choices with an understanding of the threat profiles they are protecting against. For example, it is estimated that a 1024-bit RSA key can be broken in about a year by brute force using specialized hardware (see http://www.tau.ac.il/~tromer/papers/cbtwirl.pdf). This may be beyond the means of the average hacker, but well within the means of a large government. RSA keys of 2048 bits are generally considered secure for the foreseeable future.
TLS is enabled on a context basis. When enabled, all Streaming and Persistence related TCP-based communication into or out of the context is encrypted by TLS. A context with TLS enabled will not accept source creation with transports other than TCP.
Subscribers will only successfully receive data if the receiver's context and the source's context share the same encryption settings. A receiver created in an encrypted enabled context will ignore topic resolution source advertisements for non-encrypted sources, and will therefore not subscribe. Similarly, a receiver created in a non-encrypted context will ignore topic resolution source advertisements for encrypted sources. Topic resolution queries are also ignored by mismatched contexts. No warning will be logged when these topic resolution datagrams are ignored, but each time this happens, the context-level statistic tr_dgrams_dropped_type is incremented.
TLS is applied to unicast immediate messages as well, as invoked either directly by the user, or internally by functions like late join, request/response, and Persistence-related communication between sources, receivers, and Stores.
Brokered Queuing using AMQP does not use the UM TLS feature. A UM brokered context does not allow TLS to be enabled.
The tls_cipher_suites (context) configuration option defines the list of one or more (comma separated) cipher suites that are acceptable to this context. If more than one is supplied, they should be in descending order of preference. When a remote context negotiates encrypted TCP, the two sides must find a cipher suite in common, otherwise the connection will be canceled.
OpenSSL uses the cipher suite to define the algorithms and key lengths for encrypting the data stream. The choice of cipher suite is critical for ensuring the security of the connection. To achieve a high degree of backwards compatibility, OpenSSL supports old cipher suites which are no longer considered secure. The user is advised to use UM's default suite.
OpenSSL follows its own naming convention for cipher suites. See OpenSSL's Cipher Suite Names for the full list of suite names. When configuring UM, use the OpenSSL names (with dashes), not* the IANA names (with underscores).
TLS is designed to encrypt a TCP connection, and works with TCP-based persisted data Transport Sessions and control traffic. However, TLS is not intended to encrypt data at rest. When a persistent Store is used with the UM TLS feature, the user messages are written to disk in plaintext form, not encrypted.
The UM TLS feature does not apply to the AMQP connection to the brokered queue. UM does not currently support security on the AMQP connection.
However, the ULB form of queuing does not use a broker. For ULB sources that are configured for TCP, the UM TLS feature will encrypt the application data.
When a DRO is used to route messages across Topic Resolution Domains (TRDs), be aware that the TLS session is terminated at the DRO's proxy receiver/source. Because each endpoint portal on a DRO is implemented with its own context, care must be taken to ensure end-to-end security. It is possible to have a TLS source publishing in one TRD, received by a DRO (via an endpoint portal also configured for TLS), and re-published to a different TRD via an endpoint portal configured with a non-encrypted context. This would allow a non-encrypted receiver to access messages that the source intended to be encrypted. As a message is forwarded through a DRO network, it does not propagate the security settings of the originator, so each portal needs to be appropriately encrypted. The user is strongly encouraged to configure ALL portals on an interconnected network of DROs with the same encryption settings.
The encryption feature is extended to DRO peer links, however peer links are not context-based and are not configured the same way. The following XML elements are used by the DRO to configure a peer link:
As with sources and receivers, the portals on both sides of a peer link must be configured for compatible encryption settings.
Notice that there is no DRO element corresponding to the context option tls_compression_negotiation_timeout (context). The DRO peer link's negotiation timeout is hard-coded to 5 seconds.
See DRO Configuration DTD for details.
Many users have advanced network equipment (switches/routers), which transparently compress packets as they traverse the network. This compression is especially valued to conserve bandwidth over long-haul WAN links. However, when packets are encrypted, the network compressors are typically not able to reduce the size of the data. If the user desires UM messages to be compressed and encrypted, the data needs to be compressed before it is encrypted.
The UM compression feature (see Compressed TCP) accomplishes this. When both TLS and compression are enabled, the compression is applied to user data first, then encryption.
Be aware that there can be information leakage when compression is applied and an attacker is able to inject data of known content over a compressed and encrypted session. For example, this leakage is exploited by the CRIME attack, albeit primarily for web browsers. Users must weigh the benefits of compression against the potential risk of information leakage.
Version Interoperability
It is not recommended to mix pre-6.9 contexts with encrypted contexts on topics of shared interest. If a process with a pre-6.9 version of UM creates a receiver, and another process with UM 6.9 or beyond creates a TLS source, the pre-6.9 receiver will attempt to join the TLS source. After a timeout, the handshake will fail and the source will disconnect. The pre-6.9 receiver will retry the connection, leading to flapping.
Note that in the reverse situation, a 6.9 TLS receiver will simply ignore a pre-6.9 source. I.e. no attempt will be made to join, and no flapping will occur.
For UM versions 6.9 through 6.12, the UM dynamic library was linked with OpenSSL in such a way as to require its presence in order for UM to load and initialize. I.e. it was a load-time dependency.
As of UM version 6.12.1, the linkage with OpenSSL is made at run-time. If encryption features are not used, the OpenSSL libraries do not need to be present on the system. UM is able to initialize without OpenSSL.
There are two UM features which utilize encryption provided by OpenSSL:
This section introduces the use of Compression with TCP connections.
The goal of the Ultra Messaging (UM) compression feature is to decrease the size of transmitted application data. UM can be configured to apply compression to all Streaming and/or Persisted TCP communication.
Non-TCP communication is not compressed (e.g. topic resolution).
Compression is generally implemented by any of several different software packages. UM makes use of LZ4, a widely deployed open-source project.
While the UM compression feature is usable for TCP-based sources and receivers, it is possibly most useful when applied to DRO peer links.
Compression is enabled on a context basis. When enabled, all Streaming and Persistence related TCP-based communication into or out of the context is compressed by LZ4. A context with compression enabled will not accept source creation with transports other than TCP.
Subscribers will only successfully receive data if the receiver's context and the source's context share the same compression settings. A receiver created in a compression-enabled context will ignore topic resolution source advertisements for non-compressed sources, and will therefore not subscribe. Similarly, a receiver created in an non-compressed context will ignore topic resolution source advertisements for compressed sources. Topic resolution queries are also ignored by mismatched contexts. No warning will be logged when these topic resolution datagrams are ignored, but each time this happens, the context-level statistic tr_dgrams_dropped_type is incremented.
Compression is applied to unicast immediate messages as well, as invoked either directly by the user, or internally by functions like late join, request/response, and Persistence-related communication between sources, receivers, and Stores.
Brokered Queuing using AMQP does not use the UM compression feature. A UM brokered context does not allow compression to be enabled.
The compression-related configuration options used by the Ultra Messaging library are:
Compression is designed to compress a data Transport Session. It is not intended to compress data at rest. When a persistent Store is used with the UM compression feature, the user messages are written to disk in uncompressed form.
The UM compression feature does not apply to the AMQP connection to the brokered queue. UM does not currently support compression on the AMQP connection.
However, the ULB form of queuing does not use a broker. For ULB sources that are configured for TCP, the UM compression feature will compress the application data.
When a DRO is used to route messages across Topic Resolution Domains (TRDs), be aware that the compression session is terminated at the DRO's proxy receiver/source. Because each endpoint portal on a DRO is implemented with its own context, care must be taken to ensure end-to-end compression (if desired). As a message is forwarded through a DRO network, it does not propagate the compression setting of the originator, so each portal needs to be appropriately compressed.
Possibly the most-useful application of the UM compression feature is not TCP sources, but rather DRO peer links. The compression feature is extended to DRO peer links, however peer links are not context-based and are not configured the same way. The following XML elements are used by the DRO to configure a peer link:
As with sources and receivers, the portals on both sides of a peer link must be configured for the same compression setting.
Notice that there is no DRO element corresponding to the context option tls_compression_negotiation_timeout (context). The DRO peer link's negotiation timeout is hard-coded to 5 seconds.
See DRO Configuration DTD for details.
See TLS and Compression.
It is not recommended to mix pre-6.9 contexts with compressed contexts on topics of shared interest. As mentioned above, if a compressed and an uncompressed context connect via TCP, the connection will fail and retry, resulting in flapping.
This section introduces the use of high-resolution timestamps with LBT-RM.
The Ultra Messaging (UM) high-resolution message timestamp feature leverages the hardware timestamping function of certain Solarflare network interface cards (NICs) to measure sub-microsecond times that packets are transmitted and received. Solarflare's NICs and Onload kernel-bypass driver implement PTP to synchronize timestamps across the network, allowing very accurate one-way latency measurements. The UM timestamp feature requires Onload version 201509 or later.
For subscribers, each message's receive timestamp is delivered in the message's header structure (for C programs, lbm_msg_t field hr_timestamp, of type lbm_timespec_t). Each timestamp is a structure of 32 bits worth of seconds and 32 bits worth of nanoseconds. When both values are zero, the timestamp is not available.
For publishers, each message's transmit timestamp is delivered via the source event callback (for C programs, event type LBM_SRC_EVENT_TIMESTAMP). The same timestamp structure as above is delivered with the event, as well as the message's sequence number. Sending applications can be informed of the outgoing sequence number range of each message by using the extended form of the send function and supplying the LBM_SRC_SEND_EX_FLAG_SEQUENCE_NUMBER_INFO flag. This causes the LBM_SRC_EVENT_SEQUENCE_NUMBER_INFO event to be delivered to the source event handler.
Due to the specialized nature of this feature, there are several restrictions in its use.
Unicast Immediate Messaging (UIM) deviates from the normal publish/subscribe paradigm by allowing the publisher to send messages to a specific destination application context. Various features within UM make use of UIMs transparently to the application. For example, a persistent receiver sends consumption acknowledgements to the Store using UIM messages.
The application can make direct use of UIM in two ways:
A UIM message can be associated with a topic string, but that topic is not used to determine where the message is sent to. Instead, the topic string is included in the message header, and the application must specify the destination of the desired context in the UIM send call.
Alternatively, an application can send a UIM message without a topic string. This so-called "topicless" message is not delivered to the receiving application via the normal receiver callback. Instead it is delivered by an optional context callback. See Receiving a UIM.
UIM messages are sent using the TCP protocol; no other protocol is supported for UIM. The TCP connection is created dynamically as-needed, when a message is sent. That is, when an application sends its first UIM message to a particular destination context, the sender's context holds the message and initiates the TCP connection. When the connection is accepted by the destination context, the sender sends the message. When the message is fully sent, the sender will keep the TCP connection open for a period of time in case more UIMs are sent to the same destination context. See UIM Connection Management.
Although UIM is implemented using TCP, it should not be considered "reliable" in the same way that UM sources are. There are a variety of ways that a UIM message can fail to be delivered to the destination. For example, if an overloaded DRO must forward the message to the destination TRD, the DRO might have to drop the message.
Note that the success or failure of delivery of a UIM cannot be determined by the sender. For example, calling lbm_unicast_immediate_message() or lbm_send_response() might return a successful status, but the message might have been queued internally for later transmission. So the success status only means that it was queued successfully. A subsequent failure may or may not trigger an error log; it depends on what the nature of the failure is.
As a result, applications are responsible for detecting failures (typically using timeouts) and implementing retry logic. If a higher degree of messaging reliability is required, normal UM sources should be used.
(For UIM messages that are sent transparently by UM features, various timeout/retry mechanisms are implemented internally.)
UIMs can also be lost as a result of TCP disconnections. See TCP Disconnections for more information.
There are three ways to specify the destination address of a UIM:
In the Explicit addressing method, the "ip" and "port" refer to the binding of the destination context's UIM port (also called "request port"). By default, when a context is created, UM will select values from a range of possibilities for ip and port. However, this makes it difficult for a sender to construct an explicit address since the ip and port are not deterministic.
One solution is to explicitly set the ip and port for the context's UIM port using the configuration options: request_tcp_port (context) and request_tcp_interface (context).
There are two kinds of UIM messages:
To receive UIM messages with a topic, an application simply creates a normal receiver for that topic. Alternatively, it can create a wildcard receiver for a matching topic pattern. Finally, the application can also register a callback specifically for UIM messages that contain a topic but for which no matching receiver exists, using immediate_message_topic_receiver_function (context). Alternatively, lbm_context_rcv_immediate_topic_msgs() can be used.
To receive UIM messages with no topic (topicless), the application must register a callback with the context for topicless messages, using immediate_message_receiver_function (context)). Alternatively, the API lbm_context_rcv_immediate_msgs() can be used to register that callback.
Note that only the specified destination context will deliver the message. If other applications have a receiver for that same topic, they will not receive a copy of that UIM message.
UIM Port
To receive UIMs, a context must bind to and listen on the "UIM Port", also known as the "Request Port". See UIM Addressing for details.
The following APIs are used to send application UIM messages:
For the lbm_unicast_immediate_message() and lbm_unicast_immediate_request() APIs, the user has a choice between sending messages with a topic or without a topic (topicless). With the C API, passing a NULL pointer for the topic string sends a topicless message.
The act of sending a UIM message will check to see if the context already has a TCP connection open to the destination context. If so, the existing connection is used to send the UIM. Otherwise, UM will initiate a new TCP connection to the destination context.
Once the message is sent, an activity deletion timer is started for the connection; see response_tcp_deletion_timeout (context). If another UIM message is sent to the same destination, the activity deletion timer is canceled and restarted. Thus, if messages are sent periodically with a shorter period than the activity deletion timer, the TCP connection will remain established.
However, if no messages are sent for more time than the activity deletion timer, the timer will expire and the TCP connection will be deleted and resources cleaned up.
An exception to this workflow exists for the Request/Response feature. When a request message is received by a context, the context automatically initiates a connection to the requester, even though the application has not yet sent its response UIM. The activity deletion timer is not started at this time. When the application's receiver callback is invoked with the request message, the message contains a reference to a response object. This response object is used for sending response UIMs back to the requester. However, the act of sending these responses also does not start the activity deletion timer for the TCP connection. The activity deletion timer is only started when the response object is deleted (usually implicitly when the message itself is deleted, usually as a result of returning from the receiver callback).
Note that the application that receives a request has the option of retaining the message, which delays deletion of the message until the application explicitly decides to delete it. In this case, the TCP connection is held open for as long as the application retains the response object. When the application has finished sending any and all of its responses to the request and does not need the request object any more, it deletes the request object. This starts the activity deletion timer running.
Finally, note that there is a queue in front of the UIM connection which holds messages when the connection is slow. It is possible that messages are still held in the queue for transmission after the response object is deleted, and if the response message is very large and/or the destination application is slow processing responses, it is possible for data to still be queued when the activity deletion timer expires. In that case, UM does not delete the TCP connection, and instead restarts the activity deletion timer for the connection.
MIM uses the same reliable multicast protocol as normal LBT-RM sources. MIM messages can be sent to a topic, in which case each receiving context will filter that message, discarding it if no receiver exists for that topic. MIM avoids using Topic Resolution by including the full topic string in the message, and sending it to a multicast group that all application contexts are configured to use.
A receiving context will receive the message and check to see if the application created a receiver for the topic. If so, then the message is delivered. However, if no receiver exists for that topic, the context checks to see if immediate_message_topic_receiver_function (context) is configured. If so, then the message is delivered. But if neither exists, then the receiving context discards the message.
It is also possible to send a "topicless" message via MIM. The recipient context must have configured a topicless receiver using immediate_message_receiver_function (context); otherwise the message is discarded.
A valid use case for MIM might be an application that starts running, sends a request message, gets back a response, and then exits. With this kind of short-lived application, it can be a burden to create a source and wait for it to resolve. With MIM, topic resolution is skipped, so no delay is needed prior to sending.
MIM is typically not used for normal Streaming data because messages are somewhat less efficiently handled than source-based messages. Inefficiencies derive from larger message sizes due to the inclusion of the topic name, and on the receiving side, the MIM Delivery Controller hashing of topic names to find receivers, which consumes some extra CPU. If you have a high-message-rate stream, you should use a source-based method and not MIM. If head-loss is a concern and delay before sending is not feasible, then consider using late join (although this replaces head-loss with some head latency).
Multicast Immediate Messaging can benefit from hardware acceleration. See Transport Acceleration Options for more information
MIM uses the same reliable multicast algorithms as LBT-RM. When a publisher sends a message with lbm_multicast_immediate_message(), MIM creates a temporary Transport Session. Note that no topic-level source object is created.
MIM automatically deletes the temporary Transport Session after a period of inactivity defined by mim_src_deletion_timeout (context) which defaults to 30 seconds. A subsequent send creates a new Transport Session. Due to the possibility of head-loss in the switch, it is recommended that publisher use a long deletion timeout if they continue to use MIM after significant periods of inactivity.
MIM forces all topics across all publishers to be concentrated onto a single multicast address to which ALL applications listen, even if they aren't interested in any of the topics. Thus, all topic filtering must happen in UM.
MIM can also be used to send an UM request message with lbm_multicast_immediate_request(). For example, an application can use MIM to request initialization information right when it starts up. MIM sends the response directly to the initializing application, avoiding the topic resolution delay inherent in the normal source-based lbm_send_request() function.
MIM notifications differ in the following ways from normal UM source-based sending.
To receive MIM messages with a topic, an application simply creates a normal receiver for that topic. Alternatively, it can create a wildcard receiver for a matching topic pattern. Finally, the application can also register a callback specifically for UIM messages that contain a topic but for which no matching receiver exists, using immediate_message_topic_receiver_function (context). Alternatively, lbm_context_rcv_immediate_topic_msgs() can be used.
To receive MIM messages with no topic (topicless), the application must register a callback for topicless messages, using immediate_message_receiver_function (context)). Alternatively, lbm_context_rcv_immediate_msgs() can be used.
If needed, an application can send topicless messages using MIM. A MIM sender passes in a NULL string instead of a topic name. The message goes out on the MIM multicast address and is received by all other receivers. A receiving application can use lbm_context_rcv_immediate_msgs() to set the callback procedure and delivery method for topicless immediate messages.
When an application receives an immediate message, it's topic is hashed to see if there is at least one regular (non-wildcard) receiver object listening to the topic. If so, then MIM delivers the message data to the list of receivers.
However, if there are no regular receivers for that topic in the receive hash, MIM runs the message topic through all existing wildcard patterns and delivers matches to the appropriate wildcard receiver objects without creating sub-receivers. The next MIM message received for the same topic will again be run through all existing wildcard patterns. This can consume significant CPU resources since it is done on a per-message basis.
The receiving application can set up a context callback to be notified of MIM unrecoverable loss (lbm_mim_unrecloss_function_cb()). It is not possible to do this notification on a topic basis because the receiving UM has no way of knowing which topics were affected by the loss.
The configuration options for MIM are described here:
UM includes two example applications that illustrate MIM.
Example lbmimsg.c - application that sends immediate messages as fast as it can to a given topic (single source). See also the Java example, Example lbmimsg.java, and the .NET example, Example lbmimsg.cs.
For information regarding HyperTopics, see C HyperTopics Details.
The Application Headers feature is an older method of adding untyped, unstructured metadata to messages.
Send Message with Application Headers
To send a message with one or more application headers attached, follow these steps:
Create an application header chain object using lbm_apphdr_chain_create().
Declare a chain element structure lbm_apphdr_chain_elem_t.
Set lbm_apphdr_chain_elem_t_stct::len to the number of bytes of data.
Declare an extended send information structure lbm_src_send_ex_info_t.
Set the lbm_src_send_ex_info_t_stct::apphdr_chain field with the application header chain object created in step 1.
Call lbm_src_send_ex() passing the extended send information structure.
Delete the application header chain object using lbm_apphdr_chain_delete().
Instead of creating and deleting the application header chain with each message, it may be retained and re-used. However, it may not be modified.
Receive Message with Application Headers
To handle a received message that may contain application headers, follow these steps:
In the receiver callback, create an application header iterator using lbm_apphdr_chain_iter_create_from_msg(). If it returns LBM_FAILURE then the message has no application headers.
If lbm_apphdr_chain_iter_create_from_msg() returns LBM_OK, declare a chain element structure lbm_apphdr_chain_elem_t pointer and set it using lbm_apphdr_chain_iter_current().
Access the chain element data through the chain element structure using lbm_apphdr_chain_elem_t_stct::data.
To step to the next application header (if any), call lbm_apphdr_chain_iter_next(). This function returns LBM_OK if there is another application header; go to step 3.
Delete the iterator using lbm_apphdr_chain_iter_delete().
Application Headers are not compatible with the following UM features:
Note that the user data provided for the application header is not typed. UM cannot reformat messages for architectural differences in a heterogeneous environment. For example, UM does not do byte-swapping between big and little endian CPUs.
The Message Properties feature allows your application to add typed metadata to messages as name/value pairs. UM allows eight property types: boolean, byte, short, int, long, float, double, and string. See Message Properties Data Types.
With the UMQ product, the UM message property object supports the standard JMS message properties specification.
Message properties are not compatible with the following UM features:
Send Message with Message Properties
For sending messages with message properties using Smart Sources, see Smart Sources and Message Properties.
To send a message with one or more message properties attached using normal sources (i.e. not Smart Sources), follow these steps:
Create a message properties object using lbm_msg_properties_create().
Add one or more properties using lbm_msg_properties_set().
Declare an extended send information structure lbm_src_send_ex_info_t.
Set the lbm_src_send_ex_info_t_stct::properties field with the properties object created in step 1.
Call lbm_src_send_ex() passing the extended send information structure.
Delete the message properties object using lbm_msg_properties_delete().
Instead of creating and deleting the properties object with each message, it may be retained and re-used. It can also be modified using lbm_msg_properties_clear() and lbm_msg_properties_set().
Receive Message with Message Properties
To handle a received message that may contain message properties, follow these steps:
In the receiver callback, check the message's properties field lbm_msg_t_stct::properties. If it is NULL, the message has no properties.
If lbm_msg_t_stct::properties is non-null, create a property iterator object using lbm_msg_properties_iter_create().
Set the iterator to the first property in the message using lbm_msg_properties_iter_first().
Access that property's value through the iterator using lbm_msg_properties_iter_t_stct::data (must be cast to the appropriate data type).
To step to the next property (if any), call lbm_msg_properties_iter_next(). This function returns LBM_OK if there is another property; go to step 4. If there are no more properties, it returns LBM_FAILURE.
Delete the iterator using lbm_msg_properties_iter_delete().
Instead of iterating through the properties, it is also possible to access properties by name using lbm_msg_properties_get(). However, this can be less efficient.
For a C example on how to iterate received message properties, see Example lbmrcv.c.
Note that if a publisher sends a message with no properties set, best practice is to not pass the property object to the send function. It is permissible to pass a property object with no properties set, but it adds latency and overhead. Also, it causes the receiver to get a non-null msg->properties field. When the receiver calls lbm_msg_properties_iter_first(), it will return an error because there isn't a "first" property. It is better for the publisher to not pass an empty properties object so that the receiver will get a NULL msg->properties field.
Due to differences in the integer variable sizes on different compilers on different platforms, Informatica recommends using the following C data types for the corresponding message property data types:
Property Type | C Type |
---|---|
LBM_MSG_PROPERTY_BOOLEAN | char |
LBM_MSG_PROPERTY_BYTE | char |
LBM_MSG_PROPERTY_SHORT | lbm_uint16_t |
LBM_MSG_PROPERTY_INT | lbm_int32_t |
LBM_MSG_PROPERTY_LONG | lbm_int64_t |
LBM_MSG_PROPERTY_FLOAT | float |
LBM_MSG_PROPERTY_DOUBLE | double |
LBM_MSG_PROPERTY_STRING | char array |
Ultra Messaging sends property names on the wire with every message. To reduce bandwidth requirements, minimize the length and number of properties. When coding sources, consider the following sequence of guidelines:
When coding receivers in Java or .NET, call Dispose() on messages before returning from the application callback. This allows Ultra Messaging to internally recycle objects, and limits object allocation.
Smart Sources support a limited form of message properties. Only 32-bit integer property types are allowed with Smart Sources. Also, property names are limited to 7 ASCII characters. Finally, the normal message properties object lbm_msg_properties_t and its APIs are not used on the sending side. Rather a streamlined method of specifying message properties for sending is used.
As with most of Smart Source's internal design, the message header for message properties must be pre-allocated with the maximum number of desired message properties. This is done at creation time for the Smart Source using the configuration option smart_src_message_property_int_count (source).
Sending messages with message properties must be done using the lbm_ssrc_send_ex() API, passing it the desired properties with the lbm_ssrc_send_ex_info_t structure. The first call to send with message properties will serialize the supplied properties and encode them into the pre-allocated message header.
Subsequent calls to send with message properties will ignore the passed-in properties and simply re-send the previously-serialized header.
If an application needs to change the message property values after that initial send, the "update" flag flag can be used, which will trigger modification of the property values. This "update" flag cannot be used to change the number of properties, or the key names of the properties.
If an application needs messages with different numbers of properties and/or different key names of properties, the most efficient way to accomplish this is with multiple message buffers. Each buffer should be associated with a desired set of properties. When a message needs to be sent, the proper buffer is selected for building the message. This avoid the overhead of serializing the properties with each send call.
However, if the application requires dynamic construction of properties, a single buffer can be used along with the "rebuild" flag to trigger a full serialization of the properties.
For a full example of message property usage with Smart Source, see Example lbmssrc.c or Example lbmssrc.java.
The first message with a message property sent to a Smart Source follows a specific sequence:
Create the topic object with the configuration option smart_src_message_property_int_count (source) set to the maximum number of properties desired on a message.
Create the Smart Source with lbm_ssrc_create().
Allocate one or more message buffers with lbm_ssrc_buff_get(). You might allocate one for messages that have properties, and another for messages that don't.
When preparing the first message with message properties to be sent, define the properties using a lbm_ssrc_send_ex_info_t structure:
Send the message using lbm_ssrc_send_ex() and the LBM_SSRC_SEND_EX_FLAG_PROPERTIES flag:
Since this is the first send with message properties, UM will serialize the properties and set up the message header. (It is not valid to set the LBM_SSRC_SEND_EX_FLAG_UPDATE_PROPERTY_VALUES flag on this first send with message properties.)
For subsequent sends, there are different use cases:
Send the message with the same properties and values. You can re-use the same message buffer and lbm_ssrc_send_ex_info_t structure:
Send with message properties after having made changes to the property values (but not the keys or the number of properties) by setting the LBM_SSRC_SEND_EX_FLAG_UPDATE_PROPERTY_VALUES flag:
Send a message with either a different number of properties, and/or different key names by setting the LBM_SSRC_SEND_EX_FLAG_REBUILD_BUFFER flag:
To be more efficient, instead of using the LBM_SSRC_SEND_EX_FLAG_REBUILD_BUFFER flag, you can use different message buffers (allocated with lbm_ssrc_buff_get()) for each message property structure. This saves the time required to re-serialize the message properties each time you want to use a different property structure.
Request/response is a very common messaging model whereby a client sends a "request" message to a server and expects a response. The server processes the request and return a response message to the originating client.
The UM request/response feature simplifies implementation of this model in the following ways:
UM provides three ways to send a request message.
When the client application sends a request message, it references an application callback function for responses and a client data pointer for application state. The send call returns a "request object". As one or more responses are returned, the callback is invoked to deliver the response messages, associated with the request's client data pointer. The requesting application decides when its request is satisfied (perhaps by completeness of a response, or by timeout), and it calls lbm_request_delete() to delete the request object. Even if the server chooses to send additional responses, they will not be delivered to the requesting application after it has deleted the corresponding request object.
The server application receives a request via the normal message receive mechanism, but the message is identified as type "request". Contained within that request message's header is a response object, which serves as a return address to the requester. The server application responds to an UM request message by calling lbm_send_response(). The response message is sent unicast via a dynamic TCP connection managed by UM.
A UM response message can be of any arbitrary size. However, given that responses are sent as UIMs, and UIMs are not guaranteed, UM responses can be lost. This is especially true if DROs are being used. Larger UM responses have a higher probability of being lost than smaller UM responses. In particular, UM responses larger than 65,000 bytes can be lost even if network packets are simply delivered out of order, as might happen if the DROs are configured to use a UDP Peer Link.
This does not mean that you must avoid sending large response messages. Rather it emphasizes the importance of using an application-level timeout/retry algorithm for request/response. In practice, even multi-megabyte responses would be lost only rarely, but it can happen.
Since the response object is part of the message header, it is normally deleted at the same time that the message is deleted, which typically happens automatically when the receiver callback returns. However, there are times when the application needs the scope of the response object to extend beyond the execution of the receiver callback. One method of extending the lifetime of the response object is to "retain" the request message, using lbm_msg_retain().
However, there are times when the size of the request message makes retention of the entire message undesirable. In those cases, the response object itself can be extracted and retained separately by saving a copy of the response object pointer and setting the message header's response pointer to NULL (to prevent UM from deleting the response object when the message is deleted).
There are even occasions when an application needs to transfer the responsibility of responding to a request message to a different process entirely. I.e. the server which receives the request is not itself able to respond, and needs to send a message (not necessarily the original request message) to a different server. In that case, the first server which receives the request must serialize the response object to type lbm_serialized_response_t by calling lbm_serialize_response(). It includes the serialized response object in the message forwarded to the second server. That server de-serializes the response object by calling lbm_deserialize_response(), allowing it to send a response message to the original requesting client.
UM creates and manages the special TCP connections for responses, maintaining a list of active response connections. When an application sends a response, UM scans that list for an active connection to the destination. If it doesn't find a connection for the response, it creates a new connection and adds it to the list. After the lbm_send_response() function returns, UM schedules the response_tcp_deletion_timeout (context), which defaults to 2 seconds. If a second request comes in from the same application before the timer expires, the responding application simply uses the existing connection and restarts the deletion timer.
It is conceivable that a very large response could take more than the response_tcp_deletion_timeout (context) default (2 seconds) to send to a slow-running receiver. In this case, UM automatically increases the deletion timer as needed to ensure the last message completes.
See the UM Configuration Guide for the descriptions of the Request/Response configuration options:
UM includes two example applications that illustrate Request/Response.
We can demonstrate a series of 5 requests and responses with the following procedure:
LBMREQ
Output for lbmreq should resemble the following:
$ lbmreq -R 5 -q topicname Event queue in use Using TCP port 4392 for responses Delaying requests for 1000 milliseconds Sending request 0 Starting event pump for 5 seconds. Receiver connect [TCP:10.29.1.78:4958] Done waiting for responses. 1 responses (25 bytes) received. Deleting request. Sending request 1 Starting event pump for 5 seconds. Done waiting for responses. 1 responses (25 bytes) received. Deleting request. Sending request 2 Starting event pump for 5 seconds. Done waiting for responses. 1 responses (25 bytes) received. Deleting request. Sending request 3 Starting event pump for 5 seconds. Done waiting for responses. 1 responses (25 bytes) received. Deleting request. Sending request 4 Starting event pump for 5 seconds. Done waiting for responses. 1 responses (25 bytes) received. Deleting request. Quitting...
LBMRESP
Output for lbmresp should resemble the following:
$ lbmresp -v topicname Request [topicname][TCP:10.29.1.78:14371][0], 25 bytes Sending response. 1 responses of 25 bytes each (25 total bytes). Done sending responses. Deleting response. Request [topicname][TCP:10.29.1.78:14371][1], 25 bytes Sending response. 1 responses of 25 bytes each (25 total bytes). Done sending responses. Deleting response. Request [topicname][TCP:10.29.1.78:14371][2], 25 bytes Sending response. 1 responses of 25 bytes each (25 total bytes). Done sending responses. Deleting response. Request [topicname][TCP:10.29.1.78:14371][3], 25 bytes Sending response. 1 responses of 25 bytes each (25 total bytes). Done sending responses. Deleting response. Request [topicname][TCP:10.29.1.78:14371][4], 25 bytes Sending response. 1 responses of 25 bytes each (25 total bytes). Done sending responses. Deleting response. [topicname][TCP:10.29.1.78:14371], End of Transport Session
The UM Self-Describing Messaging (SDM) feature provides an API that simplifies the creation and use of messages by your applications. An SDM message contains one or more fields and each field consists of the following:
Each named field may appear only once in a message. If multiple fields of the same name and type are needed, array fields are available. A field in a nested message may have the same name as a field in the outer message.
SDM is particularly helpful for creating messages sent across platforms by simplifying the creation of data formats. SDM automatically performs platform-specific data translations, eliminating endian conflicts.
Using SDM also simplifies message maintenance because the message format or structure can be independent of the source and receiver applications. For example, if your receivers query SDM messages for particular fields and ignore the order of the fields within the message, a source can change the field order if necessary with no modification of the receivers needed.
See the C, Java, and .NET API references for details.
Informatica generally recommends the use of Pre-Defined Messages, which is more efficient than self-describing messages.
The UM Pre-Defined Messages (PDM) feature provides an API similar to the SDM API, but allows you to define messages once and then use the definition to create messages that may contain self-describing data. Eliminating the need to repeatedly send a message definition increases the speed of PDM over SDM. The ability to use arrays created in a different programming language also improves performance.
The PDM library lets you create, serialize, and deserialize messages using pre-defined knowledge about the possible fields that may be used. You can create a definition that a) describes the fields to be sent and received in a message, b) creates the corresponding message, and c) adds field values to the message. This approach offers several performance advantages over SDM, as the definition is known in advance. However, the usage pattern is slightly different from the SDM library, where fields are added directly to a message without any type of definition.
A PDM message contains one or more fields and each field consists of the following:
Each named field may appear only once in a message. If multiple fields of the same name and type are needed, array fields are available. A field in a nested message may have the same name as a field in the outer message.
See the C, Java, and .NET API references for details.
The C API also has information and code samples about how to create definitions and messages, set field values in a message, set the value of array fields in a message, serialize, deserialize and dispose of messages, and fetch values from a message.
See C PDM Details.
The typical PDM usage patterns can usually be broken down into two categories: sources (which need to serialize a message for sending) and receivers (which need to deserialize a message to extract field values). However, for optimum performance for both sources and receivers, first set up the definition and a single instance of the message only once during a setup or initialization phase, as in the following example workflow:
PDM APIs are provided in C, Java, and C#, however, the examples in this section are Java based.
PDM Code Example, Source
Translating the Typical PDM Usage Patterns to Java for a source produces the following:
PDM Code Example, Receiver
Translating the Typical PDM Usage Patterns to Java for a receiver produces the following:
PDM Code Example Notes
In the examples above, the setupPDM() function is called once to set up the PDM definition and message. It is identical in both the source and receiver cases and simply sets up a definition that contains three required fields with integer names (100, 101, 102). Once finalized, it can create a message that leverages its pre-defined knowledge about these three required fields. The source example adds the three sample field values (a boolean, int32, and float) to the message, which is then serialized to a byte array. In the receiver example, the message parses a byte array into the message and then extracts the three field values.
The following code snippets expand upon the previous examples to demonstrate the usage of additional PDM functionality (but use "..." to eliminate redundant code).
Reusing the Message Object
Although the examples use a single message object (which provides performance benefits due to reduced message creation and garbage collection), it is not explicitly required to reuse a single instance. However, multiple threads should not access a single message instance.
Number of Fields
Although the number of fields above is initially set to 3 in the PDMDefinition constructor, if you add more fields to the definition with the addFieldInfo method, the definition grows to accommodate each field. Once the definition is finalized, you cannot add additional field information because the definition is now locked and ready for use in a message.
String Field Names
The examples above use integer field names in the setupPDM() function when creating the definition. You can also use string field names when setting up the definition. However, you still must use a FieldInfo object to set or get a field value from a message, regardless of field name type. Notice that false is passed to the PDMDefinition constructor to indicate string field names should be used. Also, the overloaded addFieldInfo function uses string field names (.Field100.) instead of the integer field names.
Retrieving FieldInfo from the Definition
At times, it may be easier to lookup the FieldInfo from the definition using the integer name (or string name if used). This eliminates the need to store the reference to the FieldInfo when getting or setting a field value in a message, but it does incur a performance penalty due to the lookup in the definition to retrieve the FieldInfo. Notice that there are no longer FieldInfo objects being used when calling addFieldInfo and a lookup is being done for each call to msg.getFieldValueAs* to retrieve the FieldInfo by integer name.
Required and Optional Fields
When adding field information to a definition, you can indicate that the field is optional and may not be set for every message that uses the definition. Do this by passing false as the third parameter to the addFieldInfo function. Using required fields (fixed-required fields specifically) produces the best performance when serializing and deserializing messages, but causes an exception if all required fields are not set before serializing the message. Optional fields allow the concept of sending "null" as a value for a field by simply not setting that field value on the source side before serializing the message. However, after parsing a message, a receiver should check the isFieldValueSet function for an optional field before attempting to read the value from the field to avoid the exception mentioned above.
Fixed String and Fixed Unicode Field Types
A variable length string typically does not have the performance optimizations of fixed-required fields. However, by indicating "required", as well as the field type FIX_STRING or FIX_UNICODE and specifying an integer number of fixed characters, PDM sets aside an appropriate fixed amount of space in the message for that field and treats it as an optimized fixed-required field. Strings of a smaller length can still be set as the value for the field, but the message allocates the specified fixed number of bytes for the string. Specify Unicode strings in the same manner (with FIX_UNICODE as the type) and in "UTF-8" format.
Variable Field Types
The field types of STRING, UNICODE, BLOB, and MESSAGE are all variable length field types. They do not require a length to be specified when adding field info to the definition. You can use a BLOB field to store an arbitrary binary objects (in Java as an array of bytes) and a MESSAGE field to store a PDMMessage object,
which enables "nesting" PDMMessages inside other PDMMessages. Creating and using a variable length string field is nearly identical to the previous fixed string example.
Retrieve the BLOB field values with the getFieldValueAsBlob function, and the MESSAGE field values with the getFieldValueAsMessage function.
Array Field Types
For each of the scalar field types (fixed and variable length), a corresponding array field type uses the convention *_ARR for the type name (ex: BOOLEAN_ARR, INT32_ARR, STRING_ARR, etc.). This lets you set and get Java values such as an int[] or string[] directly into a single field. In addition, all of the array field types can specify a fixed number of elements for the size of the array when they are defined, or if not specified, behave as variable size arrays. Do this by passing an extra parameter to the addFieldInfo function of the definition.
To be treated as a fixed-required field, an array type field must be required as well as be specified as a fixed size array of fixed length elements. For instance, a required BOOLEAN_ARR field defined with a size of 3 would be treated as a fixed-required field. Also, a required FIX_STRING_ARR field defined with a size of 5 and fixed string length of 7 would be treated as a fixed-required field. However, neither a STRING_ARR field nor a BLOB_ARR field are treated as a fixed length field even if the size of the array is specified, since each element of the array can be variable in length. In the example below, field 106 and field 108 are both treated as fixed-required fields, but field 107 is not because it is a variable size array field type.
Definition Included In Message
Optionally, a PDM message can also include the definition when it is serialized to bytes. This enables receivers to parse a PDM message without having pre-defined knowledge of the message, although including the definition with the message affects message size and performance of message deserialization. Notice that the setIncludeDefinition function is called with an argument of true for a source that serializes the definition as part of the message.
For a receiver, the setupPDM function does not need to set any flags for the message but rather should define a message without a definition, since we assume the source provides the definition. If a definition is set for a message, it will attempt to use that definition instead of the definition on the incoming message (unless the ids are different).
The PDM Field Iterator
You can use the PDM Field Iterator to check all defined message fields to see if set, or to extract their values. You can extract a field value as an Object using this method, but due to the casting involved, we recommend you use the type specific get method to extract the exact value. Notice the use of field.isValueSet to check to see if the field value is set and the type specific get methods such as getBooleanValue and getFloatValue.
Sample Output (106, 107, 108 are array objects as expected):
Field set? true Field 100's value is: true Field set? true Field 101's value is: 7 Field set? true Field 102's value is: 3.14 Field set? false Field 103's value is: null Field set? true Field 104's value is: Hello World! Field set? true Field 105's value is: Variable Field set? true Field 106's value is: [Z@527736bd Field set? true Field 107's value is: [I@10aadc97 Field set? true Field 108's value is: [Ljava.lang.String;@4178460d
Using the Definition Cache
The PDM Definition Cache assists with storing and looking up definitions by their id and version. In some scenarios, it may not be desirable to maintain the references to the message and the definition from a setup phase by the application. A source could optionally create the definition during the setup phase and store it in the definition cache. At a later point in time, it could retrieve the definition from the cache and use it to create the message without needing to maintain any references to the objects.
A more advanced use of the PDM Definition Cache is by a receiver which may need to receive messages with different definitions and the definitions are not being included with the messages. The receiver can create the definitions in advance and then set a flag that allows automatic lookup into the definition cache when parsing a message (which is not on by default). Before receiving messages, the receiver should do something similar to createAndStoreDefinition (shown below) to set up definitions and put them in the definition cache. Then the flag to allow automatic lookup should be set as shown below in the call to setTryToLoadDefFromCache(true). This allows the PDMMessage to be created without a definition and still successfully parse a message by leveraging the definition cache.
Applications using SDM with a known set of message fields are good candidates for migrating from SDM to PDM. With SDM, the source typically adds fields to an SDM message without a definition. But, as shown above in the PDM examples, creating/adding a PDM definition before adding field values is fairly straightforward.
However, certain applications may be incapable of building a definition in advance due to the ad-hoc nature of their messaging needs, in which case a self-describing format like SDM may be preferred.
Simple Migration Example
The following source code shows a basic application that serializes and deserializes three fields using SDM and PDM. The setup method in both cases initializes the object instances so they can be reused by the source and receiver methods.
The goal of the sourceCreateMessageWith functions is to produce a byte array by setting field values in a message object. With SDM, actual Field classes are created, values are set, the Field classes are added to a
Fields class, and then the Fields class is added to the SDMessage. With PDM, FieldInfo objects are created during the setup phase and then used to set specific values in the PDMMessage.
The goal of the receiverParseMessageWith functions is to produce a message object by parsing the byte array and then extract the field values from the message. With SDM, the specific field is located and casted to the correct field class before getting the field value. With PDM, the appropriate getFieldValueAs function is called with the corresponding FieldInfo object created during the setup phase to extract the field value.
Notice that with sourceCreateMessageWithSDM function, the three fields (name and value) are created and added to the fset variable, which is then added to the SDM message. On the other hand, the sourceCreateMessageWithPDM function uses the FieldInfo object references to add the field values to the message for each of the three fields.
Also notice that the receiverParseMessageWithSDM requires a cast to the specific field class (like LBMSDMFieldInt8) once the field has been located. After the cast, calling the get method returns the expected value. On the other hand the receiverParseMessageWithPDM uses the FieldInfo object reference to directly retrieve the field value using the appropriate getFieldValueAs* method.
SDM Raw Classes
Several SDM classes with Raw in their name could be used as the value when creating an LBMSDMField. For example, an LBMSDMRawBlob instance could be created from a byte array and then that the LBMSDMRawBlob could be used as the value to a LBMSDMFieldBlob as shown in the following example.
The actual field named "Field103" is created in the try block using the rawSDMBlob variable which has been created to wrap the blob byte array. This field can be added to a LBMSDMFields object, which then uses it in a LBMSDMessage.
In PDM, there are no "Raw" classes that can be created. When setting the value for a field for a message, the appropriate variable type should be passed in as the value. For example, setting the field value for a BLOB field would mean simply passing the byte array directly in the setValue method as shown in the following code snippet since the field is defined as type BLOB.
The PDM types of DECIMAL, TIMESTAMP, and MESSAGE expect a corresponding instance of PDMDecimal, PDMTimestamp, and PDMMessage as the field value when being set in the message so those types do require an instantiation instead of using a native Java type. For example, if "Field103" had been of type PDMFieldType.DECIMAL, the following code would be used to set the value.
There are many use cases where a subscriber application wants to send a message to a publisher application. For example, a client application which subscribes to market data may want to send a refresh request to the publishing feed handler. While this is possible to do with normal sources and receivers, UM supports a streamlined method of doing this.
As of UM version 6.10, a Source String can be used as a destination for sending a unicast immediate message. The UM library will establish a TCP connection to the publisher's context via its UIM port (also known as "request port"). The publishing application can receive this message either from a normal Receiver Object, or from a context immediate message callback via configuration options immediate_message_topic_receiver_function (context) or immediate_message_receiver_function (context) (for topicless messages).
A receiving application's receiver callback function can obtain a source's source string from the message structure. However, that string is not suitable to being passed directly to the unicast immediate message send function.
Here's a code fragment in C for receiving a message from a source, and sending a message back to the originating source. For clarity, error detection and handling code is omitted.
The lbm_msg_t structure supplies the source string, and lbm_unicast_immediate_message() is used to send a topicless immediate message to the source's context. Alternatively, a request message could be sent with lbm_unicast_immediate_request(). If the receive events are delivered without an event queue, then LBM_SRC_NONBLOCK is needed.
The example above uses the LBM_MSG_DATA message type. Most receiver event (message) types also contain a valid source string. Other likely candidates for this use case might be: LBM_MSG_BOS, LBM_MSG_UNRECOVERABLE_LOSS, LBM_MSG_UNRECOVERABLE_LOSS_BURST.
Note that in this example, a topicless message is sent. This requires the publishing application to use the immediate_message_receiver_function (context) option to set up a callback for receipt of topicless immediate messages. Alternatively, a topic name can be supplied to the unicast immediate message function, in which case the publishing application would either create a normal Receiver Object for that topic, or would configure a callback with immediate_message_topic_receiver_function (context).
A Java program obtains the source string via LBMMessage::source(), and sends topicless unicast immediate messages via LBMContext::sendTopicless().
A .NET implementation is essentially the same as Java.
Some subscribing applications need to send a message to the publisher as soon as possible after the publisher is subscribed. Receiver events can sometimes take significant time to be delivered. The source string can be obtained via the source_notification_function (receiver) configuration option. This defines a callback function which is called at the start of the process of subscribing to a source.
Here's a code fragment in C for sending a message to a newly-discovered source. For clarity, error detection and handling code is omitted.
During initialization, when the receiver is defined, the callback must be configured using the lbm_rcv_src_notification_func_t_stct structure:
This creates the Receiver Object with the source notification callback configured. Note that the source notification callback has both a create and a delete function, to facilitate state management by the user.
A Java program configures the source notification callback via com::latencybusters::lbm::LBMReceiverAttributes::setSourceNotificationCallbacks.
A .NET implementation is essentially the same as Java.
In most use cases for sending messages to a source, there is an implicit assumption that a subscribing receiver is fully set up and ready to receive messages from the publisher. However, due to the asynchronous nature of UM, there is no straight-forward way for a receiver to know the earliest point in time when messages sent by the source will be delivered to the receiver. For example, in a routed network (using the DRO), a receiver might deliver BOS to the application, but that just means that the connection to the proper DRO is complete. There could still be delays in the entire end-to-end path being able to deliver messages.
Also, be aware that although unicast immediate messages are delivered via TCP, these messages are not guaranteed. Especially in a routed network, there exists the possibility that a message will fail to reach the publisher.
In most cases, the immediate message is received by the publisher, and by the time the publisher reacts, the end-to-end source-to-receiver path is active. However, in the unlikely event that something goes wrong, a subscribing application should implement a timeout/retry mechanism. This advice is not specific to the "sending to source" use cases, and should be built into any kind of request/response-oriented use case.
UM Spectrum, which refers to a "spectrum of channels", allows the application designer to sub-divide a topic into any number of channels, which can be individually subscribed to by a receiving application. This provides an extra level of message filtering.
The publisher first allocates the desired number of source channel objects using lbm_src_channel_create(). Then it creates a topic source in the normal way. Finally, the application sends messages using lbm_src_send_ex(), specifying the source channel object in the lbm_src_send_ex_info_t's channel_info field.
A receiving application first creates a topic receiver in the normal way. Then it subscribes to channels using lbm_rcv_subscribe_channel() or lbm_wrcv_subscribe_channel(). Since each channel requires a different receiver callback, the receiver application can achieve more granular filtering of messages. Moreover, messages are received in-order across channels since all messages are part of the same topic stream.
It should be noted that a regular topic receiver (one for which no spectrum channels are subscribed) delivers all received messages from a matching spectrum topic source to the receiver's callback without creating the channel_info object.
You can accomplish the same level of filtering with a topic space design that creates separate topics for each channel, however, UM cannot guarantee the delivery of messages from multiple sources/topics in any particular order. Not only can UM Spectrum deliver the messages over many channels in the order they were sent by the source, but it also reduces topic resolution traffic since UM advertises only topics, not channels.
The use of separate callbacks for different channels improves filtering and also relieves the source application of the task of including filtering information in the message data.
Java and .NET performance also receives a boost because messages not of interest can be discarded before they transition to the Java or .NET level.
Spectrum's default behavior delivers messages on any channels the receiver has subscribed to on the callbacks specified when subscribing, and all other messages on the receiver's default callback. This behavior can be changed with the following configuration options.
When an application subscribes to a spectrum channel, it uses the spectrum subscribe API:
Note that when subscribing to a channel, the receiver callback function is optional. If null is supplied as the callback, UM will invoke the underlying receiver's callback.
If a separate callback is supplied for the channel, be aware that only data message event types (LBM_MSG_DATA, LBM_MSG_REQUEST) will be delivered to it. Non-data events (LBM_MSG_BOS, LBM_MSG_EOS, LBM_MSG_UNRECOVERABLE_LOSS, etc.) will be delivered to the underlying receiver's callback.
Smart Sources support Spectrum, but via different API functions. You need to tell UM that you intend to use spectrum at Smart Source creation time using the smart_src_enable_spectrum_channel (source) configuration option. This pre-allocates space in the message header for the spectrum channel.
With Smart Sources, there is no need to allocate a Spectrum source object with lbm_src_channel_create(). Instead, you simply set the LBM_SSRC_SEND_EX_FLAG_CHANNEL flag and the spectrum channel number in the lbm_ssrc_send_ex_info_t passed to the lbm_ssrc_send_ex() API function. It is also usually necessary to tell UM to rebuild the header. For example:
Note that if you are sending multiple messages in a row to the same spectrum channel, you can get a small performance boost by leaving the header alone and code subsequent calls as:
When a Smart Source is created with Spectrum enabled, it is possible to send messages without a Spectrum channel, either by clearing the LBM_SSRC_SEND_EX_FLAG_CHANNEL flag in lbm_ssrc_send_ex_info_t, or by simply not supplying a lbm_ssrc_send_ex_info_t object by passing NULL for the info
parameter. This suppresses all features enabled by that structure.
UM Hot Failover (HF) lets you implement sender redundancy in your applications. You can create multiple HF senders in different UM contexts, or, for even greater resiliency, on separate machines. There is no hard limit to the number of HF sources, and different HF sources can use different transport types.
Hot Failover receivers filter out the duplicate messages and deliver one message to your application. Thus, sources can drop a few messages or even fail completely without causing message loss, as long as the HF receiver receives each message from at least one source.
The following diagram displays Hot Failover operation.
In the figure above, HF sources send copies of Message X. An HF receiver delivers the first copy of Message X it receives to the application, and discards subsequent copies coming from the other sources.
You create Hot Failover sources with lbm_hf_src_create(). This returns a source object with internal state information that lets it send HF messages. You delete HF sources with the lbm_src_delete() function.
HF sources send HF messages via lbm_hf_src_send_ex() or lbm_hf_src_sendv_ex(). These functions take a sequence number, supplied via the exinfo object, that HF receivers use to identify the same message sent from different HF sources. The exinfo has an hf_sequence_number, with a flag (LBM_SRC_SEND_EX_FLAG_HF_32 or LBM_SRC_SEND_EX_FLAG_HF_64) that identifies whether it's a 32- or 64-bit number. Each HF source sends the same message content for a given sequence number, which must be coordinated by your application.
If the source needs to restart its sequence number to an earlier value (e.g. start of day; not needed for normal wraparound), delete and re-create the source and receiver objects. Without re-creating the objects, the receiver sees the smaller sequence number, assumes the data are duplicate, and discards it. In (and only in) cases where this cannot be done, use lbm_hf_src_send_rcv_reset().
Please be aware that non-HF receivers created for an HF topic receive multiple copies of each message. We recommend you establish local conventions regarding the use of HF sources, such as including "HF" in the topic name.
For an example source application, see Example lbmhfsrc.c.
You create HF receivers with lbm_hf_rcv_create(), and delete them using lbm_hf_rcv_delete() and lbm_hf_rcv_delete_ex().
Incoming messages have an hf_sequence_number field containing the sequence number, and a message flag (LBM_MSG_FLAG_HF_32 or LBM_MSG_FLAG_HF_64) noting the bit size.
For the maximum time period to recover lost messages, the HF receiver uses the minimum of the LBT-RM and LBT-RU NAK generation intervals (transport_lbtrm_nak_generation_interval (receiver), transport_lbtru_nak_generation_interval (receiver)). Each transport protocol is configured as normal, but the lost message recovery timer is the minimum of the two settings.
Some lbm_msg_t objects coming from HF receivers may be flagged as having "passed through" the HF receiver. This means that the message has not been ordered with other HF messages. These messages have the LBM_MSG_FLAG_HF_PASS_THROUGH flag set. UM flags messages sent from HF sources using lbm_src_send() in this manner, as do all non-HF sources. Also, UM flags EOS, no source notification, and requests in this manner as well.
For an example receiver application, see Example lbmhfrcv.c.
To create an HF wildcard receiver, set option hf_receiver (wildcard_receiver) to 1, then create a wildcard receiver with lbm_wildcard_rcv_create(). This actually creates individual HF receivers on a per-topic basis, so that each topic can have its own set of HF sequence numbers. Once the HF wildcard receiver detects that all sources for a particular topic are gone it closes the individual topic HF receivers and discards the HF sequence information (unlike a standard HF receiver). You can extend or control the delete timeout period of individual HF receivers with option resolver_no_source_linger_timeout (wildcard_receiver).
For information on implement the HF feature in a Java application, go to UM Java API and see the documentation for classes LBMHotFailoverReceiver and LBMHotFailoverSource.
For information on implement the HF feature in a .NET application, go to UM .NET API and navigate to Namespaces->com.latencybusters.lbm->LBMHotFailoverReceiver and LBMHotFailoverSource.
When implementing Hot Failover with Persistence, you must consider the following impact on hardware resources:
Also note that you must enable Explicit Acknowledgments and "Hot Failover duplicate delivery" (hf_duplicate_delivery (receiver)) in each Hot Failover receiving application.
For detailed information on using Hot Failover with Persistence, see the Knowledge Base article FAQ: Is UMP compatible with Hot Failover?
UM supports intentional gaps in HF message streams. Your HF sources can supply message sequence numbers with number gaps up to 1073741824. HF receivers automatically detect the gaps and consider any missing message sequence numbers as not sent and do not attempt recovery for these missing sequence numbers. See the following example.
HF receiver 1 receives message sequence numbers in order with no pause between any messages: 10, 11, 12, 13, 25, 26, 38
Hot Failover sources can send optional messages that HF receivers can be configured to receive or not receive (hf_optional_messages (receiver)). HF receivers detect an optional message by checking lbm_msg_t.flags for LBM_MSG_FLAG_HF_OPTIONAL. HF sources indicate an optional message by passing LBM_SRC_SEND_EX_FLAG_HF_OPTIONAL in the lbm_src_send_ex_info_t.flags field to lbm_hf_src_send_ex() or lbm_hf_src_sendv_ex(). In the examples below, optional messages appear with an "o" after the sequence number.
HF receiver 1 receives: 10, 11, 12, 13o, 14o, 15, 16o, 17o, 18o, 19o, 20
HF receiver 2, configured to ignore optional messages, receives: 10, 11, 12, 15, 20
An HF receiver takes some of its operating parameters directly from the receive topic attributes. The ordered_delivery (receiver) setting indicates the ordering for the HF receiver.
If you have a receiving application on a multi-homed machine receiving HF messages from HF sources, you can set up the Hot Failover Across Contexts (HFX) feature. This involves setting up a separate UM context to receive HF messages over each NIC and then creating an HFX Object, which drops duplicate HF messages arriving over all contexts. Your receiving application then receives only one copy of each HF message. The HFX feature achieves the same effect across multiple contexts as the normal Hot Failover feature does within a single context.
The following diagram displays Hot Failover operation across UM contexts.
For each context that receives HF messages, create one HFX Receiver per topic. Each HFX Receiver can be configured independently by passing in a UM Receiver attributes object during creation. A unique client data pointer can also be associated with each HFX Receiver. The HFX Object is a special Ultra Messaging object and does not live in any UM context.
Note: You never have to call lbm_topic_lookup() for a HFX Receiver. If you are creating HFX Receivers along with normal UM receivers for the same topic, do not interleave the calls. For example, call lbm_hfx_create() and lbm_hfx_rcv_create() for the topic. Then call lbm_topic_lookup() and lbm_rcv_create() for the topic to create the normal UM receivers.
The following outlines the general procedure for HFX.
Delete each HFX Receiver with lbm_hfx_rcv_delete() or lbm_hfx_rcv_delete_ex(). Delete the HFX Object with lbm_hfx_delete().
The "NAK cutoff" feature allows an application to force its context(s) to stop sending NAKs (LBT-RM only), even if packet loss is detected.
This feature was first added to UM in version 6.16.1 with the C API lbm_context_set_nak_cutoff.
NAK-based network protocols, like LBT-RM and LBT-RU, can be susceptible to NAK storms. UM's algorithms enable the user to control the risk of NAK storms, reducing it close to zero if desired. However, be aware that these algorithms present a trade-off between performance and stability that needs to be considered.
Many users prevent NAK storms by simply provisioning their systems to be able to handle their worst-case load. In this case, there can be no overload (unless the calculation of "worst case" was faulty), and without overload, there can be no NAK storm.
But for some users, designing for theoretical worst-case is not practical, so other strategies must be used. For example, to reduce the chances of a NAK storm close to zero, it might be necessary to establish Rate Controls that will, under periods of stress, limit a publisher's ability to send messages when it is ready to. This is essentially an intentional use of latency to prevent overload.
But again, many of our customers consider those latency costs too high, so they live with increased risk. For example, they may establish rate controls that "overbook" the network's capacity under the assumption that not all publishers will send at their maximum allowed rate at the same time. This is generally a safe assumption, but still leaves open the possibility of overload.
UM version 6.16.1 introduced a feature called the NAK Cutoff. This is an API that allows an application to force its context(s) to stop sending NAKs, even if packet loss is detected. This API may be useful to users who cannot reduce the changes of NAK storms low enough, and want a contingency for the unlikely event that a NAK storm does happen. Note that employing this feature will result in some amount of Unrecoverable Loss.
If the operations team concludes that they are having a NAK storm, they can command their applications to invoke the NAK cutoff feature (call the API) to disable NAKS long enough to allow the network to stabilize.
When applications send NAKs, that is typically not a NAK storm. NAKs are simply part of the UM lost packet recovery protocols. But in a worst-case scenario, the increased load of the lost-packet recovery protocols can cause additional packet loss, leading to a self-reinforcing feedback loop where loss triggers loss recovery, which causes additional loss, which triggers more loss recovery, etc.
This self-reinforcing feedback loop is a NAK storm. If not controlled, they can cripple a network for minutes or even hours, potentially requiring a shutdown / restart of the core parts of your system.
See NAK Storms for detailed explanation of NAK storm causes, prevention, and avoidance.
You can reduce the chances of a NAK storm by avoiding packet loss. Since the vast majority of packet loss is caused by overload, the basic methods of avoiding packet loss are to maximize the efficiency of data consumers and to control the sending of messages by data sources.
See Packet Loss for detail and loss avoidance strategies.
See Configuring UDP-Based Transports for specific UM configuration advice.
The Persistence Store daemon and the DRO daemon each have a simple web server which provides operational information. This information is important for monitoring the operation and performance of these daemons. However, while the web-based presentation is convenient for manual, on-demand monitoring, it is not suitable for automated collection and recording of operational information for historical analysis.
(The UMDS product also supports Daemon Statistics as of UMDS version 6.12; see UMDS documentation for details.)
Starting with UM version 6.11, a feature called "Daemon Statistics" has been added to the Store and DRO daemons. The Stateful Resolver Service (SRS), added in UM version 6.12, supports Daemon Statistics only (no web server). The Daemon Statistics feature supports the background publishing of their operational information via UM messages. Monitoring systems can now subscribe to this information in much the same way that UM transport statistics can be subscribed.
While the information published by the Store, DRO, and SRS daemon statistics differ in their content, the general feature usage is the same between them. When the feature is configured, the daemon will periodically collect and publish its operational information.
The following sections give general information which is common across daemons, followed by links to daemon-specific details.
With the introduction of the Daemon Statistics feature, a new context is added to the daemons: the Daemon Controller. This context publishes the statistics and also can be configured to accept daemon control requests from external applications. These control requests are primarily used for controlling the Daemon Statistics feature (see Daemon Control Requests), but are also used for a few daemon-specific control functions that are unrelated to Daemon Statistics (for example, Request: Mark Stored Message Invalid).
Note that in every UM component that supports the Daemon Statistics feature, the Daemon Controller defaults to disabled. Each component's Daemon Statistics configuration must be set to enable the function of the Daemon Controller.
The operational information is published as messages of different types sent over a normal UM topic source (topic name configurable). For the Store and DRO daemons, each message is in the form of a binary, C-style data structure.
There are generally two categories of messages: config and stats. A given instance of a category "config" message does not have content which changes over time. An instance of a category "stats" message has content that does change over time. The daemon-specific documentation indicates which messages are in which category.
Each message type is configured for a publishing interval. When the publishing interval for a message type expires, the possible messages are checked to see if its content has materially changed since the last interval. If not, then the message is not republished. The publishing interval for a stat message is typically set to shorter periods to see those changes as they occur.
Note that the SRS message format is JSON, and therefore the granularity of published data is finer. I.e. a given message type might be published, but only a subset of the fields within the message might be included. In contrast, the daemons which publish binary structures send the structures complete.
Finally, note that while the contents of a given instance of a config message does not change over time, new instances of the message type can be sent as a result of state changes. For example, a new instance of umestore_repo_dmon_config_msg_t is published each time a new source registers with the Store.
More detailed information is available in the daemon-specific documentation referenced below.
For the Store and DRO daemons, the messages published are in binary form and map onto the C data structures defined for each message type.
For the SRS service, the messages are formatted as JSON, so this section does not apply to the SRS.
The byte order of the structure fields is defined as the host endian architecture of the publishing daemon. Thus, if a monitoring host receiving the messages has the same endian architecture, the binary structures can be used directly. If the monitoring host has the opposite endian architecture, the receiver must byte-swap the fields.
The message structure is designed to make it possible for a monitoring application to detect a mismatch in endian architecture. Detection and byte swapping is demonstrated with daemon-specific example monitoring applications.
More detailed information is available in the daemon-specific documentation referenced below.
For the Store and DRO daemons, each message sent by the daemon consists of a standard header followed by a message-type-specific set of fields. The standard header contains a version
field which identifies the version of the C include file used to build the daemon.
For the SRS service, the messages are formatted as JSON, so this section does not apply to the SRS.
For example, the Store daemon is built with the include file umedmonmsgs.h
. With each daemon statistics message sent by the Store daemon, it sets the header version field to LBM_UMESTORE_DMON_VERSION. With each new release of the UM package, if that include file changes in a substantive way, the value of LBM_UMESTORE_DMON_VERSION is increased. In this way, a monitoring application can determine if it is receiving messages from a Store daemon whose data structures match the monitoring application's structure definitions.
More detailed information is available in the daemon-specific documentation referenced below.
Each daemon publishing binary daemon stats can optionally be configured to accept command-and-control requests from administrative applications. These command-and-control requests are handled by the daemon's Daemon Controller, a context added to support Daemon Statistics.
There are different categories of these requests. All daemons have in common categories "snapshot" and "config", which are related to Daemon Statistics. Other categories are specific to the daemon type.
"Snapshot" requests tell the daemon to immediately republish the desired stats and/or configs without waiting until the next publishing interval. These requests might be sent by a monitoring application which has only just started running and needs a full snapshot of the operational information.
"Config" requests tell the daemon to modify an operational parameter of the running daemon.
An application sends a request to the daemon, and the daemon sends status messages in response. The exchanges are made via standard UM topicless immediate Request/Response messaging. These requests should be sent using the Unicast Immediate Messaging (UIM) API for sending the requests using lbm_unicast_immediate_request(). See Unicast Immediate Messaging for details on UIM.
To use UIM effectively, Informatica recommends configuring the daemon monitor context for a specific UIM interface and port using: request_tcp_port (context) and request_tcp_interface (context). This enables the monitoring application to know how to address the request UIMs to the proper daemon.
For the Store and DRO daemons, The request message is formatted as a simple ASCII string. For the SRS service, the request message is formatted as a JSON message. The request is sent as a topicless unicast immediate request message. The daemon reacts by parsing the request and sending a UM response with a success/failure response. If the request was parsed successfully, the daemon then performs the requested operation (republishing the data or modifying the operational parameter). There are daemon-specific example applications which demonstrate the use of this request feature.
More detailed information is available in the daemon-specific documentation referenced below.
UM Daemon Statistics are implemented using normal UM messaging. In particular, the Daemon Controller capability allows a remote application to send requests to a daemon that modify its behavior. If misused, these behaviors can be disruptive to normal operation. For example, the "umedmon" program can instruct a persistent Store to flag stored messages as invalid, which would prevent their delivery to a recovering receiver.
Note that the UM daemons default to rejecting these command-and-control messages, so taking special security precautions is only necessary if you have configured daemons to enable the Daemon Control requests.
One common way to prevent unauthorized use is to tightly control access to your production network so that no unauthorized users can accidentally or maliciously use Daemon Control requests to interfere with normal operation.
Additionally, you can configure the Daemon Control context for encryption, which supports certificate-based access control. This requires the use of an encrypted TRD for Daemon Statistics. If your normal message data is unencrypted, you will need to define one or more new TRDs for Daemon Statistics that are separate from the normal TRDs (an encrypted TRD must not have unencrypted contexts assigned to it).
Use the use_tls (context) configuration option in the UM configuration file you supply for the Daemon Statistics. For example, for the Store Daemon use the UMP Element "<lbm-config>" contained within the UMP Element "<daemon-monitor>".
Note that the use of an encrypted TRD will require that the Daemon Statistics data be configured for the TCP transport.
Since UM's encryption feature is certificate-based. A user wanting to use a Daemon Control request tool must have access to the proper certificate file(s). This means that unauthorized users must not have access to the certificate file(s).
Finally, be aware that a partially-encrypted DRO network can break the security of an encrypted TRD; see TLS and the DRO.
See Encrypted TCP for details on using encryption.
For details on the persistent Store's daemon statistics feature, see Store Binary Daemon Statistics.
For details on the DRO's daemon statistics feature, see DRO Binary Daemon Statistics.
For details on the SRS's daemon statistics feature, see SRS Daemon Statistics.