31. Release LBM 4.1.1 / UME 3.1.1 / UMQ 1.1.1 - October 2010

31.1. LBM 4.1.1

31.1.1. Updated Features

  • Improved the efficiency of Late Join retransmissions by adding a timer for every request, set to the same value as retransmit_request_interval. Previously, Late Join re-requested retransmissions if the retransmission was not received at the expiration of the retransmit_request_interval. Now Late Join only re-requests a retransmission when the first request's retransmit_request_interval expires. This change also applies to retransmission requests from a UME store.

  • The UM Gateway now supports single-connection TCP peer portals.

  • Added the <nodelay/> element to the Gateway configuration for peer portals. This sets the TCP_NODELAY option for the peer TCP connections, disabling Nagle's algorithm. By default, TCP_NODELAY is not set.
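
For readers unfamiliar with the socket option behind <nodelay/>, the following is a minimal Python sketch of setting TCP_NODELAY on an ordinary TCP socket. It uses only the standard socket module and does not represent the Gateway's own implementation; the helper names make_tcp_socket and nodelay_enabled are made up for illustration.

```python
import socket


def make_tcp_socket(nodelay: bool = False) -> socket.socket:
    """Create an unconnected TCP socket, optionally with TCP_NODELAY set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if nodelay:
        # Disable Nagle's algorithm so small writes are not held back
        # while waiting to be coalesced with later data.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

As with the Gateway default, a socket created without the flag leaves TCP_NODELAY unset.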

31.1.2. Bug Fixes

  • Fixed an issue that could cause some source events for Hot Failover sources in the .NET API to not be delivered to the source's source event callback for a brief period of time after source creation.

  • Corrected a problem with LBT-IPC timers that caused the log warning: THROTTLED MSG: timer scheduled <= MIN_CLOCK_RES_MSEC (2 ms) [1282330517.878809]: Rescheduling for 3 ms. This problem occurred if transport_lbtipc_behavior was set to receiver_paced and the source was slowed by a receiver that could not keep up with the data stream.

  • Corrected an issue with lbmj library versioning for AIX. The Ultra Messaging® Java API now runs on AIX.

  • Corrected a race condition with the LBT-RDMA transport that could result in a seg fault if a receiver was connecting/disconnecting at the same time that the source was being deleted.

  • Corrected a problem with the LBT-RDMA transport that caused a seg fault if you increased implicit_batching_minimum_length from the default value.

  • Corrected a problem that caused a seg fault when sending messages via the LBT-RDMA transport that are larger than approximately 3,800 bytes, or larger than a smaller threshold if transport_lbtrdma_datagram_max_size has been reduced from the default value.

  • Fixed an issue with automatic monitoring on Mac OS X that caused the process name to be set to unknown.

31.1.3. Known Issues

  • When using Event Queues with the Java API on Mac OS X kernel 9.4, core dumps have occurred. Mac OS X kernel versions prior to 9.4 have not produced this behavior. 29West is investigating this issue.

  • When using LBT-IPC, a seg fault can occur when sending messages larger than 65,535 bytes when ordered_delivery has been set to 0 (zero). The seg fault occurs when fragments are lost. Setting transport_lbtipc_behavior to receiver_paced avoids the seg fault by eliminating loss. 29West is investigating this issue.

  • When using the LBT-RDMA transport with Java applications, a segfault can occur if you kill a receiver with Ctrl-C. As a workaround, use the JVM option -Xrs. 29West is investigating this problem.

  • If you use the current version of VMS (3.2.8), LBM 4.1 issues the following warning: LOG Level 5: LBT-RDMA: VMS Logger Message (Error): vmss_create_store: 196[E] vms_listen: rdma_bind_addr failed (r=-1). This warning indicates that rdma_bind failed for Ethernet interfaces, which is expected behavior. Currently, VMS attempts rdma_bind on all interfaces. When released, VMS version 3.2.9 will only run rdma_bind on InfiniBand-capable interfaces.

31.2. UME 3.1.1

31.2.1. New Features

  • Added a flight size mechanism that tracks messages in flight from a particular source and responds when a send would exceed the configured ume_flight_size. You can configure ume_flight_size_behavior either to block any sends that would exceed the flight size or to allow the sends while notifying your application. For more, see UME Flight Size in UME Normal Operation.

  • Added lbm_ume_src_msg_stable(), which can mark a sequence number as stable. This may trigger a source event notification, if configured to do so, and also adjusts the current number of in-flight messages.
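
The flight size behavior described above can be illustrated with a small, self-contained simulation. This is a sketch only, written in Python rather than against the UM API: the FlightWindow class, its method names, and the "block"/"notify" strings are hypothetical stand-ins, and a blocked send is modeled as a False return value instead of a call that actually blocks. The mark_stable() method plays the role the text gives to lbm_ume_src_msg_stable(), in that it reduces the in-flight count.

```python
class FlightWindow:
    """Toy model of a per-source flight size limit (not the UM API)."""

    def __init__(self, flight_size: int, behavior: str = "block"):
        if behavior not in ("block", "notify"):
            raise ValueError("behavior must be 'block' or 'notify'")
        self.flight_size = flight_size
        self.behavior = behavior
        self.in_flight = set()      # sequence numbers not yet stable
        self.notifications = []     # sequence numbers sent over the limit
        self.next_seq = 0

    def send(self) -> bool:
        """Try to send one message; return True if it was sent."""
        if len(self.in_flight) >= self.flight_size:
            if self.behavior == "block":
                return False  # a real sender would wait here instead
            # "notify": allow the send but record it for the application
            self.notifications.append(self.next_seq)
        self.in_flight.add(self.next_seq)
        self.next_seq += 1
        return True

    def mark_stable(self, seq: int) -> None:
        """Mark a sequence number stable, shrinking the in-flight count."""
        self.in_flight.discard(seq)
```

With a flight size of 2 and the "block" behavior, a third send is refused until one of the first two sequence numbers is marked stable; with "notify", the third send goes through and its sequence number is recorded in notifications.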

31.2.2. Updated Features

  • Modified the receiver-new-registration-rollback store option's default to zero (0), which now requests no message recovery for newly registered receivers. Previously, zero (0) recovered the single latest message. Similarly, a value of 1 now recovers 1 message, whereas previously a value of 1 recovered 2 messages.

    Configuration Change Required: Reduce this option's value by 1.

    Application Change Required: If you use lbm_ume_rcv_recovery_info_ex_function_cb() in any of your applications, increase the low_sequence_number by one.

  • Added all-active as a valid option for both ume_retention_intergroup_stability_behavior and ume_retention_intragroup_stability_behavior. An active store is defined as a registered store. A group is considered active if it has at least a quorum of active or registered stores. Intergroup stability requires at least one stable group.

  • UME now supports a Quorum/Consensus group size of 1.

  • Added two new store configuration options for separating the configuration of read async IO callbacks from write async IO callbacks. See repository-disk-max-write-async-cbs and repository-disk-max-read-async-cbs in Options for a Topic's ume-attributes Element.

  • Added the ability to track message stability and/or consumption on a per-send basis, as opposed to a per-fragment basis, by adding option values to ume_message_stability_notification and ume_confirmed_delivery_notification. Values 0 (zero) and 1 retain existing behavior. A value of 2 provides only a single notification to the source of stability or delivery for an entire message. A value of 3 provides notification of stability or delivery for every fragment or message, with the flag WHOLE_MESSAGE_STABLE or WHOLE_MESSAGE_CONFIRMED set for the last fragment of a message.

  • Modified umestored to greatly increase the rate at which stores handle retransmission requests. Retransmissions are now sent on a different context from the store context thread.

  • Changed the IP:port used to send stability ACKs when using UME across the UM Gateway. UME now uses the IP:port specified in the topic advertisement, rather than the IP:port specified in the persistent registration. This change only affects deployments in which:

    1. The source and store are on different sides of a gateway.

    2. Multiple paths exist between the source and store via multiple parallel gateways.

    3. Due to ACL restrictions, the persistent registration and topic data are forwarded on different gateways.
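
As a rough illustration of the per-send values described above for ume_message_stability_notification and ume_confirmed_delivery_notification, the Python sketch below lists which notifications a source would see for one message split into several fragments. It models only values 2 and 3 as described in this section. The function name, the tuple layout, and the choice to represent the single value-2 notification as a whole-message entry with no fragment index are assumptions made for illustration, not part of the UM API.

```python
def notifications_for_message(num_fragments: int, mode: int):
    """Return (fragment_index, whole_message) tuples for one message.

    fragment_index is None when the notification covers the whole message.
    """
    if num_fragments < 1:
        raise ValueError("a message has at least one fragment")
    if mode == 2:
        # One notification for the entire message, however many fragments.
        return [(None, True)]
    if mode == 3:
        # One notification per fragment; only the last one carries the
        # whole-message flag (WHOLE_MESSAGE_STABLE / WHOLE_MESSAGE_CONFIRMED).
        last = num_fragments - 1
        return [(i, i == last) for i in range(num_fragments)]
    raise ValueError("only values 2 and 3 are modeled in this sketch")
```

For a three-fragment message, value 2 yields one entry, while value 3 yields three entries with the whole-message flag set only on the third.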



31.2.3. Bug Fixes

  • Fixed a problem that caused a segfault if umestored was configured to use sequential mode.

  • Corrected a problem that caused proxy source creation to fail in UME stores.

  • Fixed an issue where a persistent registration timer could be scheduled repeatedly on an error condition, which caused rapid memory growth and CPU usage.

  • Fixed an issue that caused a segfault if a receiver attempted a very large recovery that completed quickly.

  • Corrected an issue in the Gateway concerning multiple parallel peer portals on the same gateway which connect to corresponding peer portals on another gateway. Multiple delivery confirmations could be received for a single message due to problems with how ACLs on the peer portals operated.

  • Changed the value of optlen to 0 (zero) when you call lbm_src_topic_attr_str_getopt() and no UME store has been configured with either a configuration file or lbm_src_topic_attr_str_setopt(). Previously in this case, optlen was set to -1.

  • Added a new state, UNRESOLVED, for stores configured with only a store-name and without an IP:port. Previously in this case, UME delivered an LBM_SRC_EVENT_UME_STORE_UNRESPONSIVE event to the source application. The event text now indicates that the store is UNRESOLVED, rather than unresponsive.

  • Corrected an issue with UME sources that reference stores by name when those stores have become unresponsive and are configured in Round Robin fashion. Once the store restarted, re-registration did not occur. This condition only applies when the source and store reside on opposite sides of a gateway.

31.2.4. Known Issues

  • Receivers using event queues and Spectrum with UME can experience a SIGSEGV while shutting down if events still exist on the event queue when it is deleted. As a workaround, use LBM_EVQ_BLOCK when dispatching event queues. During application shutdown, call lbm_evq_unblock() after deleting receivers associated with the event queue, but before deleting any context objects. Once the dispatch thread exits, it is safe to proceed with context deletion. 29West is working on a solution to this problem.
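
The ordering in this workaround can be sketched with a small Python simulation. It does not call the UM API: ToyEventQueue, its methods, and the log strings are hypothetical stand-ins, with unblock() playing the role the text gives to lbm_evq_unblock() and dispatch_blocking() standing in for a dispatch made with LBM_EVQ_BLOCK.

```python
import queue
import threading


class ToyEventQueue:
    """Stand-in for an event queue with a blocking dispatch loop."""

    _UNBLOCK = object()  # sentinel that makes dispatch_blocking() return

    def __init__(self):
        self._events = queue.Queue()

    def post(self, event):
        self._events.put(event)

    def dispatch_blocking(self, handler):
        """Deliver events to handler until unblock() is called."""
        while True:
            event = self._events.get()
            if event is self._UNBLOCK:
                return
            handler(event)

    def unblock(self):
        self._events.put(self._UNBLOCK)


def shutdown_in_order(evq, dispatch_thread, log):
    """Record the shutdown order suggested by the workaround."""
    log.append("delete receivers")        # 1. receivers tied to the queue
    evq.unblock()                         # 2. release the blocked dispatcher
    dispatch_thread.join()                # 3. wait for the dispatch thread
    log.append("dispatch thread exited")
    log.append("delete context")          # 4. only now delete contexts
```

In use, the application would start a thread running dispatch_blocking(), and at shutdown call shutdown_in_order() so that context deletion happens only after the dispatch thread has exited.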

31.3. UMQ 1.1.1

31.3.1. New Features

  • Added a flight size mechanism for both UMQ and ULB that tracks messages in flight from a particular source and responds when a send would exceed the configured flight size. You can configure the UMQ flight size behavior for sources or Multicast Immediate Messaging either to block any sends that would exceed the flight size or to allow the sends while notifying your application. For more, see UMQ Flight Size and Ultra Load Balancing Flight Size in Ultra Load Balancing Operations.

  • Added lbm_umq_ctx_msg_stable(), which can mark a message ID as stable. This may trigger a source event notification, if configured to do so, and also adjusts the current number of in-flight messages.

Copyright (c) 2004 - 2014 Informatica Corporation. All rights reserved.