Guide for Persistence
Persistence Architecture

As shown in the diagram, UM provides messaging functionality as well as persistent operation.

persistent_architecture.png

The highlights of this architecture are:

  • Sources communicate with Store Instances
  • Receivers communicate with Store Instances
  • Sources communicate with receivers

Note that the Store is not supported on all platforms. For example, while Solaris supports persistent clients (sources and receivers), you cannot run a Store on a Solaris system. However, a Solaris-based client can interoperate with a Store running on any other supported platform.


Persistent Store Architecture

The umestored daemon runs the persistent Store Process. You can configure multiple Store Instances per Store Process using the UMP Element "<store>" in the Store's XML configuration file. See Configuration Reference for Umestored. Individual Store Instances can use separate disk cache and disk state directories and be configured to persist messages for multiple sources (topics), which are referred to as "source repositories". Each Store Process has an optional Web Monitor for statistics monitoring. See Store Web Monitor.

store_architecture.png


Store Processes and Instances

When the Store daemon is started on a host, the process is known as the "Store Process". That Store Process contains one or more "Store Instances". A Store Instance is an independent, addressable, and configurable component. Each Store Instance is implemented with a set of interacting program threads. The threads of one Instance do not interact or contend with the threads of other Instances in the same Process.

There is very little difference between running one Store Process with two Store Instances compared to two Store Processes with one Store Instance each. They function and perform mostly the same. The reasons for choosing one over the other have mostly to do with operational convenience. For example, running fewer processes on a host is sometimes easier to manage. So operational simplicity suggests combining multiple Store Instances into a single Store Process.

On the other hand, you may sometimes want to shut down a single Store Instance. But Store Instances cannot be shut down individually; the entire Store Process must be shut down. For example, as message rates increase, you may find that the host's CPU consumption is getting too high, and you might want to migrate half of the Store Instances to a different host. If all your Store Instances are in one Store Process, the migration is more disruptive because it requires shutting down and reconfiguring the entire process. So operational flexibility suggests assigning each Store Instance to its own Store Process.

One specific case where a single Store Process with multiple Store Instances is generally preferred: using the Store Daemon as a Windows Service. There is no simple way to run multiple copies of the Store Windows Service.


Source Repositories

Within a Store Instance, you configure repositories for individual topics, and each can have its own set of <topic> options that affect the repository's type, size, liveness behavior and much more. If you have multiple sources sending on the same topic, the Store Instance creates a separate repository for each source, and UM applies the repository options configured for the topic to each source's repository. For example, if you specify 48 MB for the size of the repository and have 10 sources sending on the topic, the Store Instance requires 480 MB of storage for that topic.

A repository can be configured as one of the following types:

  • memory - the repository maintains both state and data only in memory.
  • disk - the repository maintains state and data on disk, but also uses a memory cache.

There are also repository types called "reduced-fd" and "no-cache", which are deprecated and will be removed in a future UM version. The "reduced-fd" repository is similar to "disk" but uses fewer OS File Descriptors. However, it is deprecated due to low performance. The "no-cache" repository maintains state (last sequence numbers published and consumed) but does not maintain message content. It is deprecated due to a lack of compelling use cases.

Note that the Store Instances within a Store Process can have different repository types.


Repository Thresholds and Limits

Repositories are designed as circular buffers. When age or size thresholds are met for a topic, the repository removes or overwrites messages in order to prevent reaching its configured limit, which keeps space available for new messages. UM provides UM configuration options and store configuration options to control threshold and limit behavior.

UM configuration options control source repositories for all the sources sending within the context. The default for each of these options is 0 (zero), which makes the like-named option for the repository in the umestored XML configuration file take effect.

See Ultra Messaging Persistence Options.

Note: The above configuration options' default values can be altered for individual sources and receivers by calling lbm_src_topic_attr_setopt() before you allocate the topic.

The umestored configuration options for source/topic repositories explained below can also be used to control threshold and limit behavior. See Options for a Topic's ume-attributes Element for complete information about the following repository options.

Note
Whether you use the UM configuration options mentioned above or the source repository options explained below to control source repository threshold and limit behavior, remember the values you configure apply to a single source sending to the Store Instance. If you use the default repository size limit of 48 MB and you have 1,000 sources sending to the Store Instance, UM creates a store with 1,000 source repositories of 48 MB each, which requires a store with approximately 48 GB of memory. And if you use the default disk file size limit of 100 MB and you have 1,000 sources sending to the store, UM creates a store with 1,000 source repositories of 100 MB each, which requires a store with disk storage capacity of approximately 100 GB.
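The sizing arithmetic in the note above can be checked with a short calculation. The following is a minimal sketch, not part of UM: the function name `store_capacity_gb` is hypothetical, and it assumes decimal units (1 GB = 1000 MB) to match the approximate figures in the text.

```python
# Rough capacity estimate: every source gets its own repository, so the
# per-source limits multiply by the number of sources using the Store Instance.

MB_PER_GB = 1000  # assumption: decimal units, matching the approximate figures


def store_capacity_gb(num_sources, size_limit_mb=48, disk_file_size_limit_mb=100):
    """Return (memory_gb, disk_gb) needed if every repository fills to its limit."""
    memory_gb = num_sources * size_limit_mb / MB_PER_GB
    disk_gb = num_sources * disk_file_size_limit_mb / MB_PER_GB
    return memory_gb, disk_gb


if __name__ == "__main__":
    memory_gb, disk_gb = store_capacity_gb(1000)
    print(f"memory: {memory_gb:.0f} GB, disk: {disk_gb:.0f} GB")
```

Running the same function with your own source count and configured limits gives a first estimate of the host resources a Store Instance may need.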

Memory Repository

A memory type source repository has three configuration options that manage its size relative to its capacity.

  • repository-age-threshold - This value determines how long the repository retains messages. The repository deletes any message older than this configured value.

  • repository-size-threshold - The size in bytes that a repository can reach before it begins to delete the oldest retained messages. If the repository size falls below the threshold, it stops deleting old messages.

  • repository-size-limit - The maximum size in bytes for the repository. Once this limit is reached, the repository stops accepting new messages. Set the age and size thresholds at levels that guarantee the size limit is never reached. Consider how fast the source sends messages, the size of the messages, and the reliability of the receivers. For example, more reliable receivers mean fewer recovery instances, which could allow a shorter age threshold.
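How the three options interact can be illustrated with a toy model. This sketch is not the Store's implementation: the class `MemoryRepositoryModel` and its methods are hypothetical, time is a plain number of seconds, and deletion is assumed to happen just before each new message is stored.

```python
from collections import deque


class MemoryRepositoryModel:
    """Toy model of the age threshold, size threshold and size limit (not UM code)."""

    def __init__(self, age_threshold, size_threshold, size_limit):
        self.age_threshold = age_threshold    # seconds a message may be retained
        self.size_threshold = size_threshold  # bytes; above this, oldest messages go
        self.size_limit = size_limit          # bytes; hard cap on the repository
        self.messages = deque()               # (arrival_time, size), oldest first
        self.size = 0

    def _drop_oldest(self):
        _, size = self.messages.popleft()
        self.size -= size

    def trim(self, now):
        # Age threshold: delete messages older than the configured age.
        while self.messages and now - self.messages[0][0] > self.age_threshold:
            self._drop_oldest()
        # Size threshold: delete the oldest messages until back at or under it.
        while self.messages and self.size > self.size_threshold:
            self._drop_oldest()

    def store(self, now, msg_size):
        """Return True if the message is accepted, False if the size limit blocks it."""
        self.trim(now)
        if self.size + msg_size > self.size_limit:
            return False
        self.messages.append((now, msg_size))
        self.size += msg_size
        return True
```

In this model, a size threshold below the size limit keeps the repository accepting messages, while a threshold set above the limit lets the repository fill up and reject new messages, which is the situation the thresholds are meant to prevent.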

Disk Repositories

A disk type source repository maintains a memory cache in addition to the actual disk storage. It continually persists messages from the memory cache to the disk, and uses the memory cache for receiver recovery first before performing disk reads to access needed messages. It has four configuration options that manage its size relative to its capacity.

  • repository-age-threshold - This value determines how long the disk repository retains messages in its memory cache. The repository deletes any message from memory cache older than this configured value. These messages could have been persisted to disk and may be available for recovery.

  • repository-size-threshold - The size in bytes that a repository can reach before it begins to delete the oldest retained messages. These messages could have been persisted to disk and may be available for recovery. If the disk repository memory cache size falls below the threshold, it stops deleting old messages.

  • repository-size-limit - The maximum size in bytes for the disk repository's memory cache. Once this limit is reached, the repository stops accepting new messages. Set the age and size thresholds at levels that guarantee the size limit is never reached. Consider how fast the source sends messages, the size of the messages, and the reliability of the receivers. For example, more reliable receivers mean fewer recovery instances, which could allow a shorter age threshold.

  • repository-disk-file-size-limit - The maximum disk space (in bytes) for the disk repository. Once this limit is reached, the repository overwrites old messages with new messages. Overwriting old messages is not necessarily a problem, provided your disk file size is adequate. However, if messages needed for recovery are in neither the memory cache nor the disk file, you may need to increase the disk file size so that messages are overwritten only after they are no longer needed for receiver recovery.
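The relationship between the memory cache and the disk file during receiver recovery can be sketched as follows. This is a simplified illustration under stated assumptions, not how the Store is implemented: `DiskRepositoryModel` is a hypothetical class, messages are written to cache and disk at the same moment, and only the cache size threshold and the disk file size limit are modeled.

```python
from collections import OrderedDict


class DiskRepositoryModel:
    """Toy model: a bounded memory cache in front of a bounded, overwriting disk file."""

    def __init__(self, cache_size_threshold, disk_file_size_limit):
        self.cache_size_threshold = cache_size_threshold  # bytes kept in memory
        self.disk_file_size_limit = disk_file_size_limit  # bytes kept on disk
        self.cache = OrderedDict()  # sequence number -> message size, oldest first
        self.disk = OrderedDict()

    @staticmethod
    def _evict_oldest_until(messages, max_bytes):
        # Remove the oldest entries until the total size fits within max_bytes.
        while messages and sum(messages.values()) > max_bytes:
            messages.popitem(last=False)

    def persist(self, seq, size):
        self.cache[seq] = size
        self.disk[seq] = size
        self._evict_oldest_until(self.cache, self.cache_size_threshold)
        self._evict_oldest_until(self.disk, self.disk_file_size_limit)

    def recover(self, seq):
        """Return where a message would be served from, or None if it is gone."""
        if seq in self.cache:
            return "cache"  # the memory cache is consulted first
        if seq in self.disk:
            return "disk"   # otherwise a disk read is needed
        return None         # overwritten; no longer available for recovery
```

A `None` result for a sequence number that a receiver still needs corresponds to the case described above where the disk file size may need to be increased.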


Persistent Store Fault Tolerance

Sources and receivers register with a Store Instance and use individual repositories within the Store. For fault tolerance, sources can use redundant repositories configured in multiple Store Instances in a Quorum/Consensus (Q/C) arrangement. Be aware that the arrangement of Store Instances into Quorum/Consensus groups is a function of the source. That is, the individual stores of a Q/C group are not aware of each other and do not coordinate their activities.

Informatica strongly recommends that the Store Instances of a Q/C group run on separate physical hosts.


Identifying Persistent Stores

You can identify Store Instances with either a domainID:interface:port, interface:port or a name. Using only interface:port is more feasible in smaller implementations where the smaller number of possible IP addresses is easier to manage. Larger implementations, especially those that span topic resolution domains using DROs, are better served with Stores identified by a name or domainID:interface:port.

UM automatically resolves and maintains a mapping between a store name and a single topic resolution domain, IP address and port. UM also automatically resolves store names if the store is located across one or more DROs in a different topic resolution domain.
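The three identification forms can be told apart by their shape. The helper below is only an illustration of that distinction: `StoreId` and `parse_store_id` are hypothetical names, this is not the parser UM uses, and the real option values may carry fields not covered here.

```python
from typing import NamedTuple, Optional


class StoreId(NamedTuple):
    name: Optional[str]
    domain_id: Optional[int]
    interface: Optional[str]
    port: Optional[int]


def parse_store_id(text):
    """Classify a store identifier as domainID:interface:port, interface:port, or a name."""
    parts = text.split(":")
    if len(parts) == 3:
        domain, interface, port = parts
        return StoreId(None, int(domain), interface, int(port))
    if len(parts) == 2:
        interface, port = parts
        return StoreId(None, None, interface, int(port))
    if len(parts) == 1 and parts[0]:
        # No colon at all: treat the whole string as a store (context) name.
        return StoreId(parts[0], None, None, None)
    raise ValueError(f"unrecognized store identifier: {text!r}")


if __name__ == "__main__":
    # The domain ID 1 is a made-up value used only for illustration.
    for example in ("1:10.29.3.16:14567", "10.29.3.16:14567", "NEWYORK-1"):
        print(parse_store_id(example))
```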

The following lists other specifics of store identification.

  • A store sends advertisements at startup and in response to queries from sources.
  • If a store receives a context name advertisement that matches its own store name, umestored issues a warning in the store's log.
  • Sources using named stores issue an information message to the application every time a resolved context name changes its DomainID:IPaddress:port.

Using a Single Interface and Port

Configure a Store Instance for a single interface and port.

  1. Identify the store with only the interface:port, specified with the UMP Element "<store>" in the Store's configuration file.

    <store name="newyork-1" port="14567" interface="10.29.3.16">

  2. Add the interface:port to ume_store (source) so sources can find and register with the Store Instance.
    source ume_store 10.29.3.16:14567
    

To run the Store Instance on a different machine for any reason, you must change both the umestored XML configuration file and the UM configuration file.

Using a Range of Interfaces

Configure a store with a range of IP addresses.

  1. Identify the store with a range of interfaces specified in the umestored configuration file.

    <store name="newyork-1" port="14567" interface="10.29.3.0/24">

  2. Add the active interface to ume_store (source) so sources can find and register with the store. You can only specify one interface in the configuration file.
    source ume_store 10.29.3.16:14567
    

To run the store on a different machine, you need only change the interface specified in the ume_store (source) UM configuration option, provided the new interface is in the range specified in the umestored configuration file.
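Before moving a store, you can check that the address you plan to put in ume_store falls inside the range configured for the store. The sketch below uses Python's standard ipaddress module; `matches_store_config` is a hypothetical helper for checking your own configuration values, not a UM API.

```python
import ipaddress


def matches_store_config(ume_store, store_interface_range, store_port):
    """Check that an interface:port value falls within the store's configured range and port."""
    host, _, port = ume_store.rpartition(":")
    in_range = ipaddress.ip_address(host) in ipaddress.ip_network(store_interface_range)
    return in_range and int(port) == store_port


if __name__ == "__main__":
    # Values taken from the example configuration above.
    print(matches_store_config("10.29.3.16:14567", "10.29.3.0/24", 14567))
    # An address outside the /24 range would require a umestored change as well.
    print(matches_store_config("10.29.4.16:14567", "10.29.3.0/24", 14567))
```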

Using a Store (context) Name

Configure a store with a name instead of just IP:port. '0.0.0.0' (INADDR_ANY) or no value is the default for the store's interface attribute.

  1. Identify the store with a context-name option that resolves to the interface and port - or range of interfaces and port - specified in the umestored configuration file:

    <store name="newyork-1" port="14567" interface="0.0.0.0">
    <ume-attributes>
    <option name="context-name" type="store" value="NEWYORK-1"/>
    </ume-attributes>

    OR

    <store name="newyork-1" port="14567" interface="10.29.3.16">
    <ume-attributes>
    <option name="context-name" type="store" value="NEWYORK-1"/>
    </ume-attributes>

    OR

    <store name="newyork-1" port="14567" interface="10.29.3.0/24">
    <ume-attributes>
    <option name="context-name" type="store" value="NEWYORK-1"/>
    </ume-attributes>

  2. Add the store's context name to ume_store_name (source) so sources can find and register with the store.

    source ume_store_name NEWYORK-1
    

You do not have to make any configuration changes to run NEWYORK-1 on another machine, provided the new interface matches one of those specified in the umestored configuration file. This includes running the Store Instance in a different topic resolution domain.