Guide for Persistence
Persistence Architecture

As shown in the diagram, UM provides messaging functionality as well as persistent operation.

persistent_architecture.png

The highlights of this architecture are:

  • Sources communicate with Store instances
  • Receivers communicate with Store instances
  • Sources communicate with receivers

Note that the Store is not supported on all platforms. For example, while Solaris supports persistent clients (source and receiver), you cannot run a Store on an Solaris system. However, an Solaris-based client can interoperate with a Store running an any other supported platform.


Persistent Store Architecture  <-

The umestored program (the final "d" stands for "daemon") runs the Store Process. You can configure multiple Store instances per Store Process using the UMP Element "<store>" in the Store configuration file. See Configuration Reference for Umestored. Individual Store instances can use separate disk cache and disk state directories and be configured to persist messages for multiple sources (topics), which are referred to as "source repositories". Each Store Process has an optional Web Monitor for statistics monitoring. See Store Web Monitor.

store_architecture.png


Store Processes and Instances  <-

When the Store Process is started on a host, the process is known as the "Store Process". That Store Process contains one or more "Store instances". A Store instance is an independent, addressable, and configurable component. Each Store instance is implemented with a set of interacting program threads. The threads of one Instance do not interact or contend with the threads of other Instances in the same Process.

There is very little difference between running one Store Process with two Store instances compared to two Store Processes with one Store instance each. They function and perform mostly the same. The reasons for choosing one over the other have mostly to do with operational convenience. For example, running fewer processes on a host is sometimes easier to manage. So operational simplicity suggests combining multiple Store instances into a single Store Process.

On the other hand, there are times when it is desired to shut down a Store instance. But Store instances cannot be shut down individually; an entire Store Process must be shut down. For example: as message rates increase, you may find that the host's CPU consumption is getting too high. You might want to migrate half of the Store instances to a different host. But if all your Store instances are in one Store Process, it is more disruptive perform the migration since it requires shutting down the entire process and re-configuring. So operational flexibility suggests assigning each desired Store instance to its own Store Process.

One specific case where a single Store Process with multiple Store Instances is generally preferred: using the Store Process as a Windows Service. There is no simple way to run multiple copies of the Store Windows Service.


Source Repositories  <-

Within a Store instance, you configure repositories for individual topics, and each can have its own set of <topic> options that affect the repository's type, size, liveness behavior, among other options. If you have multiple sources sending on the same topic, the Store instance creates a separate repository for each source. UM uses the repository options configured for the topic to apply to each source's repository.

For example, if you specify 48 MB for the size of the repository and have 10 sources sending on the topic, the Store instance requires 480 MB of storage for that topic.

A repository can be configured as one of the following types:

  • memory - the repository maintain both state and data only in memory, not disk.
  • disk - the repository maintains state and data on disk, but also uses a memory cache.

There is also a repository type called "no-cache", which is deprecated and will be removed in a future UM version. The "no-cache" repository maintains state (last sequence numbers published and consumed) but does not maintain message content. It is deprecated due to lack of compelling use cases.

Note that the Store instances within a Store Process can have different repository types.


Repository Thresholds and Limits  <-

The Store is designed to retain messages in case they are needed for future recovery. Of course, it is not possible to extend this retention to infinity, so the Store must be configured with policies regarding the removal of messages. There are three possible policies that can be established:

  1. Repository size limit (required)
  2. Message age limit (optional, for source-paced persistence).
  3. Consumption by all required receivers (for receiver-paced persistence).

In all cases, the repository's size is limited to prevent exhaustion of storage. With source-paced persistence, when the repository size limit is reached, the oldest messages are overwritten by new messages. With receiver-paced persistence, when the repository size limit is reached, the source is prevented from sending more messages.

For receiver-paced persistence, when all required receivers acknowledge consumption of a message, it is removed from the Store. But note that if the required receivers to not acknowledge consumption of messages and the repository fills before the oldest messages are acknowledged, the repository size is enforced and the source is blocked from sending more messages.

Note
When you configure a size limit, it applies to a single persisted source. For example, if you have two publishing applications and each one sends to the topic "EventStream", two separate repositories will be created, one for each source, and each repository will be allowed to grow to the configured size limit. You must provision your repository sizes with knowledge of how many persisted sources will be serviced by a given Store.

The size configuration options differ depending on whether you are implementing a Memory Repository Store (repository-type "memory") or a Disk Repository Store (repository-type "disk").

Memory Repository Size

A memory type source repository has three configuration options that manage its size relative to its capacity.

Note that the design of UM's persistence allows a maximum of 2,147,483,647 messages (2**31 - 1) to be persisted.

  • repository-age-threshold - This value determines how long the memory repository retains messages. Messages in memory that exceed this time can be deleted from the memory cache.

  • repository-size-threshold - The size in bytes that a repository can reach before it begins to delete the oldest retained messages. If the repository size falls below the threshold, it stops deleting old messages.

  • repository-size-limit - The maximum size in bytes for the repository. Once this limit is reached, the repository stops accepting new messages. The age and size thresholds should be set at levels that guarantee the size limit is never met. You should consider how fast the source sends messages, the size of the messages and the reliability of the receivers. For example, more reliable receivers mean less recovery instances, which could mean a younger age threshold. Do not specify a limit that would allow more than 2,147,483,647 messages to be stored.

Disk Repository Size

A disk type source repository maintains a memory cache in addition to the actual disk storage. It continually persists messages from the memory cache to the disk, and uses the memory cache for receiver recovery first before performing disk reads to access needed messages.

Note that the design of UM's persistence allows a maximum of 2,147,483,647 messages (2**31 - 1) to be persisted.

The Store has four configuration options that manage its size relative to its capacity.

  • repository-size-threshold - The size in bytes that a repository can reach before it begins to delete the oldest retained messages. These messages could have been persisted to disk and may be available for recovery. If the disk repository memory cache size falls below the threshold, it stops deleting old messages.

  • repository-size-limit - The maximum size in bytes for the disk repository's memory cache. Once this limit is reached, the repository stops accepting new messages. The age and size thresholds should be set at levels that guarantee the size limit is never met. You should consider how fast the source sends messages, the size of the messages and the reliability of the receivers. For example, more reliable receivers mean less recovery instances, which could mean a younger age threshold. Do not specify a limit that would allow more than 2,147,483,647 messages to be stored.

  • repository-disk-file-size-limit - The maximum disk space (in bytes) for the disk repository. Once this limit is reached, the repository overwrites old messages with new messages. Overwriting old messages is not necessarily a negative situation provided you disk file size is adequate. However, if messages needed for recovery are not in either the memory cache or the disk file, you may need to increase the disk file size to ensure that overwritten messages are no longer needed for receiver recovery. Do not specify a limit that would allow more than 2,147,483,647 messages to be stored.


Persistent Store Fault Tolerance  <-

Sources and receivers register with a Store instance and use individual repositories within the Store. Sources can use redundant repositories configured in multiple Stores Instances in Quorum/Consensus (QC) arrangement for fault tolerance. Be aware that the arrangement of Store instances into Quorum/Consensus groups is a function of the source. I.e. the individual Stores of a QC group are not aware of each other and do not coordinate their activities.

Informatica strongly recommends that the Store instances of a QC group run on separate physical hosts.


Identifying Persistent Stores  <-

A persistent source must be configured to identify one or more Stores to provide persistence services. The source configuration can identify Store instances with one of:

In either case, the store should be told which interface to bind to, which defines its IP address. This is done with the UMP Element "<store>" in the Store's configuration file. There is a shortcut available to simplify the Store configuration file; see Using a CIDR Range of IP Addresses.

Store Address: Identify Store with IP:Port

Using IP:port is feasible in deployments where there is no DRO; i.e. all components are in a single Topic Resolution Domain (TRD). Deployments that include DRO and have multiple TRDs require that the domain ID be added to the address: domainID:IP:port.

Configure Store instance for a single IP:port.

  1. Identify the Store with only the IP:port, specified with the UMP Element "<store>" in the Store's configuration file. For example:

    <store name="newyork-1" port="14567" interface="10.29.3.16">

  2. Configure the source with the IP:port using the LBM configuration option ume_store (source) so sources can find and register with the Store instance.
    source ume_store 10.29.3.16:14567
    

Named Stores: Identify Store with Context Name

A Store is configured with a context name. Sources are then configured to specify the Stores by their names, instead of their IP:Port.

  1. Give the Store's context a name using the context-name option in the Store configuration file. For example:

    <store name="newyork-1" port="14567" interface="10.29.3.0/24">
    <ume-attributes>
    <option type="store" name="context-name" value="NEWYORK-1"/>
    </ume-attributes>

  2. Configure the source with the name of the Store's context using the LBM configuration option ume_store_name (source) so sources can find and register with the Store.

    source ume_store_name NEWYORK-1
    

    Note that you did not have to determine the full IP address of the store's host.

Store context names can be used with or without DROs. UM automatically resolves and maintains a mapping between a Store's context name and its domain ID, IP address and port, as follows:

  • Store advertises its context name at startup and in response to queries from sources.
  • If a Store receives a context name advertisement that matches its own context name, that Store issues a warning in the Store's log. This represents an invalid configuration and can produce unpredictable results. Always ensure that Store context names are unique within a UM deployment.
  • Sources using Store context names issue an information message to the application every time a resolved context name changes its DomainID:IPaddress:port.

Shortcut: Using a CIDR Range of IP Addresses

Configure a Store with a CIDR range of IP addresses (see Specifying Interfaces). This allows multiple Store daemon instances which only differ by their IP address to be configured the same. At initialization time, each Store daemon instance will determine its IP address using the CIDR specification. However, be aware that sources will need to use the full IP address.

  1. Identify the Store with a range of IP addresses specified in the Store configuration file. For example:

    <store name="newyork-1" port="14567" interface="10.29.3.0/24">`

    When the Store Process initializes, UM will choose a network interface within that IP address range (10.29.3.0 - 10.29.3.255).

  2. Configure the source with the IP:port using the LBM configuration option ume_store (source) so sources can find and register with the Store. You must specify the full IP address, not the CIDR range.

    source ume_store 10.29.3.16:14567
    

  3. Alternatively, you can use the Named Stores feature so that the source doesn't need to specify the full IP.