Guide for Persistence
Enabling Persistence

The following table lists all source files used in this section. The files can be found in the /doc/example directory. You can also access these files via the Sample Source Code tab in the left panel, under C Example Source Code.

Filename                  Content
ume-example-src.c         Source Application
ume-example-rcv.c         Receiver Application
ume-example-src-2.c       Source Application 2
ume-example-rcv-2.c       Receiver Application 2
ume-example-src-3.c       Source Application 3
ume-example-rcv-3.c       Receiver Application 3
ume-example-config.xml    Persistent Store Configuration File


Starting Configuration

We begin with the minimal source and receiver used by the QuickStart Guide. To more easily demonstrate the persistence features we are interested in, we have modified the QuickStart source and receiver in the following ways.

  • Modified the source to send 20 messages with a one second pause between each message.
  • Modified the receiver to anticipate 20 messages instead of just one.
  • Assigned the topic, "UME Example", to both the source and receiver.
  • Modified the receiver to not exit on unexpected receiver events.

The last change allows us to better demonstrate basic operation and evolve our receiver slowly without having to anticipate all the options that UM provides up front.

Example files for our exercise are:

Filename                  Content
ume-example-src.c         Source Application
ume-example-rcv.c         Receiver Application


Adding the Store to a Source

The fundamental component of a persistence solution is the Persistent Store. A source uses a Store only when it is configured to do so, by setting the ume_store (source) option. We can do that with the following piece of code.

err = lbm_src_topic_attr_str_setopt(&attr, "ume_store", "127.0.0.1:14567");

This sets the Persistent Store for the source to the Store running at 127.0.0.1 on port 14567.

Example files for our exercise are:

Filename                  Content
ume-example-src-2.c       Source Application 2
ume-example-rcv-2.c       Receiver Application 2
ume-example-config.xml    Persistent Store Configuration File

After adding the Store specification to the source, perform the following steps (assumes a Unix command prompt):

  1. Create the cache and state directories.
    $ mkdir umestored-cache ; mkdir umestored-state
    
  2. Start up the Store.
    $ umestored ume-example-config.xml
    
  3. Start the Receiver.
    $ ume-example-rcv
    
  4. Start the Source.
    $ ume-example-src
    

You should see a message on the source that says:

INFO: Source "UME Example" Late Join not set, but UME store specified. Setting Late Join.

This is an informational message from UM and merely means Late Join was not set and that UM is going to set it.

Notice that the receiver was not configured with any Store information. That is because setting it on the source is all that is needed. The receiver learns Store settings from the source through the normal UM topic resolution process. Receivers need no special configuration to benefit from a source's use of a Store.


Adding Fault Recovery with Registration IDs

If the source or receiver crashes, how do the source and receiver tell the Store that they have restarted and wish to resume where they left off? We need to add identifiers to the source and receiver so that the Store knows which source or receiver it is dealing with.

In persistence, these identifiers are called Registration IDs or RegIDs. UM allows the application to control the use of RegIDs as it wishes. This allows applications to migrate sources and receivers not just between systems, but also between locations. However, UM requires an application to be careful about how it uses RegIDs. Specifically, an application must not use the same RegID for multiple sources and/or receivers at the same time.

Now let's look at how we can use RegIDs to provide complete fault recovery of sources and receivers. We'll first handle RegIDs in the simplest manner by using static IDs for our source and receiver. For the source, the RegID of 1000 can be added to the existing Store specification by changing the string to 127.0.0.1:14567:1000

This yields the source code in ume-example-src-2.c
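As a minimal sketch of how the Store specification string could be assembled, the helper below formats "ip:port" and appends ":regid" when a RegID is supplied. The name build_store_spec() and the convention that 0 means "no RegID requested" are assumptions of this sketch and are not part of the UM API or the example files.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format "ip:port" or, when regid is nonzero,
 * "ip:port:regid" into buf. Treating 0 as "no RegID requested" is an
 * assumption of this sketch.
 * Returns 0 on success, -1 if the result does not fit in buf. */
int build_store_spec(char *buf, size_t len, const char *ip,
                     unsigned port, unsigned long regid)
{
    int n;

    assert(buf != NULL && ip != NULL);
    if (regid == 0)
        n = snprintf(buf, len, "%s:%u", ip, port);
    else
        n = snprintf(buf, len, "%s:%u:%lu", ip, port, regid);

    /* snprintf reports the length it wanted; anything >= len was truncated */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would then be passed as the value of the ume_store option, in place of the string literal in the earlier lbm_src_topic_attr_str_setopt() call.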

For the receiver, we accomplish this in two steps.

  1. Declare a callback function that returns the RegID value 1100 to UM whenever UM asks the receiver for its RegID. The example names the callback app_rcv_regid_callback().

  2. Configure the receiver to use this callback function by setting the ume_registration_extended_function (receiver) option, similar to the example code below.
    lbm_ume_rcv_regid_ex_func_t id; /* structure to hold registration function information */
    id.func = app_rcv_regid_callback; /* the callback function to call */
    id.clientd = NULL; /* the value to pass in the clientd to the function */
    err = lbm_rcv_topic_attr_setopt(&attr, "ume_registration_extended_function", &id, sizeof(id));

Once this is done, the receiver has the ability to control what RegID it will use. This yields the source code in ume-example-rcv-2.c.
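The callback for this static case can be very small. The sketch below shows only the shape of the logic: regid_info_t is a hypothetical stand-in for the information structure that UM passes to the callback, and the real callback signature and field names should be taken from lbm.h and ume-example-rcv-2.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the info structure UM passes to the callback;
 * the real type and its fields are declared in lbm.h. */
typedef struct {
    unsigned long src_registration_id; /* RegID of the source being registered */
    const char *store;                 /* Store address as "ip:port" */
} regid_info_t;

#define APP_RCV_REGID 1100UL /* static receiver RegID used in this section */

/* Return the RegID the receiver wants to use. In the static case the
 * answer does not depend on the source or the Store. */
unsigned long app_rcv_regid_callback(const regid_info_t *info, void *clientd)
{
    (void)clientd; /* unused: the example passes NULL as clientd */
    assert(info != NULL);
    return APP_RCV_REGID;
}
```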

With these in place, you can experiment with killing the receiver and bringing it back (as long as you bring it back before the source is finished), as well as killing the source and bringing it back.

The restriction of this initial approach to RegIDs is that the RegIDs 1000 and 1100 may not be used by any other objects at the same time. If you run additional sources or receivers, they must be assigned new RegIDs, not 1000 or 1100. Let's now take a more sophisticated approach to RegIDs that will allow much more flexibility.


Enabling Persistence Between the Source and Store

Let's refine our source to include some desired behavior following a crash. Upon restart, we want our source to resume with the first unsent message. For example, if the source sent 10 messages and crashed, we want our source to resume with the 11th message and continue until it has sent the 20th message.

Accomplishing this graceful resumption requires us to ensure that our source is the only source that uses the RegID assigned to it. The same RegID should be used as long as the source has not sent the 20th message regardless of any crashes that may occur. The sources and receivers are primarily responsible for managing the RegIDs.

The following two sections explain the changes needed for the source and receiver, which become fairly easy due to the events that UM delivers to the application during persistence operation.


Enabling Persistence in the Source

With the above mentioned behaviors in mind, let's turn to looking at how they may be implemented with persistence, starting with the source. We can summarize the changes we need by the following list.

  1. At source startup, if a saved RegID file exists, include the saved RegID in the ume_store (source) configuration option.
  2. After the Store registration is successful, if a new RegID was assigned to the source, save the RegID to the file.
  3. Set the message number to begin sending. Refer to the explanation below.
  4. Send until message number 20 has been sent.
  5. After message 20 has been sent, delete the saved RegID file.

For Step 3, a newly initialized source starts with message number 1. A source that has been restarted after a crash looks to UM to establish the starting message number, because UM resumes with the next sequence number. For this simple example, we assume that each message consumes one UM sequence number and that UM starts with sequence number 0. The application can therefore resume sending at message number (UM sequence number + 1). These changes yield the source code in ume-example-src-3.c.
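The file handling and the Step 3 arithmetic can be sketched without any UM calls. The function names, the one-number file format, and the use of 0 to mean "no RegID saved" are assumptions made for illustration; ume-example-src-3.c may organize this differently.

```c
#include <assert.h>
#include <stdio.h>

/* Step 1: read a saved RegID from path. Returns 0 when no usable file
 * exists, which this sketch treats as "no RegID saved yet". */
unsigned long load_regid(const char *path)
{
    unsigned long regid = 0;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    if (fscanf(fp, "%lu", &regid) != 1)
        regid = 0;
    fclose(fp);
    return regid;
}

/* Step 2: save the RegID assigned by the Store.
 * Returns 0 on success, -1 on error. */
int save_regid(const char *path, unsigned long regid)
{
    FILE *fp;
    int ok;

    assert(path != NULL);
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    ok = fprintf(fp, "%lu\n", regid) > 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Step 3: first message number to send, given the sequence number UM
 * will use next (0 for a source that has never sent). */
unsigned long first_message_number(unsigned long next_seq)
{
    return next_seq + 1;
}
```

With this layout, Step 5 reduces to calling remove() on the same path once message 20 has been sent. A source that crashed after sending 10 messages would see a next sequence number of 10 and resume with message 11.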


Smart Sources and Persistence

When using the Smart Sources feature to send persistent messages, there are a few restrictions:


Enabling Persistence in the Receiver

Let's also refine the receiver to resume where it left off after a crash. Just as with the source, the receiver can have the Store assign it a RegID if the receiver is just beginning. Once the receiver receives the 20th message from the source, it can get rid of the RegID and exit. Because the receiver can receive some messages, crash, and come back, we should only need to look at a message and check if it is the 20th message based on the message contents or sequence number. UM provides all the events to the application that we need to create these behaviors in the receiver.

The receiver changes are summarized below:

  1. At receiver startup, use any saved RegID information found in the file for callback information when needed.

  2. When the RegID callback is called: Check to see if the source RegID matches the saved source RegID. If it does, return the saved receiver RegID.

  3. After Store registration is successful: If not using a previously saved RegID, then save the RegID assigned by the Store to the receiver to a file, as well as the Store information and the source RegID.

  4. After the last message is received (message number 20 or UM sequence number 19), end the application and delete the saved RegID file.

RegIDs in UM can be considered to be per source and per topic. Thus the receiver does not want to use the wrong RegID for a different source on the same topic. To avoid this, we save the source RegID and even Store information so that the app_rcv_regid_callback() can make sure to use the correct RegID for the given source RegID. These changes yield the source code in ume-example-rcv-3.c
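The matching rule described above can be sketched as follows. Both structures are hypothetical: saved_regid_t models what the receiver wrote to its file, and regid_info_t stands in for the information UM hands to the callback (the real type is declared in lbm.h). Returning 0 to mean "no saved RegID applies, let the Store assign one" is an assumption of this sketch that should be checked against the UM API documentation.

```c
#include <assert.h>
#include <string.h>

/* What the receiver saved after a successful registration (hypothetical layout). */
typedef struct {
    char store[64];          /* Store address, for example "127.0.0.1:14567" */
    unsigned long src_regid; /* RegID of the source */
    unsigned long rcv_regid; /* RegID the Store assigned to this receiver */
} saved_regid_t;

/* Hypothetical stand-in for the info UM passes to the RegID callback. */
typedef struct {
    const char *store;
    unsigned long src_registration_id;
} regid_info_t;

/* Return the saved receiver RegID only when both the source RegID and the
 * Store match the saved record; otherwise return 0 (assumed here to mean
 * "no saved RegID for this source"). clientd carries the saved record. */
unsigned long app_rcv_regid_callback(const regid_info_t *info, void *clientd)
{
    const saved_regid_t *saved = (const saved_regid_t *)clientd;

    assert(info != NULL);
    if (saved == NULL || saved->rcv_regid == 0)
        return 0;
    if (info->src_registration_id != saved->src_regid)
        return 0;
    if (info->store == NULL || strcmp(info->store, saved->store) != 0)
        return 0;
    return saved->rcv_regid;
}

/* Step 4: message number 20 is carried by UM sequence number 19. */
int is_last_message(unsigned long seq)
{
    return seq == 19;
}
```

Passing the saved record through clientd keeps the callback free of global state; the example files may instead use a global variable.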

The above sources and receivers are simplified for illustration purposes and do have some limitations. The receiver keeps the saved information for only one source at a time. This is fine for illustration purposes, but would be incomplete for production applications unless it is assured that only a single source is in use for any topic. Extending the receiver to handle several sources is a matter of saving each one to the file, reading them all in at startup, and searching for the correct one each time the callback is invoked.
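One possible way to extend the saved file to several sources is sketched below, with one record per line. The file format, the structure layout, and the function names are all assumptions made for illustration and do not appear in the example files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One saved registration (hypothetical layout). */
typedef struct {
    char store[64];          /* Store address with no spaces, e.g. "127.0.0.1:14567" */
    unsigned long src_regid; /* RegID of the source */
    unsigned long rcv_regid; /* RegID assigned to this receiver for that source */
} saved_regid_t;

/* Append one record as a line "<store> <src_regid> <rcv_regid>".
 * Returns 0 on success, -1 on error. */
int append_saved(const char *path, const saved_regid_t *rec)
{
    FILE *fp;
    int ok;

    assert(path != NULL && rec != NULL);
    fp = fopen(path, "a");
    if (fp == NULL)
        return -1;
    ok = fprintf(fp, "%s %lu %lu\n", rec->store,
                 rec->src_regid, rec->rcv_regid) > 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read up to max records at startup. Returns the number of records read
 * (0 if the file does not exist). */
size_t load_saved(const char *path, saved_regid_t *tab, size_t max)
{
    size_t n = 0;
    FILE *fp;

    assert(path != NULL && tab != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    while (n < max &&
           fscanf(fp, "%63s %lu %lu", tab[n].store,
                  &tab[n].src_regid, &tab[n].rcv_regid) == 3)
        n++;
    fclose(fp);
    return n;
}

/* Search for the receiver RegID saved for this source and Store.
 * Returns 0 when no record matches. */
unsigned long find_rcv_regid(const saved_regid_t *tab, size_t n,
                             const char *store, unsigned long src_regid)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tab[i].src_regid == src_regid && strcmp(tab[i].store, store) == 0)
            return tab[i].rcv_regid;
    return 0;
}
```

The RegID callback would call find_rcv_regid() with the source RegID and Store it was given. A linear search is adequate for a handful of sources; a production application with many sources might prefer a keyed lookup.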