Guide for Persistence
Store Thread Affinity

You can significantly improve Store performance by "pinning" its threads to CPU cores. Normally, the operating system migrates a process's threads among CPU cores, depending on what else is running on the host. This migration can degrade the process's performance in a number of ways, mostly related to memory access (cache locality, NUMA zones). Setting the CPU affinity for the performance-sensitive threads avoids this degradation.

For high-throughput applications, constrain the operating system to run the Store's threads on specific CPU cores. All of a Store's threads should run on cores in the same physical CPU chip.
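One way to check the "same physical CPU chip" requirement on a Linux host is sketched below. It assumes the kernel exposes each core's package (socket) id under /sys/devices/system/cpu, which is typical on Linux but not guaranteed everywhere. The helper names package_of and same_package are hypothetical and are not part of the Store software.

```python
from pathlib import Path

def package_of(cpu):
    """Return the physical package (socket) id that Linux reports for a CPU."""
    # Assumption: Linux sysfs topology files are present on this host.
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/physical_package_id")
    return int(path.read_text())

def same_package(cpus, lookup=package_of):
    """Return True if every CPU number in cpus maps to one physical package."""
    return len({lookup(cpu) for cpu in cpus}) <= 1
```

The lookup parameter lets you substitute a known layout when sysfs is not available, for example a host where odd-numbered and even-numbered CPUs sit on different chips: same_package([3, 5, 7], lookup=lambda cpu: cpu % 2).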

For maximum benefit, you should "isolate" the cores running the message reception threads. This prevents the operating system from scheduling other processes/threads on those cores.

Setting Affinity

When you start the Store Process, you can optionally use the "-a" option to set CPU affinity for its various threads. See Umestored Man Page.

Note that for the Windows Service, you don't supply the option when the service is run. Instead, you save the thread affinity in the Windows registry for subsequent use by the Store Windows Service. See Umestoreds Man Page and Configure the Windows Service.

The "-a" option takes a comma-separated list of CPU (core) numbers. For example, "-a 1,3,1,..." refers to CPU 1, CPU 3, CPU 1 again, etc.

The numbers in the sequence are assigned to threads as follows:

The first number is the "process" CPU number, which is used for all miscellaneous threads that aren't otherwise assigned.

The next 4 numbers are assigned to a Store's operational threads in the following sequence:

  1. Message reception thread.
  2. Proxy source thread.
  3. Receiver recovery thread.
  4. Auxiliary thread.

If the Store Process has multiple Stores configured, supply an additional group of 4 numbers for each additional Store.
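The positional rules above can be sketched as a small Python helper that labels each number in an "-a" list. This is an illustration of the documented ordering only, not code from the Store. The names THREAD_ROLES and describe_affinity are hypothetical, and the sketch assumes the list always contains complete groups of 4 (the actual program may handle shorter lists differently).

```python
# Per-Store thread order, as listed in this section.
THREAD_ROLES = ("message reception", "proxy source",
                "receiver recovery", "auxiliary")

def describe_affinity(spec):
    """Map an "-a" value such as "3,5,3,7,3" to (owner, role, cpu) tuples."""
    cpus = [int(field) for field in spec.split(",")]
    # Assumption of this sketch: 1 process CPU plus whole groups of 4.
    if (len(cpus) - 1) % len(THREAD_ROLES) != 0:
        raise ValueError("expected 1 process CPU plus groups of 4 per Store")
    assignments = [("process", "miscellaneous", cpus[0])]
    for i, cpu in enumerate(cpus[1:]):
        store_number = i // len(THREAD_ROLES) + 1
        role = THREAD_ROLES[i % len(THREAD_ROLES)]
        assignments.append((f"Store {store_number}", role, cpu))
    return assignments
```

For instance, describe_affinity("1,3,1,3,1") labels CPU 1 as the process core, CPU 3 as the first Store's message reception and receiver recovery core, and CPU 1 again for that Store's proxy source and auxiliary threads.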

Of these threads, the most critical is the message reception thread. For best performance, each Store's message reception thread should be given exclusive access to its own CPU core.

The receiver recovery thread is also important, since it can affect the speed at which receivers can recover missed messages. However, since CPU cores are scarce resources on hosts, it may not be practical to give each receiver recovery thread its own core.

The proxy source and auxiliary threads are not critical to general Store throughput, and are therefore generally assigned to the "process" core as miscellaneous.

Affinity Example

For example, suppose you have a Store Process configured for two Stores. Further, let's say that on your host, even-numbered CPUs belong to one physical CPU chip, and odd-numbered CPUs belong to a different physical CPU chip. The following would optimize both message reception and message recovery, at the expense of consuming 5 cores:

umestored -a 3,5,3,7,3,9,3,11,3 ...

This assigns:

  • the process's miscellaneous threads to CPU 3,
  • the first Store's message reception thread to CPU 5,
  • the first Store's proxy source thread to CPU 3,
  • the first Store's receiver recovery thread to CPU 7,
  • the first Store's auxiliary thread to CPU 3,
  • the second Store's message reception thread to CPU 9,
  • the second Store's proxy source thread to CPU 3,
  • the second Store's receiver recovery thread to CPU 11,
  • the second Store's auxiliary thread to CPU 3.

If assigning this many cores to the Store Process is not practical, the following conserves cores at the expense of potentially degrading message recovery speed:

umestored -a 3,5,3,3,3,7,3,3,3 ...

This assigns a CPU core to each of the two message reception threads (5 and 7), and groups all other threads onto the miscellaneous CPU core (3).
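Both command lines follow the same pattern, so the "-a" value can be generated rather than typed by hand. The following Python sketch is one way to do that. The function name build_affinity_arg and its parameters are hypothetical, and it assumes, as in the examples above, that the proxy source and auxiliary threads always share the process core.

```python
def build_affinity_arg(process_cpu, stores):
    """Build the comma-separated value for the "-a" option.

    stores holds one (reception_cpu, recovery_cpu) pair per Store. Pass None
    as recovery_cpu to leave that Store's recovery thread on the process core.
    """
    fields = [process_cpu]
    for reception_cpu, recovery_cpu in stores:
        if recovery_cpu is None:
            recovery_cpu = process_cpu
        # Order per Store: reception, proxy source, recovery, auxiliary.
        fields += [reception_cpu, process_cpu, recovery_cpu, process_cpu]
    return ",".join(str(cpu) for cpu in fields)

if __name__ == "__main__":
    print(build_affinity_arg(3, [(5, 7), (9, 11)]))       # 3,5,3,7,3,9,3,11,3
    print(build_affinity_arg(3, [(5, None), (7, None)]))  # 3,5,3,3,3,7,3,3,3
```

The first call reproduces the five-core layout and the second reproduces the three-core layout shown above.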