Operations Guide
In a multicast environment, only the applications and monitoring tools need to be started. If using Persistence, the Store daemon (umestored) also needs to be started. Likewise, use of the DRO requires starting the DRO daemon (tnwgd).
In a unicast-only environment, one or more resolver daemons (lbmrd) are typically required. It is recommended that you start the lbmrd before starting the applications.
Informatica recommends that you shut down applications using UM sources and receivers cleanly, even though UM can cope with the ungraceful shutdown and restart of applications and UM daemons.
A failed assertion could lead to immediate application shutdown. If opting to restart a UM client or lbmrd, no other components need be restarted. Failed assertions should be logged with Informatica support.
Topic Resolution (TR) is the process by which subscribers discover a topic's publisher's transport session information. The subscriber uses this information to join the source's transport session, which then delivers published messages to the subscriber. With Multicast UDP TR, topic resolution does not need to be started or shut down. However, TR does require network resources and, if not configured properly, can lead to temporary or permanent deafness in receivers, as well as other problems. See Topic Resolution Overview for more detailed information.
Applications cannot deliver messages until topic resolution completes. UM monitoring statistics are active before all topics resolve. In a large topic space (e.g. 10,000 topics) topic resolution messages may be 'staggered' or rate controlled, taking potentially several seconds to complete.
For example, 10,000 topics at the default value of 1,000 for resolver_initial_advertisements_per_second (context) will take 10 seconds to send out an advertisement for every topic. If all receiving applications have been started first, fully resolving all topics may not take much more than 10 seconds. The rate of topic resolution can also be controlled with the resolver_initial_advertisement_bps (context) configuration option. Topic advertisements contain the topic string and approximately 110 bytes overhead. Topic queries from receivers contain no overhead, only the topic string.
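As a rough planning aid, the arithmetic above can be sketched as follows. This is only an estimate under stated assumptions: the function names are illustrative and not part of UM, the bandwidth limit is supplied by the caller rather than taken from a UM default, and treating the slower of the two limits as the governing one is an assumption, not documented UM behavior.

```python
# Rough estimate of the time to send one initial advertisement per topic.
# Names here are illustrative; they are not part of the UM API.

ADVERTISEMENT_OVERHEAD_BYTES = 110  # approximate per-advertisement overhead


def seconds_by_rate(num_topics, ads_per_second=1000):
    """Time needed when limited by resolver_initial_advertisements_per_second."""
    return num_topics / ads_per_second


def seconds_by_bandwidth(topic_names, bits_per_second):
    """Time needed when limited by resolver_initial_advertisement_bps."""
    total_bytes = sum(
        len(name.encode("utf-8")) + ADVERTISEMENT_OVERHEAD_BYTES
        for name in topic_names
    )
    return total_bytes * 8 / bits_per_second


def estimate_seconds(topic_names, ads_per_second, bits_per_second):
    """Assumption: whichever limit is slower governs the total time."""
    return max(
        seconds_by_rate(len(topic_names), ads_per_second),
        seconds_by_bandwidth(topic_names, bits_per_second),
    )
```

For the 10,000-topic example, `seconds_by_rate(10_000)` evaluates to 10.0 seconds.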
Your UM development or administration team should anticipate the time and bandwidth required to resolve all topics when all applications initially start. This team should also establish any restarting restrictions.
Operations staff typically don't have any operational tasks related to topic resolution aside from monitoring hosts' CPU and bandwidth usage.
Your UM development team should provide you with the application names, resident machines and startup parameters, along with a sequence of application/daemon startups and shutdowns.
The following lists typical application startup errors.
Cannot bind port - lbm_context_create: could not find open TCP server port in range.
Too many applications may be running using the UM context's configured port range on this machine. This possibility should be escalated to your UM development team.
Application is possibly already running. It is possible to start more than one instance of the same UM application.
This message appears for multi-homed machines. UM is not explicitly configured to use a single interface. This may not cause an issue but requires configuration review by your UM development team.
A UM application shutdown may not be obvious immediately, especially if you are monitoring scores of applications. The following lists events that may indicate an application has shut down.
If not using multicast topic resolution, one or more instances of lbmrd must be started prior to starting applications. Unicast resolver daemons require an XML configuration file, and multiple resolver daemons can be specified by your UM development team for resiliency. For more information on Unicast Topic Resolution, see Unicast UDP TR.
Execute the following command on the appropriate machine to start a unicast topic resolver (lbmrd) from command line:
For more information on the lbmrd command-line, see Lbmrd Man Page.
To stop an lbmrd that is running as a Windows service, use the Windows service control panel to stop it. Otherwise, kill the PID. If an lbmrd terminates, you need to restart it.
Observe the lbmrd logfile for errors and warnings.
To make the lbmrd a Windows Service, see UM Daemons as Windows Services.
If running multiple lbmrds and an lbmrd in the list becomes inactive, the following message appears in the clients' log files:
If all unicast resolver daemons become inactive, the following message appears in the clients' log files:
After all topics are resolved, daemons do not strictly need to be running unless you restart applications. Resolver daemons do not cache or persist state and do not require other shutdown maintenance.
Stores can operate in disk-backed or memory-only mode, as specified in the Store's XML configuration file. Disk-backed Stores are subject to the limitations of the disk hardware. For high performance applications, Informatica recommends solid-state disks local to the physical host running the Store. Those solid-state disks should be optimized for writing and should be dedicated to UM Store usage only (don't share them with other databases or disk-intensive programs).
There are a few scenarios where you need to start a Store:
For details on starting a Store, see Starting a Store.
Similarly, there are a few scenarios where you need to shut down a Store:
For details on shutting down a Store, see Shutting Down a Store.
When you shut down a Store with the intention of restarting it, or if a Store fails, you should wait to restart for more than the time specified by the ume_store_activity_timeout (source) (defaults to 10 seconds). I.e. you should wait for the publishers to time out the old Store instance before you bring up the new instance.
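A restart script could enforce this wait along the following lines. This is a minimal sketch: the function names and the one-second safety margin are assumptions for illustration, and the timeout is passed in seconds to match the text above, so convert it from whatever units your configuration uses.

```python
import time


def seconds_until_safe_restart(stopped_at, now, activity_timeout_s=10.0, margin_s=1.0):
    """Return how long to keep waiting before restarting the Store.

    stopped_at and now are timestamps in seconds (for example time.monotonic()).
    margin_s is an assumed safety margin so the wait exceeds the timeout.
    """
    earliest_restart = stopped_at + activity_timeout_s + margin_s
    return max(0.0, earliest_restart - now)


def wait_before_restart(stopped_at, activity_timeout_s=10.0, margin_s=1.0):
    """Sleep until the publishers should have timed out the old Store instance."""
    remaining = seconds_until_safe_restart(
        stopped_at, time.monotonic(), activity_timeout_s, margin_s
    )
    time.sleep(remaining)
```

Record `time.monotonic()` when the Store stops, then call `wait_before_restart` with that value before launching the new instance.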
Sometimes when you start a Store, you want to eliminate past state. That is, you don't want any subscribers to recover older messages. For example, you might be bringing a whole system up from scratch, or, as many users do, performing a clean start after every maintenance window.
In this case, you must cleanly restart all publishers, subscribers, and Stores. It is recommended to have every component in the down state simultaneously before restarting any of them. Otherwise you risk having a newly restarted component learning old state from a not-yet-restarted component.
Part of cleanly restarting all Stores is deleting their state and cache files. If you fail to delete the state and cache files from one or more Stores, subscribers might start up and recover (replay) old messages that they had already processed. Or subscribers can be temporarily deaf to some (potentially large) set of newly published messages. Please ensure proper deletion of state and cache files prior to restarting Stores when you intend to eliminate past state.
Sometimes when you start a Store, you want to retain past state. That is, you want to restart Stores in such a way that applications can pick up where they left off, and recover lost messages if necessary. For example, when restarting a Store after a problem.
You should restart the Store with the state and cache files intact. When the Store restarts, it will read and restore the previous state. It will then attempt to recover as many missed messages from the sources as possible. Then the Store resumes normal operation.
Note that if the cache files are very large, it might take a significant amount of time for the Store to initialize. If this time is objectionable, you may choose to limit the initial restore; see Limit Initial Restore with Restore-Last.
There are times when it is not possible to retain the past state. For example, the disk holding the state and cache files might fail, or the Store's host might fail and the Store be restarted on a new host. In that case, care must be taken to ensure that the applications will properly restore as much of their previous state as possible.
For example, let's say that two Stores out of a three-Store QC group fail and lose their state and cache files. You have no choice but to restart those Stores cleanly, without state and cache. Note that in this example, since only one Store is running (the "stateful" Store), the applications lose quorum and pause their execution until quorum is restored.
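The quorum arithmetic in this example can be illustrated with a short check. This sketch assumes a simple-majority rule, which matches the three-Store example but may not match a group configured differently; the function name is illustrative only.

```python
def has_quorum(active_stores, group_size):
    """True when a strict majority of the group's Stores are active (assumed rule)."""
    return active_stores > group_size // 2
```

With one Store of three running, `has_quorum(1, 3)` is `False`; once a second Store is back, `has_quorum(2, 3)` is `True`.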
In this dual-failure case, it is important *not* to restart both failed Stores at the same time. To do so could result in applications registering with the two cleaned Stores first, and resetting their own internal states. This could lead to subscriber deafness, or replay of previously processed messages.
Instead, you must restart one of the cleaned Stores and allow the applications to register with the clean Store and the stateful Store. Allow some time for the cleaned Store to collect some state from the applications (typically requires several seconds). Then the other cleaned Store can be restarted.
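The ordering described above could be scripted roughly as follows. This is a sketch only: `start_store` is a hypothetical callable that you supply to launch a single Store, and `settle_seconds` is a value your team must choose, since the text says only that several seconds are typically required.

```python
import time


def staggered_restart(cleaned_stores, start_store, settle_seconds, sleep=time.sleep):
    """Start cleaned Stores one at a time, pausing between starts.

    start_store is a caller-supplied callable (hypothetical) that launches one
    Store; settle_seconds is how long to let that Store collect state from the
    applications before the next one is started.
    """
    for index, name in enumerate(cleaned_stores):
        if index > 0:
            sleep(settle_seconds)
        start_store(name)
```

Passing `sleep` as a parameter lets the sequence be rehearsed without real delays.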
Unix users can start a Store with the following command from a shell prompt or script:
See Umestored Man Page for details on the umestored command-line.
Windows users typically install the Store as a Windows service (see UM Daemons as Windows Services). Windows has a control panel for starting services.
Memory-only Stores do not save state in disk files; they always start up without past state.
Disk-based Stores create two types of state files:
A Store signals it has completed initialization by logging a message of the form:
Store-5688-5546: Store "StoreName" ready to accept registrations
If a Store process is configured to have more than one Store instance, this message will be logged for each configured Store instance (see Store Processes and Instances).
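A startup script or monitoring tool could watch for these messages as sketched below. The pattern is derived from the message form shown above; the function names and the idea of passing in the expected instance names are assumptions for illustration.

```python
import re

READY_PATTERN = re.compile(r'Store "([^"]+)" ready to accept registrations')


def ready_stores(log_lines):
    """Return the set of Store names that have logged the ready message."""
    names = set()
    for line in log_lines:
        match = READY_PATTERN.search(line)
        if match:
            names.add(match.group(1))
    return names


def all_stores_ready(log_lines, expected_names):
    """True once every expected Store instance has reported ready."""
    return set(expected_names) <= ready_stores(log_lines)
```

Feed it the lines of the Store's log file and the instance names from the Store's configuration.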
The time required for a Store to complete initialization depends on the size of the cache file. For very large cache files, the initialization time can become objectionably long. See Limit Initial Restore with Restore-Last for a solution.
If the Store is running as a Windows service, use the Service Manager to stop the Windows service.
Otherwise kill the PID. You can find the PID in the configured PID file; see UMP Element "<pidfile>".
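Stopping the daemon from a script might look like the sketch below. It assumes the PID file contains a single integer and that SIGTERM is the signal your site uses; neither is confirmed by the text above, so check both with your UM development team.

```python
import os
import signal


def read_pid(pidfile_path):
    """Read the PID from the file, assuming it holds a single integer."""
    with open(pidfile_path, "r", encoding="ascii") as pidfile:
        return int(pidfile.read().strip())


def stop_daemon(pidfile_path, sig=signal.SIGTERM):
    """Send a signal to the process named in the PID file and return its PID."""
    pid = read_pid(pidfile_path)
    os.kill(pid, sig)
    return pid
```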
When a DRO starts, it discovers all sources and receivers in the topic resolution domains to which it connects. This results in a measurable increase in the overall volume of topic resolution traffic and can take some time to complete, depending upon the number of sources, receivers, and topics. The rate limits set on topic resolution also affect the time to resolve all topics.
See also Topic Resolution.
Execute the following command on the appropriate machine to start a DRO (tnwgd) from command line:
Informatica recommends:
For more information on the tnwgd command-line, see Tnwgd Man Page. To make the DRO a Windows Service, see UM Daemons as Windows Services.
Perform the following procedure to restart a DRO.
On the Microsoft Windows platform, the UM daemons can be used either from the command line or as Windows Services.
The UM daemons available as Windows Services are:
Executable File | Description | Service Display Name | Man Page |
---|---|---|---|
lbmrds.exe | UDP-based Unicast Topic Resolver | "LBMR Store Daemon" | man page |
srsds.exe | TCP-based Topic Resolver | "UM Stateful Topic Resolution Service" | man page |
mcsds.exe | Monitoring Collector Service | "UM Monitoring Collector Service" | man page |
storeds.exe | Persistent Store | "UME Store Daemon" | man page |
tnwgds.exe | Dynamic Router (DRO) | "Ultra Messaging Gateway" | man page |
Note that the Ultra Messaging Manager daemon ("ummd") is not offered as a Windows Service at this time.
Also note that the UM daemons were not designed to run multiple instances of the service on the same host. See known limitation 9337.
As of UM version 6.12, the above UM daemons work similarly with respect to running as a Windows Service. See the individual man pages for differences.
For each service, the executable file (e.g. "storeds.exe") is used for two purposes:
That second purpose, configuring the UM Windows Service, consists of running the executable with one or more command-line options to store desired operational parameters into the Windows registry. This makes those parameters available to the service when Windows starts the service.
First, make sure that your UM license key is provided in a way that the service can access it. In particular, if you are using an environment variable to set the license key, it must be a system environment variable, not a user environment variable.
Once your license key is ready, there are 4 overall steps to running a UM daemon as a Windows Service:
All 4 steps must be completed before the Service can be used.
There are two ways to install a UM daemon as a Windows Service:
Product package installer
When installing the product using the package installer, the dialog box titled "Choose Components" provides one or more check boxes for UM daemons to be installed as services. You may check any number of the boxes and proceed with the installation.
Note that for any box not checked, the software for that daemon is still copied onto the machine. This allows for installation as a Windows Service at a later time using the Command Line method.
Command line
If a daemon was not installed as a Windows Service from the product's package installer (possibly because the package installer was not used), daemons can be installed at a later time from the command line.
Open a Windows Command Prompt, enabling Administrator access. (One way to do this is to right-click on the Command Prompt icon and select "More > Run as administrator".)
UM daemons are configured via XML configuration files. These files must be created and managed by the user. Each individual daemon needs its own separate XML configuration file.
Informatica recommends developing and testing the daemon configuration files interactively, using the command-line interface of each daemon. Do not run the daemon as a Windows Service until the daemon configuration has been validated and tested. This provides the fastest test cycle while the configuration is being developed and finalized.
The configuration files should be located on the hosts that are intended to run the daemons in files/folders of the user's choosing.
For more information on configuring and running the daemons interactively, see:
Executable File | Description | Configuration Details | Man Page |
---|---|---|---|
lbmrds.exe | UDP-based Unicast Topic Resolver | lbmrd Configuration File | man page |
srsds.exe | TCP-based Topic Resolver | SRS Configuration File | man page |
mcsds.exe | Monitoring Collector Service | MCS Configuration File | man page |
storeds.exe | Persistent Store | Configuration Reference for Umestored | man page |
tnwgds.exe | Dynamic Router (DRO) | DRO Configuration Reference | man page |
"Configure the Windows Service" is different from Configure the Daemon. Configuring a UM Service provides Windows-specific operational parameters to the UM Daemon, which are not configurable via the Daemon Configuration. For example, you need to tell the Service where to find the Daemon Configuration file.
At this point, you should have the Daemon XML configuration file(s) prepared and available on the host which is to run the desired daemon(s) (see Configure the Daemon). You should also have tested the configuration using the daemon interactively to verify its correct operation.
Configuring the Service consists of running the Service executable from a Windows Command Prompt with one or more command-line options to store desired operational parameters into the Windows registry. This makes those parameters available to the service when Windows starts the service.
There are several operational parameters that are common across all of the UM Windows Services:
Each Service has additional parameters that are specific to that Service; see each Service's man page:
Executable File | Description | Configuration Details | Man Page |
---|---|---|---|
lbmrds.exe | UDP-based Unicast Topic Resolver | lbmrd Configuration File | man page |
srsds.exe | TCP-based Topic Resolver | SRS Configuration File | man page |
mcsds.exe | Monitoring Collector Service | MCS Configuration File | man page |
storeds.exe | Persistent Store | Configuration Reference for Umestored | man page |
tnwgds.exe | Dynamic Router (DRO) | DRO Configuration Reference | man page |
Daemon Configuration File
The UM daemons require a configuration file. You must configure the Windows Service with the path to the daemon's configuration file. This is done with the "-s config" command-line option. For example:
This saves the file path into the Windows registry. Subsequently, each time the UME Store Windows Service starts, it will read that file to configure the Store daemon. (Note: lbmrds uses upper-case "-S".)
Windows Event Logger
The UM daemons write their log messages to their log files. The UM Windows Services have the option of also writing the log messages to the Windows Event Logger.
UM Log messages are categorized into different severity levels: "info", "notice", "warning", "err", "alert", "emerg". By default, the UM Windows Services will write log messages of category "warning" and above to the Windows Event Log.
If desired, the category can be configured with the "-e" command-line option. For example:
This saves the "notice" category into the Windows registry. Subsequently, each time the UME Store Windows Service starts, it will log messages of "notice" and above to the Windows Event Log.
Be aware that setting the severity level below "warning" can result in very many messages being written to the Windows Event Log. Also be aware that messages of all severity levels are written to the daemon's log file, independent of the "-e" setting.
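The threshold behavior can be illustrated with a small sketch. It assumes the severity levels are listed above in ascending order; the names below are illustrative and not part of UM.

```python
SEVERITY_ORDER = ["info", "notice", "warning", "err", "alert", "emerg"]


def written_to_event_log(severity, threshold="warning"):
    """True when a message of this severity meets the configured threshold."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)
```

With the default threshold, `written_to_event_log("notice")` is `False`; with the threshold set to "notice" it becomes `True`.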
Environment Variables File
The UM daemons occasionally can have useful features enabled through the use of environment variables. Most of the UM Windows Services allow the use of an optional disk file containing environment variable assignments. Each time the Service starts, that file is read and the environment variables are set for that daemon process. The SRS Windows service does not currently support this.
If desired, the Environment Variable file path can be configured with the "-E" command-line option. For example:
This saves the file path into the Windows registry. Subsequently, each time the UME Store Windows Service starts, it will read that file and set its environment variables.
The format of that file is shown by this example:
    # Environment Variable File for UME Store
    LBM_DEBUG_MASK="0xC384"
    LBM_DEBUG_FILENAME="c:\temp\store_debug.txt"
The quote marks are required.
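Before handing an environment variable file to a Service, you could sanity-check it with a small validator like the one below. This is a sketch of the format as shown in the example, not the Service's actual parser: skipping blank lines, treating backslashes literally, and the rule for variable names are all assumptions.

```python
import re

ASSIGNMENT = re.compile(r'^([A-Za-z_][A-Za-z0-9_]*)="(.*)"$')


def parse_env_file(text):
    """Parse NAME="value" lines; blank lines and # comments are skipped."""
    variables = {}
    for number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match is None:
            raise ValueError(f'line {number}: expected NAME="value", got {line!r}')
        variables[match.group(1)] = match.group(2)
    return variables
```

A line with a missing quote mark raises `ValueError` with the offending line number, which catches the most likely editing mistake.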
If you want to stop using an environment variable file, you must remove its entry from the Windows registry with the "-U" command-line option. For example:
This removes the environment variable file path from the Windows registry. The next time the UME Store Windows Service starts, it will not set its environment. Note that this operation does not remove the environment file.
Windows Services are controlled by the "Services" control panel. See your Windows documentation for information on controlling Windows Services.
There are two ways to remove the UM daemons as Windows Services:
Windows Uninstaller
If the UM package was installed by the Package Installer, using the normal Windows "Add or Remove Programs" control panel removes the Windows Service as well as the installed files.
Manual Service Removal
You can remove a UM Windows Service manually using the "-s remove" command-line option. For example:
This removes the UM Store as a Windows Service. (Note: lbmrds uses upper-case "-S".)
Note that this does not uninstall any of the UM software. It only removes the daemon as a Windows Service.
Tools available to analyze UM activity and performance.
For more information about Wireshark please visit https://www.wireshark.org/. (The UM plugins are part of the current release.)
pstack dumps a stack trace for a process (pid). If the process named is part of a thread group, then pstack traces all the threads in the group.
The use of UM debug flags requires the assistance of UM Support. Also refer to the following Knowledge Base articles for more information about using debug flags.