Configuring High Availability for DataServer Deliverables, Version 5.2

High Availability (HA) mitigates single points of failure within the Monitor by letting you define redundant system components, together with failover capability, for users of those components.

When using HA, components are designated PRIMARY and BACKUP. If the PRIMARY component fails, failover occurs to the BACKUP component. If the PRIMARY component is subsequently restarted, the BACKUP component yields the primary role to the newly restarted component and returns to its backup role.

Overview of High Availability Architecture

Data Server High Availability

The primary and backup data servers connect to each other via socket. If the primary data server stops, the backup server takes over. When the primary comes back online, it takes over again and the backup returns to standby mode. Data client connections move between the two servers accordingly.

NOTE: Be aware that data clients can connect to the standby server using a non-fault tolerant URL and still get data because of a proxy feature where the standby server forwards data requests to the primary server. This can be confusing when you use the HTML Cache Viewer (http://localhost:3270/common) on the standby server to view cache contents because it looks like the standby server caches are updating, but you are really viewing the data in the primary server and not in the standby server.

Historian High Availability

The primary and backup historian connect to each other via socket. If the primary historian stops, then the backup takes over. If the primary historian comes back online, then the primary takes over again and the backup returns to standby mode. Only the active historian writes to the database. The historian is a data client of the data server and connects to it via a fault tolerant URL (socket only), which means that the data servers and historians can fail over separately or together.

Requirements for Configuring High Availability

The following are minimum requirements for High Availability:

  • Two host machines, one for the primary host and one for the backup host.
  • Both hosts must be configured such that the RTView processes on each host can connect to each other via socket.
  • Both hosts must be able to access:
    • the same data connections
    • the same historian database
    • the alert threshold database
  • The RTView processes on both hosts must be able to run against identical properties files. If drivers or other third-party jars are located in different directories on the two hosts, create a directory in the same location on each host, copy the jar files into it, and reference that directory in your properties.
  • Tomcat or other Application Server
    • The HTML UI and rtv servlets must be deployed on an application server other than the internal Jetty server. Note that this requires extra configuration of the servlet .war files in the application server. The application server must be able to access both the Primary Host and the Backup Host. Refer to your application server documentation if you need high availability access to your application server.
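
The socket connectivity requirement can be checked before going further. The following is a minimal sketch, not part of the product: the function names can_connect and check_ha_ports are hypothetical, and ports 3278 (data server) and 3222 (historian) are taken from the sample log output later in this document, so substitute your own ports if they differ. It assumes bash and the coreutils timeout command are installed.

```shell
# can_connect HOST PORT
# Succeeds if a TCP connection to HOST:PORT opens within 3 seconds.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
can_connect() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_ha_ports HOST...
# Probes each host on the ports seen in the sample logs (assumed values:
# 3278 for the data server, 3222 for the historian).
check_ha_ports() {
  for host in "$@"; do
    for port in 3278 3222; do
      if can_connect "$host" "$port"; then
        echo "$host:$port reachable"
      else
        echo "$host:$port NOT reachable"
      fi
    done
  done
}
```

For example, run check_ha_ports "$BACKUPHOST" on the primary host and check_ha_ports "$PRIMARYHOST" on the backup host. A port is reported as reachable only while the corresponding server is running and listening, so a failure before the servers are started does not by itself indicate a network problem.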

Steps for Configuring High Availability

To Configure High Availability:

  • On both the primary and backup hosts, define the following environment variables:

PRIMARYHOST - the IP Address or hostname of the host running the primary servers (for example, set PRIMARYHOST=MyHost).
BACKUPHOST - the IP Address or hostname of the host running the backup servers (for example, set BACKUPHOST=OtherHost).
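
The set examples above appear to use Windows command prompt syntax. On Unix, a rough equivalent for sh or bash might look like the following sketch. MyHost and OtherHost are the same placeholder names used above, and the two guard lines are an optional addition that stops the shell with a message if either variable is empty.

```shell
# Unix (sh/bash): define the same variables on BOTH hosts
# before starting the servers. Replace the placeholder host names.
export PRIMARYHOST=MyHost
export BACKUPHOST=OtherHost

# Optional guard: abort with a message if either variable is empty or unset.
: "${PRIMARYHOST:?PRIMARYHOST is not set}"
: "${BACKUPHOST:?BACKUPHOST is not set}"

echo "HA pair: primary=$PRIMARYHOST backup=$BACKUPHOST"
```

Variables exported this way apply only to the current shell session and the processes started from it, so either set them in the same session that runs the start scripts or add them to the login profile of the account that runs the servers.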

  • Install the Monitor on both the primary host and the backup host.
  • Configure your servlets to be HA and deploy them to your application server:

cd RTViewDataServer<SP>\projects\rtview-server

  • In a text editor, open update_wars(.bat or .sh) and fill in the values for HOST and HA_HOST as described in the script.
  • Run the update_wars(.sh or .bat) script.
  • Copy the generated war files to the webapps directory of your application server.
  • To run High Availability, you must run the following from the command line:

Windows

From the command line on the primary host, cd to RTViewDataServer<SP> and type start_server -haprimary.

From the command line on the backup host, cd to RTViewDataServer<SP> and type start_server -habackup.

Unix

From the command line on the primary host, cd to RTViewDataServer<SP> and type start_server.sh -haprimary.

From the command line on the backup host, cd to RTViewDataServer<SP> and type start_server.sh -habackup.

  • Configure the Monitor on the primary host using the RTView Configuration Application. Make sure to configure data collection, configure server options and databases, and enable alert persistence.

Note that the RTView Configuration Application must be able to connect to both the primary and backup servers in order to enable editing. The same properties are saved to both servers. The RESTART SERVERS button (in the RTView Configuration Application) restarts both the primary and backup servers at the same time. If you want to stagger the restarts, use the scripts under RTViewDataServer<SP> to stop and then start your servers after making changes in the RTView Configuration Application.

Note: Jetty does not have to be disabled, but data clients will not be able to make high availability connections to the data server using the Jetty URL. However, the Jetty URL can still be used to configure the application.

Verifying the High Availability Configuration

Verify failover and failback configurations by looking for the following in the log files.

Note: If the PRIMARYHOST or BACKUPHOST environment variable is not set, HA is disabled and the following error appears in the log files:

ERROR: Disabling HA because the PRIMARYHOST and/or BACKUPHOST environment variable is not set.

Primary Data Server Log File

startup
[rtview] Starting as primary HA data server accessible via //primaryhostname:3278,//backuphostname:3278
[rtview] DataServerHA: connected to backuphostname:3278
[rtview] DataServerHA: run as primary server, backuphostname:3278 has lower priority than this server
[rtview] leaving standby mode

Backup Data Server Log File

startup
[rtview] Starting as backup HA data server accessible via //primaryhostname:3278,//backuphostname:3278
[rtview] entering standby mode
after failover (primary data server exits)
[rtview] DataServerHA: error receiving message: java.net.SocketException: Connection reset (primaryhostname:3278)
[rtview] DataServerHA: becoming primary server, lost connection to primary server primaryhostname:3278
[rtview] leaving standby mode
after failback (primary data server comes back up)
[rtview] DataServerHA: resigning as primary server, got standby directive from other server primaryhostname:3278
[rtview] connected to primaryhostname:3278
[rtview] entering standby mode
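
Rather than reading a data server log by eye, the messages listed above can be filtered with grep. This is a minimal sketch: ha_events and ha_state are hypothetical helper names, the patterns are copied from the sample messages in this section, and the log file location is not given here, so pass the path used by your installation.

```shell
# ha_events LOGFILE
# Prints the HA-related lines from a data server log: the "Disabling HA"
# error plus the standby/primary transition messages shown above.
ha_events() {
  grep -E 'Disabling HA|(entering|leaving) standby mode|becoming primary server|resigning as primary server' "$1"
}

# ha_state LOGFILE
# Reports the most recent standby transition found in the log:
#   standby  last transition was "entering standby mode"
#   active   last transition was "leaving standby mode"
#   unknown  no transition message found
ha_state() {
  last=$(grep -E '(entering|leaving) standby mode' "$1" | tail -n 1)
  case "$last" in
    *"entering standby mode"*) echo standby ;;
    *"leaving standby mode"*)  echo active ;;
    *)                         echo unknown ;;
  esac
}
```

After a failover test, ha_state run against the backup data server log would be expected to report active, and after failback it would be expected to report standby again, assuming the log contains the messages in the form shown above.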

Primary Historian Log File

[rtview] Starting as primary HA historian paired with backup historian at <backuphostname>:3222
[rtview] ServerGroup: status of member <backuphostname>:3222: primary, priority= 1, started=Wed Nov 14 12:56:01 PST 2018
[rtview] ServerGroup: primary server = local
[rtview] ServerGroup: becoming primary server

Backup Historian Log File

[rtview] Starting as backup HA historian paired with primary historian at <primaryhostname>:3222
[rtview] ServerGroup: status of member <primaryhostname>:3222: primary, priority= , started=Wed Nov 14 12:56:01 PST 2018
[rtview] ServerGroup: primary server = <primaryhostname>:3222
after failover (primary historian exits):
[rtview] error receiving message: java.io.EOFException (primaryhostname:3222)
[rtview] ServerGroup: disconnected from primaryhostname:3222
[rtview] ServerGroup: primary server = local
after failback (primary historian starts back up):
[rtview] ServerGroup: status of member primaryhostname:3222: primary, priority= 2, started= Tue Nov 20 09:12:43 PST 2018
[rtview] ServerGroup: connected to primaryhostname:3222
[rtview] ServerGroup: primary server = primaryhostname:3222
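
The historian logs report the current primary in the "ServerGroup: primary server = ..." lines, so the most recent of those lines shows which historian is active. A minimal sketch follows, assuming the message format shown above. The helper name historian_primary is hypothetical and the log path is whatever your installation uses.

```shell
# historian_primary LOGFILE
# Prints the value from the last "ServerGroup: primary server = ..." line.
# "local" means the historian that wrote this log is currently the primary.
# Prints nothing if the log contains no such line.
historian_primary() {
  sed -n 's/.*ServerGroup: primary server = //p' "$1" | tail -n 1
}
```

Run against the backup historian log, this would be expected to print local after a failover and the primary host and port after failback.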