High Availability (HA) mitigates single points of failure within the Monitor by letting you define redundant system components, together with failover capability, for users of those components.

When using HA, components are designated PRIMARY and BACKUP. If the PRIMARY component fails, failover occurs to the BACKUP component. If the PRIMARY component is subsequently restarted, the BACKUP component allows the newly restarted component to take the primary role and then returns to its own backup role.
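
The failover and failback rule described above can be summarized in a few lines. The following is a minimal sketch of the rule only, not the Monitor's implementation; the function name active_component and its boolean parameters are hypothetical.

```python
def active_component(primary_up, backup_up):
    """Return which component serves users under the failover/failback rule."""
    if primary_up:
        # The PRIMARY takes the primary role whenever it is running,
        # including after a restart (failback).
        return "PRIMARY"
    if backup_up:
        # The BACKUP takes over only while the PRIMARY is down (failover).
        return "BACKUP"
    return None  # Neither component is running.


# Normal operation, then failover, then failback.
states = [(True, True), (False, True), (True, True)]
print([active_component(p, b) for p, b in states])
```

The sequence printed is PRIMARY, BACKUP, PRIMARY, which matches the description: the BACKUP is active only while the PRIMARY is down.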

Overview of High Availability Architecture

Data Server High Availability

The primary and backup data servers connect to each other via a socket. If the primary data server stops, the backup server takes over. If the primary comes back online, it takes over again and the backup returns to standby mode. The data client connections move between the two servers accordingly.

NOTE: Be aware that data clients can connect to the standby server using a non-fault tolerant URL and still get data because of a proxy feature where the standby server forwards data requests to the primary server. This can be confusing when you use the HTML Cache Viewer (http://localhost:3270/common) on the standby server to view cache contents because it looks like the standby server caches are updating, but you are really viewing the data in the primary server and not in the standby server.

Historian High Availability

The primary and backup historians connect to each other via a socket. If the primary historian stops, the backup takes over. If the primary historian comes back online, it takes over again and the backup returns to standby mode. Only the active historian writes to the database. The historian is a data client of the data server and connects to it via a fault tolerant URL (socket only), which means that the data servers and historians can fail over separately or together.
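
To illustrate what a fault tolerant URL implies for a client, the following sketch parses a comma-separated list of the form that appears in the data server log files (//primaryhostname:3278,//backuphostname:3278) and connects to the first endpoint that accepts a TCP connection. It is an illustration of the ordered-fallback idea only, not the connection logic used by the Monitor; the function names and the timeout value are hypothetical.

```python
import socket


def parse_fault_tolerant_url(url):
    """Split '//hostA:3278,//hostB:3278' into [('hostA', 3278), ('hostB', 3278)]."""
    endpoints = []
    for part in url.split(","):
        part = part.strip()
        if part.startswith("//"):
            part = part[2:]
        # Split on the last colon so the port is always the final field.
        host, _, port = part.rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


def connect_first_available(url, timeout=2.0):
    """Return (endpoint, socket) for the first endpoint that accepts a connection."""
    last_error = None
    for host, port in parse_fault_tolerant_url(url):
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
            return (host, port), sock
        except OSError as exc:
            # This endpoint is unreachable; fall through to the next one.
            last_error = exc
    raise ConnectionError("no endpoint reachable in " + repr(url)) from last_error
```

Because the endpoints are tried in order, a client built this way reaches the backup only when the primary does not accept the connection.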

Requirements for Configuring High Availability

The following are minimum requirements for High Availability:

Steps for Configuring High Availability

To Configure High Availability:

PRIMARYHOST - the IP address or hostname of the host running the primary servers (for example, set PRIMARYHOST=MyHost).
BACKUPHOST - the IP address or hostname of the host running the backup servers (for example, set BACKUPHOST=OtherHost).

cd RTViewDataServer<SP>\projects\rtview-server

Windows

From the command line on the primary host, cd to RTViewDataServer<SP> and type start_server -haprimary.

From the command line on the backup host, cd to RTViewDataServer<SP> and type start_server -habackup.

Unix

From the command line on the primary host, cd to RTViewDataServer<SP> and type start_server.sh -haprimary.

From the command line on the backup host, cd to RTViewDataServer<SP> and type start_server.sh -habackup.

Note that the RTView Configuration Application must be able to connect to both the primary and backup servers in order to enable editing. The same properties are saved to both servers. The RESTART SERVERS button (in the RTView Configuration Application) restarts both the primary and backup servers at the same time. If you want to stagger the restarts, use the scripts under RTViewDataServer<SP> to stop and then start your servers after making changes in the RTView Configuration Application.

Note: Jetty does not have to be disabled, but data clients will not be able to make high availability connections to the data server using the Jetty URL. However, the Jetty URL can still be used to configure the application.

Verifying the High Availability Configuration

Verify failover and failback configurations by looking for the following in the log files.

Note: If the PRIMARYHOST or BACKUPHOST environment variable is not set, the following error appears in the log files and HA is disabled:

ERROR: Disabling HA because the PRIMARYHOST and/or BACKUPHOST environment variable is not set.
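
Since HA is disabled when either variable is missing, it can help to check the environment before starting the servers. The following sketch relies only on the two variable names given above; the function name missing_ha_vars is hypothetical, and treating a blank value as unset is an assumption.

```python
import os

REQUIRED_VARS = ("PRIMARYHOST", "BACKUPHOST")


def missing_ha_vars(env=None):
    """Return the names of required HA variables that are unset or blank."""
    env = os.environ if env is None else env
    # Assumption: a value containing only whitespace counts as not set.
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_ha_vars()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("PRIMARYHOST and BACKUPHOST are set.")
```

Run the check in the same shell session that will start the servers, so that it sees the same environment.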

Primary Data Server Log File

startup
[rtview] Starting as primary HA data server accessible via //primaryhostname:3278,//backuphostname:3278
[rtview] DataServerHA: connected to backuphostname:3278
[rtview] DataServerHA: run as primary server, backuphostname:3278 has lower priority than this server
[rtview] leaving standby mode

Backup Data Server Log File

startup
[rtview] Starting as backup HA data server accessible via //primaryhostname:3278,//backuphostname:3278
[rtview] entering standby mode
after failover (primary data server exits)
[rtview] DataServerHA: error receiving message: java.net.SocketException: Connection reset (primaryhostname:3278)
[rtview] DataServerHA: becoming primary server, lost connection to primary server primaryhostname:3278
[rtview] leaving standby mode
after failback (primary data server comes back up)
[rtview] DataServerHA: resigning as primary server, got standby directive from other server primaryhostname:3278
[rtview] connected to primaryhostname:3278
[rtview] entering standby mode
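
When checking a long data server log, the standby transitions shown above are the quickest indicator of the server's current role. The following sketch scans log lines for the two phrases "entering standby mode" and "leaving standby mode", both taken from the excerpts above. The function name data_server_role is hypothetical, and the exact log wording may differ between versions.

```python
def data_server_role(log_lines):
    """Return 'active', 'standby', or None based on the last transition seen."""
    role = None
    for line in log_lines:
        if "leaving standby mode" in line:
            role = "active"
        elif "entering standby mode" in line:
            role = "standby"
    return role


# Backup data server log up to the point of failover.
backup_log = [
    "[rtview] entering standby mode",
    "[rtview] DataServerHA: becoming primary server, lost connection to primary server primaryhostname:3278",
    "[rtview] leaving standby mode",
]
print(data_server_role(backup_log))
```

For the sample lines, the last transition is "leaving standby mode", so the backup is reported as active.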

Primary Historian Log File

[rtview] Starting as primary HA historian paired with backup historian at <backuphostname>:3222
[rtview] ServerGroup: status of member <backuphostname>:3222: primary, priority= 1, started=Wed Nov 14 12:56:01 PST 2018
[rtview] ServerGroup: primary server = local
[rtview] ServerGroup: becoming primary server

Backup Historian Log File

[rtview] Starting as backup HA historian paired with primary historian at <primaryhostname>:3222
[rtview] ServerGroup: status of member <primaryhostname>:3222: primary, priority= , started=Wed Nov 14 12:56:01 PST 2018
[rtview] ServerGroup: primary server = <primaryhostname>:3222
after failover (primary historian exits):
[rtview] error receiving message: java.io.EOFException (primaryhostname:3222)
[rtview] ServerGroup: disconnected from primaryhostname:3222
[rtview] ServerGroup: primary server = local
after failback (primary historian starts back up):
[rtview] ServerGroup: status of member primaryhostname:3222: primary, priority= 2, started= Tue Nov 20 09:12:43 PST 2018
[rtview] ServerGroup: connected to primaryhostname:3222
[rtview] ServerGroup: primary server = primaryhostname:3222
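
The historian reports its view of the current primary in "ServerGroup: primary server = ..." lines, so failover and failback can be traced by collecting those lines in order. The following sketch does this with a regular expression based on the excerpts above. The function name primary_history is hypothetical, and the log format may differ between versions.

```python
import re

PRIMARY_RE = re.compile(r"ServerGroup: primary server = (\S+)")


def primary_history(log_lines):
    """Return each reported primary, in the order it appears in the log."""
    history = []
    for line in log_lines:
        match = PRIMARY_RE.search(line)
        if match:
            history.append(match.group(1))
    return history


# Lines from the backup historian log: startup, failover, failback.
backup_historian_log = [
    "[rtview] ServerGroup: primary server = <primaryhostname>:3222",
    "[rtview] ServerGroup: disconnected from primaryhostname:3222",
    "[rtview] ServerGroup: primary server = local",
    "[rtview] ServerGroup: primary server = primaryhostname:3222",
]
print(primary_history(backup_historian_log))
```

In the backup historian log, a value of "local" marks the period in which the backup was acting as primary.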