8.9. Web server cluster configuration

Clustering is a common technique for improving the scalability and availability of a system. Clustering refers to setting up multiple web servers, such as Tomcat instances, and having them serve a single application. Clustering allows for scaling out an application, in the sense that new servers can be added to improve performance. It also allows for high availability, as the system can tolerate instances going down without becoming inaccessible to users.

When setting up multiple Tomcat instances, the instances must be made aware of each other. Each DHIS 2 instance keeps a local data cache. When an update is made on one instance, the other instances must be notified so that they can invalidate their caches and avoid serving stale data.

8.9.1. Cluster configuration

A DHIS 2 cluster setup is based on manual configuration of each instance. For each DHIS 2 instance, one must specify the public hostname of that instance as well as the hostnames of the other instances participating in the cluster. You can optionally specify the port numbers on which each instance should listen for cache updates.

The hostname of the server is specified using the cluster.instance0.hostname configuration property. Additional servers participating in the cluster are specified using properties in the format cluster.instanceN.hostname, where N is the cluster instance number, a value between 1 and 4. You can thus specify up to four additional instances in a configuration file, giving a maximum cluster size of five instances.
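
For example, on a server in a hypothetical three-node cluster, the two other members would each be listed with their own instance number (the hostnames are illustrative):

# Hostnames for the two other servers in a three-node cluster
cluster.instance1.hostname = 193.157.199.132
cluster.instance2.hostname = 193.157.199.133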

The hostnames must be visible to the participating servers on the network for clustering to work. You might have to allow incoming and outgoing connections on the configured port numbers in the firewall.
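
For example, on a host where ufw manages the firewall (an assumption; adapt the commands to your firewall of choice), the ports used in the example below could be opened as follows:

# Allow the cache listener and remote object ports (assuming ufw)
sudo ufw allow 4001/tcp
sudo ufw allow 5001/tcp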

The port number of the server is specified using the cluster.instance0.cache.port configuration property. The remote object port used for registry receive calls is specified using cluster.instance0.cache.remote.object.port. Specifying the port numbers is typically useful when you run multiple cluster instances on the same server / virtual machine, or when you need to specify the ports explicitly so that they can be opened in the firewall. When running cluster instances on separate servers / virtual machines, it is often appropriate to omit the port configuration properties and rely on the defaults. If omitted, 4001 will be assigned as the listener port and a random free port will be assigned as the remote object port.

An example setup for a cluster of two web servers is described below. For server A, available at hostname 193.157.199.131, the following can be specified in dhis.conf:

# Cluster configuration for server A

# Hostname for this web server
cluster.instance0.hostname = 193.157.199.131

# Ports for cache listener, can be omitted
cluster.instance0.cache.port = 4001
cluster.instance0.cache.remote.object.port = 5001

# Hostname for web server B participating in cluster
cluster.instance1.hostname = 193.157.199.132

# Port for cache listener on web server B, can be omitted
cluster.instance1.cache.port = 4001

For server B, available at hostname 193.157.199.132, the following can be specified in dhis.conf (notice that the port configuration is omitted):

# Cluster configuration for server B

# Hostname for this web server
cluster.instance0.hostname = 193.157.199.132

# Hostname for web server A participating in cluster
cluster.instance1.hostname = 193.157.199.131

You must restart each Tomcat instance for the changes to take effect. The two instances are now aware of each other, and DHIS 2 will ensure that their caches are kept in sync.
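
For example, if Tomcat runs as a systemd service named tomcat9 (an assumption; the service name depends on your installation), each instance can be restarted with:

# Restart the Tomcat service (the service name is an assumption)
sudo systemctl restart tomcat9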

8.9.2. Load balancing

With a cluster of Tomcat instances set up, a common approach for routing incoming web requests to the backend instances participating in the cluster is to use a load balancer. A load balancer will make sure that load is distributed evenly across the cluster instances. It will also detect when an instance becomes unavailable and, if so, stop routing requests to that instance and instead use the other available instances.

Load balancing can be achieved in multiple ways. A simple approach is to use nginx, in which case you define an upstream element which enumerates the locations of the backend instances, and later reference that element from the proxy_pass directive inside the location block.

http {

  # Upstream element with sticky sessions

  upstream dhis_cluster {
    ip_hash;
    server 193.157.199.131:8080;
    server 193.157.199.132:8080;
  }

  # Proxy pass to backend servers in cluster

  server {
    listen 80;

    location / {
      proxy_pass   http://dhis_cluster/;
    }
  }
}

DHIS 2 keeps server-side state for user sessions to a limited degree. Using "sticky sessions" is a simple approach to avoid replicating this state across instances: requests from the same client are always routed to the same server. The ip_hash directive in the upstream element ensures this.
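
If many clients share a public IP address, for example behind a NAT gateway, ip_hash will route all of them to the same backend instance. A sketch of one alternative, assuming nginx 1.7.2 or later, is to hash on the Tomcat session cookie instead:

upstream dhis_cluster {
  # Route based on the JSESSIONID session cookie rather than the client IP.
  # Note: requests without the cookie, such as first visits, share one backend.
  hash $cookie_JSESSIONID consistent;
  server 193.157.199.131:8080;
  server 193.157.199.132:8080;
}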

Note that several instructions have been omitted for brevity in the above example. Consult the reverse proxy section for a detailed configuration guide.
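
As a quick sanity check once the load balancer is running, you can request the front page from the load balancer host and verify that one of the backend instances responds (running the check from localhost is an assumption):

# Expect an HTTP response served via the cluster
curl -I http://localhost/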