Parallel processes to handle simultaneous requests

Cluster

Preliminaries

This note describes the server's ability to handle concurrent requests through a single address and port.

Configuration

The RWSERVE software is able to handle multiple simultaneous requests using its built-in cluster capability. The number of concurrent processes to load into memory is declared in the cluster-size entry. The maximum number of concurrent processes is 64.

Round-robin scheduling

When a browser issues an HTTP/2 request to any of the hostnames declared in the configuration file, the incoming connection is assigned to one of the cluster processes, and that process handles the entire request/response cycle. When the same browser makes subsequent requests within a short period of time, the connection remains open and the same process handles them.

When another browser issues a request, its incoming connection is assigned to a different cluster process, using a simple round-robin approach. For example, with a cluster size of four, a series of incoming requests coming from different browsers would be assigned to cluster process 0, 1, 2, 3, 0, 1, 2, 3, ...
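
The assignment order described above can be sketched in a few lines of Python. This is only an illustration of round-robin selection, not RWSERVE's actual implementation; the function name assign_round_robin is hypothetical.

```python
from itertools import cycle


def assign_round_robin(num_connections: int, cluster_size: int) -> list[int]:
    """Return the process index chosen for each new connection, in arrival order."""
    # The 1..64 range mirrors the documented limits of the cluster-size entry.
    if not 1 <= cluster_size <= 64:
        raise ValueError("cluster size must be between 1 and 64")
    process_ids = cycle(range(cluster_size))
    return [next(process_ids) for _ in range(num_connections)]


# Eight connections from different browsers, cluster size of four.
print(assign_round_robin(8, 4))  # [0, 1, 2, 3, 0, 1, 2, 3]
```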

Be aware that this simple approach to scheduling is indifferent to the types of requests being made, their potential processing needs, and their possible payload sizes. As a result, some processes may deliver good throughput while others deliver poor throughput, simply because a slow request happened to land ahead of other requests in the same process queue.

For example, suppose a cluster of four processes receives a series of eight consecutive requests arriving 50ms apart, with this processing profile: [250ms, 250ms, 2000ms, 250ms, 250ms, 250ms, 250ms, 250ms]. If the process queues are initially all empty, the apparent response time as experienced by each browser would be [250ms, 250ms, 2000ms, 250ms, 300ms, 300ms, 2050ms, 300ms]. The first four requests are served without delay, but the next four are each subjected to process queueing that increases their overall response time. The seventh request has the worst response time, because it waits behind the 2000ms request on process #2, even though its own processing effort is the same as that of its neighbors.

process   start offset   queue time   end offset   response time
#0              0 ms         0 ms       250 ms          250 ms
#1             50 ms         0 ms       300 ms          250 ms
#2            100 ms         0 ms      2100 ms         2000 ms
#3            150 ms         0 ms       400 ms          250 ms
#0            200 ms        50 ms       500 ms          300 ms
#1            250 ms        50 ms       550 ms          300 ms
#2            300 ms      1800 ms      2350 ms         2050 ms
#3            350 ms        50 ms       650 ms          300 ms
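
The figures in the table can be reproduced with a short simulation. The sketch below assumes one first-in, first-out queue per process and strict round-robin assignment by arrival order; the function name simulate and its parameters are illustrative, not part of RWSERVE.

```python
def simulate(processing_ms, cluster_size, arrival_gap_ms=50):
    """Simulate round-robin assignment with one FIFO queue per process.

    Returns one (process, start, queue_time, end, response_time) tuple
    per request, with all times in milliseconds.
    """
    free_at = [0] * cluster_size  # time at which each process becomes idle
    rows = []
    for i, work in enumerate(processing_ms):
        process = i % cluster_size          # round-robin choice
        arrival = i * arrival_gap_ms        # start offset of this request
        queue_time = max(0, free_at[process] - arrival)
        end = arrival + queue_time + work
        free_at[process] = end
        rows.append((process, arrival, queue_time, end, end - arrival))
    return rows


profile = [250, 250, 2000, 250, 250, 250, 250, 250]
for row in simulate(profile, cluster_size=4):
    print("#{} start={} queue={} end={} response={}".format(*row))
```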

The solution to this problem is to increase the number of processes in the cluster in order to reduce, or entirely remove, queueing times.
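
As a rough illustration of this effect, the following sketch totals the queueing time for the eight-request profile used earlier at several cluster sizes. It relies on the same simplifying assumptions (one FIFO queue per process, strict round-robin, 50ms between arrivals), and the function name total_queue_ms is hypothetical.

```python
def total_queue_ms(processing_ms, cluster_size, arrival_gap_ms=50):
    """Sum the time every request spends waiting in its process queue."""
    free_at = [0] * cluster_size  # time at which each process becomes idle
    total = 0
    for i, work in enumerate(processing_ms):
        process = i % cluster_size
        arrival = i * arrival_gap_ms
        start = max(arrival, free_at[process])  # wait if the process is busy
        total += start - arrival
        free_at[process] = start + work
    return total


profile = [250, 250, 2000, 250, 250, 250, 250, 250]
for size in (4, 5, 6, 8):
    print(size, total_queue_ms(profile, size))
```

For this particular profile, five processes still leave the eighth request queued behind the slow one, while six or more processes remove queueing entirely. Real traffic will behave differently, so results like these are only a starting point.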

Cluster size considerations

Determining the optimal size for the server's cluster should be done through empirical testing against actual loads.

All else being equal, if you have a choice between provisioning a server with more CPU cores or more memory, choose the server with more CPU cores.

EBNF

SP ::= U+20
CR ::= U+0D
SOLIDUS ::= U+2F
ASTERISK ::= U+2A
FULL-STOP ::= U+2E
GRAVE-ACCENT ::= U+60
LEFT-CURLY-BRACKET ::= U+7B
RIGHT-CURLY-BRACKET ::= U+7D
number-of-processes ::= [1..64]
cluster-size-entry ::= 'cluster-size' SP number-of-processes CR
server-section ::= 'server' SP LEFT-CURLY-BRACKET CR
                   cluster-size-entry
                   RIGHT-CURLY-BRACKET CR
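
To make the grammar concrete, here is a minimal Python sketch that checks a single cluster-size line against the 1 to 64 range. It is not RWSERVE's parser: the helper name parse_cluster_size is hypothetical, and the tolerance for leading and trailing whitespace is an assumption made for convenience.

```python
import re

# Assumption: optional surrounding whitespace is tolerated; the real parser may be stricter.
CLUSTER_SIZE_ENTRY = re.compile(r"^\s*cluster-size\s+(\d+)\s*$")


def parse_cluster_size(line: str) -> int:
    """Return the number of processes declared on a cluster-size line."""
    match = CLUSTER_SIZE_ENTRY.match(line)
    if match is None:
        raise ValueError(f"not a cluster-size entry: {line!r}")
    value = int(match.group(1))
    if not 1 <= value <= 64:
        raise ValueError(f"cluster-size must be between 1 and 64, got {value}")
    return value


print(parse_cluster_size("cluster-size 3"))
```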

Cookbook

Example 1: three CPUs, 1 GB RAM

server {
    ip-address 10.20.30.40
    port 443
    cluster-size 3
}

Example 2: two CPUs, 2 GB RAM

server {
    ip-address 10.20.30.40
    port 443
    cluster-size 2
}

Example 3: one CPU, 3 GB RAM

server {
    ip-address 10.20.30.40
    port 443
    cluster-size 1
}
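
Each example above sets cluster-size equal to the number of CPUs. As a hedged starting point only, the following Python sketch derives a value from the CPU count reported by the operating system, caps it at the documented maximum of 64, and prints a server section in the same shape as the examples. The helper names are hypothetical, and matching cluster size to CPU count is an assumption drawn from these examples; the final value should still be tuned through empirical testing.

```python
import os


def suggested_cluster_size(cpu_count=None):
    """Suggest a starting cluster size: one process per CPU, within 1..64."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(cpu_count, 64))


def render_server_section(ip_address, port, cluster_size):
    """Format a server section using the entries shown in the cookbook."""
    return (
        "server {\n"
        f"    ip-address {ip_address}\n"
        f"    port {port}\n"
        f"    cluster-size {cluster_size}\n"
        "}"
    )


print(render_server_section("10.20.30.40", 443, suggested_cluster_size()))
```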

Review

Key points to remember:

  • Servers with larger cluster sizes exhibit fewer queue-related delays.
  • Servers with more CPUs are able to handle larger cluster sizes.
