Load balancing

If your resource is served by several servers, you can set up several upstreams corresponding to these servers in your dashboard. In this case, the Qrator Labs reverse proxy not only protects against illegitimate traffic but also balances legitimate traffic between the upstreams.

Balancing is done independently on each of the Qrator Labs traffic scrubbing centers, which are located in different regions of the world.

You can configure load balancing in the Upstreams section of your dashboard.

Switching between upstream lists

In your dashboard, you can set up one or two upstream lists (by default, only one list is enabled). The first list is considered the main list, and the second, the fallback list.

By default, each traffic scrubbing center balances all requests only between upstreams from the main list; upstreams found to be unavailable by active or passive checks are temporarily removed from circulation. The center switches to the fallback list and back automatically, based on the results of upstream health checks:

  • If all upstreams from the main list have become unavailable for the traffic scrubbing center, the center switches to using the fallback list instead of the main one.
  • As soon as the checks reveal that at least one upstream from the main list is available again, the traffic scrubbing center switches back to using the main list.
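The switching rule above can be sketched as a small selection function. This is a minimal Python illustration, not Qrator Labs code: `active_list` and `is_available` are hypothetical names, and the sketch assumes that unavailable fallback upstreams are filtered out the same way as main ones.

```python
from typing import Callable, Sequence


def active_list(
    main: Sequence[str],
    fallback: Sequence[str],
    is_available: Callable[[str], bool],
) -> list[str]:
    """Return the upstreams one scrubbing center balances between right now."""
    healthy_main = [u for u in main if is_available(u)]
    if healthy_main:
        # At least one main upstream is up: use (or return to) the main list.
        return healthy_main
    # Every main upstream is down: switch to the fallback list.
    # Assumption: unavailable fallback upstreams are filtered out the same way.
    return [u for u in fallback if is_available(u)]


down = {"main-1", "main-2"}  # hypothetical health check results
print(active_list(["main-1", "main-2"], ["backup-1"], lambda u: u not in down))
# ['backup-1']
```

Because the main list is checked first on every call, the center returns to it as soon as any main upstream passes its checks again.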

Upstream selection algorithms

Qrator Labs supports two upstream selection algorithms: Round robin and IP hash. In the dashboard, you can specify which of these algorithms should be used. The selected algorithm is used for balancing both between the main upstreams and between the fallback upstreams.

You can influence each algorithm by assigning an integer weight to each upstream; how the weight is interpreted depends on the algorithm. Upstreams with zero weight are considered disconnected: they do not participate in balancing and receive no traffic. If no weights are specified, each upstream is assumed to have a weight of one.

Round robin

The round robin algorithm treats the list of upstreams as a queue, in the order in which you placed them in your dashboard. By default, each upstream processes one request before the queue moves on to the next upstream. Thus, with equal weights, the algorithm distributes requests evenly among all upstreams in the corresponding list.

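The weight of an upstream determines what fraction of requests is directed to it: an upstream with weight w receives w out of every W requests, where W is the sum of the weights of all upstreams in the list.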

Example

If you use the round robin algorithm for two upstreams with weights 1 and 2, of every three requests, the first will be handled by the first upstream, and the second and third by the second upstream.
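A minimal Python sketch of weighted round robin that reproduces this example. It assumes that an upstream with weight w handles w consecutive requests per round, as in the example; the exact interleaving used by the traffic scrubbing centers is not documented here and may differ. The function and upstream names are hypothetical.

```python
from itertools import cycle, islice
from typing import Iterator, Optional, Sequence


def weighted_round_robin(
    upstreams: Sequence[str], weights: Optional[Sequence[int]] = None
) -> Iterator[str]:
    """Cycle through upstreams in dashboard order, `weight` requests each per round."""
    if weights is None:
        weights = [1] * len(upstreams)  # no weights given: every upstream weighs 1
    # Zero-weight upstreams add no slots to the schedule, so they get no traffic.
    schedule = [u for u, w in zip(upstreams, weights) for _ in range(w)]
    if not schedule:
        raise ValueError("no upstream has a positive weight")
    return cycle(schedule)


rr = weighted_round_robin(["upstream-1", "upstream-2"], [1, 2])
print(list(islice(rr, 6)))
# ['upstream-1', 'upstream-2', 'upstream-2', 'upstream-1', 'upstream-2', 'upstream-2']
```

Precomputing the schedule makes each pick constant-time; changing the weights simply means building a new iterator.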

IP hash

The IP hash algorithm selects an upstream based on the client IP address of the request, so repeated requests from the same IP address are passed to the same upstream. This is useful if you want all requests from the same user to be handled by the same upstream, e.g., to use the cache on each server more efficiently.

The weight of an upstream determines the probability of selecting that upstream and assigning it to a user.

Example

If you use the IP hash algorithm for two upstreams with weights 1 and 2, then of the first 3 000 requests from users with distinct IP addresses, on average about 1 000 will be sent to the first upstream and about 2 000 to the second. The distribution of subsequent requests from these users depends on how many requests each of them sends and can differ considerably from the initial one.
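The sketch below illustrates the idea in Python: each client IP is hashed to a point in [0, W), where W is the total weight, and each upstream owns a range of points proportional to its weight. SHA-256 and the name `ip_hash_pick` are assumptions for illustration; Qrator Labs does not document which hash function it uses.

```python
import hashlib
from collections import Counter
from typing import Sequence


def ip_hash_pick(client_ip: str, upstreams: Sequence[str], weights: Sequence[int]) -> str:
    """Map a client IP to an upstream; the same IP always gets the same upstream."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("no upstream has a positive weight")
    # Stable hash (Python's built-in hash() of str is salted per process).
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    point = int.from_bytes(digest[:8], "big") % total
    # Upstream i owns `weights[i]` consecutive points out of `total`.
    for upstream, weight in zip(upstreams, weights):
        if point < weight:
            return upstream
        point -= weight
    raise AssertionError("unreachable for non-negative weights")


# 3 000 distinct (hypothetical) client addresses, weights 1 and 2.
ips = [f"10.0.{i // 256}.{i % 256}" for i in range(3000)]
counts = Counter(ip_hash_pick(ip, ["upstream-1", "upstream-2"], [1, 2]) for ip in ips)
print(counts)  # roughly 1 000 vs 2 000; exact counts depend on the hash
```

The weights only shape how new addresses are spread; once an address is mapped, all of its traffic follows it, which is why later per-upstream load depends on how active each user is.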

Upstream health check

One of the advantages of using multiple upstreams is higher fault tolerance of the resource as a whole. If one of the upstreams becomes unavailable or starts processing requests slowly, the remaining upstreams can keep the resource available.

Upstream availability is tracked separately from each Qrator Labs traffic scrubbing center.

There are two available upstream health check methods:

Passive health check

In passive health check, the traffic scrubbing center does not make special diagnostic requests to upstreams, but analyzes errors when processing user requests.

Only safe requests are used for the analysis (see Error handling). If more than a third of such requests sent to an upstream within three seconds end with errors, the traffic scrubbing center stops sending requests to that upstream through the usual balancing algorithm. Instead, it sends that upstream no more than one request every three minutes, and as soon as one of these requests is processed successfully, the upstream is considered available again.
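These rules can be modeled as a small per-upstream state machine. This Python sketch assumes a sliding three-second window, no minimum number of requests before the error share is evaluated, and that the first probe is sent three minutes after the upstream is marked unavailable; the real bookkeeping may differ. `PassiveState` is a hypothetical name, only safe requests should be passed to `record`, and timestamps are passed in explicitly to keep the logic testable.

```python
from collections import deque
from dataclasses import dataclass, field

WINDOW_S = 3.0            # errors are evaluated over the last three seconds
PROBE_INTERVAL_S = 180.0  # while down: at most one request per three minutes


@dataclass
class PassiveState:
    available: bool = True
    results: deque = field(default_factory=deque)  # (timestamp, ok) of recent safe requests
    last_probe: float = float("-inf")

    def may_send(self, now: float) -> bool:
        """Whether a request may go to this upstream at `now` (claims the probe slot if down)."""
        if self.available:
            return True
        if now - self.last_probe >= PROBE_INTERVAL_S:
            self.last_probe = now
            return True
        return False

    def record(self, now: float, ok: bool) -> None:
        """Record the outcome of a safe request sent to this upstream at `now`."""
        if not self.available:
            if ok:  # a successful probe returns the upstream to circulation
                self.available = True
                self.results.clear()
            return
        self.results.append((now, ok))
        while now - self.results[0][0] > WINDOW_S:  # drop results outside the window
            self.results.popleft()
        errors = sum(1 for _, result_ok in self.results if not result_ok)
        if 3 * errors > len(self.results):  # more than a third of requests failed
            self.available = False
            self.last_probe = now  # assumption: first probe three minutes from now


s = PassiveState()
for t, ok in [(0.0, True), (0.5, True), (1.0, False), (1.5, False)]:
    s.record(t, ok)
print(s.available)        # False: 2 of 4 requests failed
print(s.may_send(60.0))   # False: less than three minutes have passed
print(s.may_send(181.5))  # True: one probe request is allowed
s.record(181.5, True)
print(s.available)        # True
```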

Note that some types of requests are not retried after a failed attempt; see Error handling. Therefore, users may see errors more often with passive health checks than with active ones.

Active health check

To react to upstream unavailability before it shows up as multiple errors in requests from real users, you can use active health checks. With active checking, the traffic scrubbing center regularly sends diagnostic requests to a specific URL on each upstream and decides whether to use the upstream based on the results of these requests.

All active health check parameters are configured individually through technical support and cover the following questions:

  • Which request will the traffic scrubbing center send to an upstream?

    You configure the protocol (HTTP or HTTPS), the URL, and the HTTP method. If necessary, you can also specify request parameters, e.g., a special token that lets the upstream distinguish diagnostic requests from user requests.

  • How often do you need to perform a check?

    The shorter the interval between checks, the faster the traffic scrubbing center can detect problems. However, very frequent checks can noticeably add to the upstream load. The optimal interval depends on your upstream performance.

  • Which check results are considered successful?

    Depending on the characteristics of your resource, either responses with all status codes except server errors (HTTP 5xx) or only responses with the HTTP 200 status code may be considered successful.

    You can also specify how many checks in a row must fail before the traffic scrubbing center considers the upstream unavailable and temporarily stops sending user requests to it. Similarly, you can configure the conditions under which the upstream is considered available again.
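The success criteria and the consecutive-failure and recovery thresholds described above can be sketched as follows. The parameter names `failures_to_down`, `successes_to_up`, and `only_200`, and their default values, are hypothetical; the actual values are agreed with technical support. A check that produces no response (`None`) is assumed to count as a failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActiveCheck:
    failures_to_down: int = 3  # hypothetical: failed checks in a row before marking down
    successes_to_up: int = 2   # hypothetical: successful checks in a row before marking up
    only_200: bool = False     # True: only HTTP 200 succeeds; False: anything except 5xx
    available: bool = True
    streak: int = 0            # consecutive results that contradict the current state

    def is_success(self, status: Optional[int]) -> bool:
        if status is None:  # no response (timeout, connection error): assumed a failure
            return False
        if self.only_200:
            return status == 200
        return not 500 <= status <= 599

    def observe(self, status: Optional[int]) -> bool:
        """Feed one check result; return whether the upstream is available afterwards."""
        ok = self.is_success(status)
        if ok == self.available:
            self.streak = 0  # result agrees with the current state
            return self.available
        self.streak += 1
        needed = self.failures_to_down if self.available else self.successes_to_up
        if self.streak >= needed:
            self.available = ok  # flip state after enough consecutive contrary results
            self.streak = 0
        return self.available


check = ActiveCheck()
print([check.observe(s) for s in [503, None, 500, 200, 404]])
# [True, True, False, False, True]
```

Requiring a streak in both directions keeps a single flaky check from flapping the upstream in and out of circulation.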

Error handling

After each request to an upstream, the Qrator Labs traffic scrubbing center determines whether the request succeeded or an error occurred. Errors are displayed in the Statistics section of the dashboard and also affect the subsequent behavior of this traffic scrubbing center.

Any of the following situations is considered an error:

  • Connection to an upstream is lost and the traffic scrubbing center fails to re-establish connection within 15 seconds.
  • Upstream failed to respond to a request within 60 seconds or sent an incomplete (incorrect) response.
  • Upstream returned a response with the 502 Bad Gateway or 504 Gateway Timeout status code.
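The three error conditions can be written as a single predicate. This Python sketch uses a hypothetical `Attempt` record to describe what happened to one upstream request; the field names are assumptions. Following the list literally, the sketch treats only 502 and 504 as error status codes, so other codes such as 500 are not counted as errors here.

```python
from dataclasses import dataclass
from typing import Optional

RECONNECT_LIMIT_S = 15.0  # a lost connection must be re-established within 15 seconds
RESPONSE_LIMIT_S = 60.0   # a response must arrive within 60 seconds
ERROR_STATUSES = {502, 504}


@dataclass
class Attempt:
    connection_lost: bool = False              # connection to the upstream dropped
    reconnected_after: Optional[float] = None  # seconds to re-establish it (None: never)
    response_after: Optional[float] = None     # seconds until the response (None: none)
    complete: bool = True                      # response was complete and well-formed
    status: Optional[int] = None               # HTTP status code of the response


def is_error(a: Attempt) -> bool:
    if a.connection_lost and (
        a.reconnected_after is None or a.reconnected_after > RECONNECT_LIMIT_S
    ):
        return True
    if a.response_after is None or a.response_after > RESPONSE_LIMIT_S:
        return True
    if not a.complete:
        return True
    return a.status in ERROR_STATUSES


print(is_error(Attempt(response_after=0.3, status=200)))   # False
print(is_error(Attempt(response_after=0.3, status=504)))   # True
print(is_error(Attempt(response_after=75.0, status=200)))  # True: too slow
```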

The error handling procedure is different for unsafe and safe requests (see RFC 7231):

  • POST, PUT, PATCH, DELETE, CONNECT requests are considered unsafe.

    Such requests are never resent, either to the same upstream or to another one, because this may break the application logic. The user always receives the first upstream's response or, if the upstream fails to respond, a response with the 502 Bad Gateway or 504 Gateway Timeout status code, depending on the cause of the problem.

  • GET, HEAD, OPTIONS, TRACE requests are considered safe.

    Before returning an error to the user, the traffic scrubbing center sends the request to the other upstreams from the list one by one and waits for their responses. If any upstream handles the request without errors, its response is returned to the user. If the traffic scrubbing center goes through all upstreams without receiving a successful response, it returns the error response from the last upstream. If no response at all was received from the last upstream, the traffic scrubbing center itself generates a response with the 502 Bad Gateway or 504 Gateway Timeout status code, depending on the cause of the problem.
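The safe/unsafe procedure can be sketched in Python as follows. The sketch assumes that `upstreams` is already ordered with the balancer's choice first, that a timeout without any response maps to 504 Gateway Timeout and other failures to 502 Bad Gateway, and it simplifies error detection to "no response, or a 502/504 status". `Result`, `handle`, and `send` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


@dataclass
class Result:
    status: Optional[int]    # None: no response was received
    timed_out: bool = False  # if no response: True for a timeout, False otherwise


def failed(r: Result) -> bool:
    # Simplified error test: no response at all, or a 502/504 from the upstream.
    return r.status is None or r.status in (502, 504)


def proxy_error(r: Result) -> Result:
    # Assumption: a timeout maps to 504 Gateway Timeout, other failures to 502 Bad Gateway.
    return Result(504 if r.timed_out else 502)


def handle(method: str, upstreams: Sequence[str], send: Callable[[str], Result]) -> Result:
    """`upstreams` is assumed to be ordered with the balancer's choice first."""
    if method.upper() not in SAFE_METHODS:
        r = send(upstreams[0])  # unsafe: exactly one attempt, never resent
        return r if r.status is not None else proxy_error(r)
    r = Result(None)
    for upstream in upstreams:  # safe: try the upstreams one by one
        r = send(upstream)
        if not failed(r):
            return r  # the first successful response goes to the user
    # Every upstream failed: return the last upstream's error response, if there was one.
    return r if r.status is not None else proxy_error(r)


replies = {"a": Result(None, timed_out=True), "b": Result(200), "c": Result(502)}
print(handle("GET", ["a", "b", "c"], replies.__getitem__).status)   # 200: "b" succeeded
print(handle("POST", ["a", "b", "c"], replies.__getitem__).status)  # 504: no retry
print(handle("GET", ["c", "a"], replies.__getitem__).status)        # 504: generated by the proxy
```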

If the user closes the connection without waiting for a response (e.g., due to a client-side timeout), the traffic scrubbing center stops trying to process the request.

Each error is reflected in the Analytics and Summary sections of the dashboard. To detect resource availability problems quickly, contact Qrator Labs technical support: the specialists will set up automatic notifications that suit you, triggered after a certain number of errors.
