Mantis - Resin
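ID: 2279
Severity: minor
Reproducibility: always
Date Submitted: 12-28-07 10:58
Last Update: 01-02-08 10:44
Reporter: sam
Assigned To: ferg
Priority: urgent
Status: closed
Product Version: 3.1.4
Resolution: fixed
Projection: none
ETA: none
Fixed in Version: 3.1.5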
0002279: load-balance-recover-time
(rep by A Balandran)

This concerns the load-balance-recover-time setting. We have ours set to 0s,
which should make the server available at all times, even after an error.
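
To make the expectation concrete, here is a minimal sketch of how the reporter reads the setting. It is a hypothetical model, not Resin's implementation: the `Backend` class, `mark_failed`, `is_available`, and `pick` are names invented for illustration, and the assumption is that a failed backend becomes eligible again once the recover time has elapsed since the failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    """Hypothetical model of one backend as seen by the load balancer."""
    name: str
    recover_time: float                     # seconds; stands in for load-balance-recover-time
    last_fail_time: Optional[float] = None  # time of the most recent failure, if any

    def mark_failed(self, now: float) -> None:
        # Record the failure time; availability is derived from it.
        self.last_fail_time = now

    def is_available(self, now: float) -> bool:
        # Assumed rule: eligible again once recover_time has elapsed.
        if self.last_fail_time is None:
            return True
        return now >= self.last_fail_time + self.recover_time


def pick(primary: Backend, backup: Backend, now: float) -> Backend:
    """Prefer the session's primary backend; fall back only while it is marked failed."""
    return primary if primary.is_available(now) else backup
```

Under this reading, a primary with `recover_time=0.0` that fails at some time `t` is still chosen for a request arriving 0.619 seconds later (the gap between the two requests in the log), whereas a primary with `recover_time=15.0` would lose that request to the backup.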

We are seeing cases where a new request that comes in after an "unexpected end
of file" error is sent to the backup backend server. Here is a snippet of the
logs that shows this case:

-- Request that fails
[14:35:10.937] {http-web-443-555} load balance [web-tier->app-crusader:18849] URL /cwsreq
[14:35:10.937] {http-web-443-555} load balance [web-tier->app-crusader:18849] Host: ssl.4aabbcc.com
[14:35:10.937] {http-web-443-555} load balance [web-tier->app-crusader:18849] Accept: */*
[14:35:10.937] {http-web-443-555} load balance [web-tier->app-crusader:18849] Cookie: JSESSIONID=bda0mO0Vzhl3t4HuPVDCr
[14:35:10.937] {http-web-443-555} load balance [web-tier->app-crusader:18849] Content-type: text/xml
[14:35:10.937] {http-web-443-555} load balance [web-tier->app-crusader:18849] Content-Length: 8937
[14:35:10.937] {http-web-443-555} load balance [web-tier->app-crusader:18849] unexpected end of file
[14:35:10.937] {http-web-443-555} close ClusterStream[[web-tier->app-crusader:18849]]
---- New Request
[14:35:11.556] {http-web-443-555} load-balance for session bda0mO0Vzhl3t4HuPVDCr primary web-tier->app-crusader connection failed.
[14:35:11.556] {http-web-443-555} connect ClusterStream[[web-tier->app-undercity:18994]]
[14:35:11.556] {http-web-443-555} load balance [web-tier->app-undercity:18994] URL /cwsreq
[14:35:11.556] {http-web-443-555} load balance [web-tier->app-undercity:18994] Host: ssl.4aabbcc.com
[14:35:11.556] {http-web-443-555} load balance [web-tier->app-undercity:18994] Accept: */*
[14:35:11.556] {http-web-443-555} load balance [web-tier->app-undercity:18994] Cookie: JSESSIONID=bda0mO0Vzhl3t4HuPVDCr
[14:35:11.556] {http-web-443-555} load balance [web-tier->app-undercity:18994] Content-type: text/xml
[14:35:11.556] {http-web-443-555} load balance [web-tier->app-undercity:18994] Content-Length: 139
[14:35:11.567] {http-web-443-555} load balance [web-tier->app-undercity:18994] 200 OK
[14:35:11.567] {http-web-443-555} load balance [web-tier->app-undercity:18994] M cpu-load:0
[14:35:11.567] {http-web-443-555} load balance [web-tier->app-undercity:18994] D: 119
[14:35:11.567] {http-web-443-555} load balance [web-tier->app-undercity:18994] Q (keepalive)
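
To find other occurrences of this pattern in a larger log, a scan along the following lines may help. It is only a sketch: the regular expressions are written against the two line shapes visible in the snippet above ("connection failed" followed by "connect ClusterStream" on the same thread) and assume the rest of the log uses the same format; `find_failovers` is a hypothetical helper name.

```python
import re

# Matches: [ts] {thread} load-balance for session <id> primary <name> connection failed.
FAILED = re.compile(
    r"\[(?P<ts>[\d:.]+)\] \{(?P<thread>[^}]+)\} load-balance for session "
    r"(?P<session>\S+) primary (?P<primary>\S+) connection failed\."
)
# Matches: [ts] {thread} connect ClusterStream[[<target>]]
CONNECT = re.compile(
    r"\[(?P<ts>[\d:.]+)\] \{(?P<thread>[^}]+)\} "
    r"connect ClusterStream\[\[(?P<target>[^\]]+)\]\]"
)


def find_failovers(lines):
    """Return (timestamp, session, primary, target) for each failed-primary event."""
    pending = {}  # thread name -> (timestamp, session, primary) awaiting a connect line
    failovers = []
    for line in lines:
        m = FAILED.search(line)
        if m:
            pending[m["thread"]] = (m["ts"], m["session"], m["primary"])
            continue
        m = CONNECT.search(line)
        if m and m["thread"] in pending:
            ts, session, primary = pending.pop(m["thread"])
            failovers.append((ts, session, primary, m["target"]))
    return failovers
```

Calling `find_failovers(open(path))` on a log file (with `path` supplied by the reader) yields one tuple per session request that was redirected after its primary was reported as failed.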

Why does the second request fail to connect to the backend that just received
the error? I don't see any connection error in the log, so I assume the load
balancer is marking the failed server as bad. But with
load-balance-recover-time set to 0s, that marking should be ignored, correct?

Notes
(0002610)
ferg   
01-02-08 10:44   
server/269w