0004708: sticky session

(0005449)
uweschaefer_
08-18-11 13:37

Is there anything else we can provide? As a long-time paying customer, we are quite frustrated that our production system is extremely fragile due to random server switches.
Session replication as a workaround is impractical because the servers are in different physical locations.

We're desperate.

(0005450)
ferg
08-18-11 16:14

When filing a bug as a paying customer, please also send a mail to the support address so we can increase the priority. Otherwise the bug report looks like an open-source report.

mod_caucho should only fail over if it cannot connect to the backend Resin instance. (The mod_caucho code hasn't changed in a long while.)

Can you send the server and load-balance timeout parameters? mod_caucho reads those from the backend server to see what values to use for a timeout.

(0005451)
ferg
08-18-11 16:36

Checking the code, the key parameters are:

load-balance-connect-timeout
load-balance-socket-timeout
load-balance-idle-time

keepalive-timeout
socket-timeout

(also keepalive-max).

mod_caucho's view of the values should be displayed in /caucho-status

(0005452)
ferg
08-18-11 17:25

Also, the /resin-admin graphs (in the "meters" tab of the summary) might show unusual netstat behavior or other glitches like a memory spike.

(0005453)
ferg
08-18-11 18:37

As a test, you might try lowering load-balance-idle-time to 30s instead of the default 60s. (And check the netstat history). That would test the possibility of a timing issue without affecting performance much.
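
To make the timing hypothesis concrete, here is a minimal sketch of the check implied above. It assumes the issue would be a race in which the backend closes an idle connection before mod_caucho stops reusing it; the thread does not confirm that mechanism. The names `Timeouts`, `backend_idle_close`, `idle_margin` and `has_timing_risk` are hypothetical, and which Resin parameter actually governs the backend side (keepalive-timeout or socket-timeout) is left open here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeouts:
    """Timeouts in seconds for one frontend/backend pair (hypothetical model)."""

    # How long the frontend keeps an idle backend socket (load-balance-idle-time).
    load_balance_idle_time: float
    # Assumption: how long the backend waits before closing an idle connection.
    backend_idle_close: float


def idle_margin(t: Timeouts) -> float:
    """Seconds by which the frontend drops an idle socket before the backend does.

    Positive: the frontend gives up the socket first.
    Zero or negative: the backend may close first, so the frontend could try
    to reuse a socket that is already dead.
    """
    return t.backend_idle_close - t.load_balance_idle_time


def has_timing_risk(t: Timeouts, safety_margin: float = 0.0) -> bool:
    """True when the margin is no larger than the required safety margin."""
    return idle_margin(t) <= safety_margin


if __name__ == "__main__":
    # 65 is a made-up backend value used only for illustration.
    for idle in (60, 30):
        t = Timeouts(load_balance_idle_time=idle, backend_idle_close=65)
        print(idle, idle_margin(t), has_timing_risk(t, safety_margin=10))
```

Lowering the idle time widens the margin, which is why the 30s test could separate a timing problem from a genuine connect failure.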

(0005454)
uweschaefer_
08-19-11 04:39

thanks for the hints, scott.

config everywhere is

connect timeout : 5
idle time: 60
recover : 15
socket timeout : 600

frontend apache2 has

KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15

no keepalives defined in resin.

we can reproduce the behavior internally, so it does not appear to be related to high load.

(0005455)
ferg
08-19-11 09:04

If you can reproduce it in the lab, can you set the logging to "fine" or "finer" on both backend Resin instances, and mail the jvm-default logs?

BTW, it's the Resin keepalive that matters (because this is the mod_caucho to Resin link). The Apache one doesn't matter.

The page at http://caucho.com/resin-4.0/admin/clustering-overview.xtp has a diagram showing the load-balancing timings.

Also, the JMX MBean resin:type=Port,name=XXX-6800 will show the SocketTimeout and KeepaliveTimeout.

Our load testing wasn't able to show any problems, though.

(0006204)
alex
02-27-13 16:15

can't reproduce

Issue History
Date Modified	Username	Field	Change
08-11-11 05:00	georgbuschbeck	New Issue
08-18-11 13:34	uweschaefer_	Issue Monitored: uweschaefer_
08-18-11 13:37	uweschaefer_	Note Added: 0005449
08-18-11 16:14	ferg	Note Added: 0005450
08-18-11 16:36	ferg	Note Added: 0005451
08-18-11 17:25	ferg	Note Added: 0005452
08-18-11 18:37	ferg	Note Added: 0005453
08-19-11 04:39	uweschaefer_	Note Added: 0005454
08-19-11 09:04	ferg	Note Added: 0005455
09-23-11 01:08	amukas	Issue Monitored: amukas
02-27-13 16:15	alex	Status	new => assigned
02-27-13 16:15	alex	Assigned To	=> alex
02-27-13 16:15	alex	Status	assigned => closed
02-27-13 16:15	alex	Note Added: 0006204
02-27-13 16:15	alex	Resolution	open => fixed
02-27-13 16:15	alex	Fixed in Version	=> 4.0.36
