Mantis - Resin
Viewing Issue Advanced Details
1088 major random 05-09-06 03:23 06-08-06 11:35
wet745  
ferg  
normal  
closed 3.0.18  
fixed  
none    
none 3.0.20  
0001088: Unsystematic failure of session clustering when some Resin processes are not active
We have a problem with the session clustering of Reson Professional 3.0.18 on Windows 2000 servers. It seems to work as long as all Resin processes in the cluster are active. When one or more Resin processes are offline, then it seems to fail unsystematically.

Example configuration: 3 Resin processes - A, B, C - are offline at the beginning, session databases have been deleted. Now start 2 of them - A and B -, the third - C - remains offline. Create a session with session attributes in one of the active processes, e.g. process A. Stop that process A, so that only process B remains active. Now, Resin B will serve the request with the process A cookie. In some cases, the session with its attributes will be available in the remaining process B, in other cases it won't.

The Resin log with loglevel "finer" gives the impression that the problem is caused by the cluster synchronisation that occurs when the session attributes are set in process A. It seems that arbitrary, one of the configured Resin processes is used as synchronisation target. If the target is active - synchronization against process B -, then it succeeds. If the target is offline - syncronization against process C -, than the synchonisation will
not happen and session attributes will be lost.
Log excerpts:

Resin A

[11:47:34.664] ClusterStore[null] active
[11:47:35.758] java.net.ConnectException: Connection refused: connect
[11:47:35.758] at java.net.PlainSocketImpl.socketConnect(Native Method)
[11:47:35.758] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
[11:47:35.758] at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
[11:47:35.758] at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
[11:47:35.758] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
[11:47:35.758] at java.net.Socket.connect(Socket.java:507)
[11:47:35.758] at java.net.Socket.connect(Socket.java:457)
[11:47:35.758] at java.net.Socket.<init>(Socket.java:365)
[11:47:35.758] at java.net.Socket.<init>(Socket.java:178)
[11:47:35.758] at com.caucho.vfs.TcpStream.<init>(TcpStream.java:55)
[11:47:35.758] at com.caucho.vfs.TcpStream.openReadWrite(TcpStream.java:116)
[11:47:35.758] at com.caucho.vfs.TcpPath.openReadWriteImpl(TcpPath.java:145)
[11:47:35.758] at com.caucho.vfs.Path.openReadWrite(Path.java:789)
[11:47:35.758] at com.caucho.server.cluster.ClusterServer.openTCPPair(ClusterServer.java:367)
[11:47:35.758] at com.caucho.server.cluster.ClusterClient.open(ClusterClient.java:227)
[11:47:35.758] at com.caucho.server.cluster.StoreManager.start(StoreManager.java:317)
[11:47:35.758] at com.caucho.server.cluster.ClusterStore.start(ClusterStore.java:222)
[11:47:35.758] at com.caucho.server.cluster.Cluster.environmentStart(Cluster.java:573)
[11:47:35.758] at com.caucho.loader.EnvironmentClassLoader.start(EnvironmentClassLoader.java:318)
[11:47:35.758] at com.caucho.server.resin.ServletServer.start(ServletServer.java:943)
[11:47:35.758] at com.caucho.server.deploy.DeployController.startImpl(DeployController.java:587)
[11:47:35.758] at com.caucho.server.deploy.AbstractDeployControllerStrategy.start(AbstractDeployControllerStrategy.java:56)
[11:47:35.758] at com.caucho.server.deploy.DeployController.start(DeployController.java:483)
[11:47:35.758] at com.caucho.server.resin.ResinServer.start(ResinServer.java:478)
[11:47:35.758] at com.caucho.server.resin.Resin.init(Resin.java)
[11:47:35.758] at com.caucho.server.resin.Resin.main(Resin.java:623)
...
[11:47:41.274] tcp-store can't connect to server ClusterServer[id=servlett02-1 index=3 host=172.17.56.243:7007 cluster=]
...
[11:48:00.509] create session adf8jGV2pOubjc8IyS
...
[11:48:16.540] Database[/E:/ffng/server/servlett01-2/resin-pro-3.0.18/session]: SELECT access_time,data FROM srun_servlett01_2 WHERE id=?
[11:48:16.556] where-1: static null
[11:48:16.556] index-0: FromItem[srun_servlett01_2 AS srun_servlett01_2] (srun_servlett01_2.id = ?0)
[11:48:16.556] where-0: FromItem[srun_servlett01_2 AS srun_servlett01_2] null
[11:48:16.556] no local object loaded for default;adf8jGV2pOubjc8IyS
[11:48:16.556] reset session adf8jGV2pOubjc8IyS
...
[11:48:20.712] java.net.ConnectException: Connection refused: connect
[11:48:20.712] at java.net.PlainSocketImpl.socketConnect(Native Method)
[11:48:20.712] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
[11:48:20.712] at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
[11:48:20.712] at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
[11:48:20.712] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
[11:48:20.712] at java.net.Socket.connect(Socket.java:507)
[11:48:20.712] at java.net.Socket.connect(Socket.java:457)
[11:48:20.712] at java.net.Socket.<init>(Socket.java:365)
[11:48:20.712] at java.net.Socket.<init>(Socket.java:178)
[11:48:20.712] at com.caucho.vfs.TcpStream.<init>(TcpStream.java:55)
[11:48:20.712] at com.caucho.vfs.TcpStream.openReadWrite(TcpStream.java:116)
[11:48:20.712] at com.caucho.vfs.TcpPath.openReadWriteImpl(TcpPath.java:145)
[11:48:20.712] at com.caucho.vfs.Path.openReadWrite(Path.java:789)
[11:48:20.712] at com.caucho.server.cluster.ClusterServer.openTCPPair(ClusterServer.java:367)
[11:48:20.712] at com.caucho.server.cluster.ClusterClient.open(ClusterClient.java:227)
[11:48:20.712] at com.caucho.server.cluster.ClusterStore.saveServer(ClusterStore.java:646)
[11:48:20.712] at com.caucho.server.cluster.ClusterStore.store(ClusterStore.java:597)
[11:48:20.712] at com.caucho.server.cluster.ClusterObject.store(ClusterObject.java:405)
[11:48:20.712] at com.caucho.server.session.SessionImpl.flush(SessionImpl.java:813)
[11:48:20.712] at com.caucho.server.connection.AbstractHttpRequest.saveSession(AbstractHttpRequest.java:2291)
[11:48:20.712] at com.caucho.server.connection.AbstractHttpResponse.sendRedirect(AbstractHttpResponse.java:583)
...
[11:48:20.712] Database[/E:/ffng/server/servlett01-2/resin-pro-3.0.18/session]: UPDATE srun_servlett01_2 SET data=?, mod_time=?, access_time=? WHERE id=?
[11:48:20.712] where-1: static null
[11:48:20.712] index-0: FromItem[srun_servlett01_2 AS srun_servlett01_2] (srun_servlett01_2.id = ?3)
[11:48:20.712] where-0: FromItem[srun_servlett01_2 AS srun_servlett01_2] null
[11:48:20.712] Database[/E:/ffng/server/servlett01-2/resin-pro-3.0.18/session]: INSERT into srun_servlett01_2 (id,data,mod_time,access_time,expire_interval,owner,backup) VALUES(?,?,?,?,?,?,?)
[11:48:20.712] db table srun_servlett01_2 insert row xa:com.caucho.db.store.Transaction@577c5e
[11:48:20.712] Table[srun_servlett01_2:1] allocating block 3 row
[11:48:20.712] ReadBlock[Table[srun_servlett01_2:1],3] create
[11:48:20.712] AutoCommitWriteBlock[Table[srun_servlett01_2:1],3] create
[11:48:20.728] Table[srun_servlett01_2:1] allocating block 4 fragment
[11:48:20.728] ReadBlock[Table[srun_servlett01_2:1],4] create
[11:48:20.728] AutoCommitWriteBlock[Table[srun_servlett01_2:1],4] create
[11:48:20.728] write db-block ReadBlock[Table[srun_servlett01_2:1],4] [0, 65536]
[11:48:20.728] db-block remove AutoCommitWriteBlock[Table[srun_servlett01_2:1],4]
[11:48:20.728] btree insert at 2:24 value:3:0
[11:48:20.728] write db-block ReadBlock[Table[srun_servlett01_2:1],3] [0, 65536]
[11:48:20.728] db-block remove AutoCommitWriteBlock[Table[srun_servlett01_2:1],3]
[11:48:20.728] insert cluster: default;adf8jGV2pOubjc8IyS length:183
[11:48:20.728] [servlett01-2] self-store(0) adf8jGV2pOubjc8IyS length=183
[11:48:20.728] [servlett01-2:1] s 302 Found
[11:48:20.728] [servlett01-2:1] H Content-Language
[11:48:20.728] [servlett01-2:1] S de-CH
[11:48:20.728] [servlett01-2:1] H Location
[11:48:20.728] [servlett01-2:1] S https://finfox.ecofin.ch/s/pershome.xhtml [^]
[11:48:20.728] [servlett01-2:1] H Content-Length
[11:48:20.728] [servlett01-2:1] S 79
[11:48:20.728] [servlett01-2:1] G
[11:48:20.728] [1] chunk: 79
[11:48:20.728] [servlett01-2:1] D:data 79
[11:48:20.728] [servlett01-2:1] Q: quit channel
[11:48:20.728] [servlett01-2:1] complete request - keepalive
...
[11:48:49.307] close session adf8jGV2pOubjc8IyS
[11:48:49.307] remove session adf8jGV2pOubjc8IyS
[11:48:49.307] resin-file: destroy
[11:48:49.307] resin-jsp destroy

Resin B

[11:49:06.182] create session adf8jGV2pOubjc8IyS
[11:49:07.198] java.net.ConnectException: Connection refused: connect
[11:49:07.198] at java.net.PlainSocketImpl.socketConnect(Native Method)
[11:49:07.198] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
[11:49:07.198] at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
[11:49:07.198] at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
[11:49:07.198] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
[11:49:07.198] at java.net.Socket.connect(Socket.java:507)
[11:49:07.198] at java.net.Socket.connect(Socket.java:457)
[11:49:07.198] at java.net.Socket.<init>(Socket.java:365)
[11:49:07.198] at java.net.Socket.<init>(Socket.java:178)
[11:49:07.198] at com.caucho.vfs.TcpStream.<init>(TcpStream.java:55)
[11:49:07.198] at com.caucho.vfs.TcpStream.openReadWrite(TcpStream.java:116)
[11:49:07.198] at com.caucho.vfs.TcpPath.openReadWriteImpl(TcpPath.java:145)
[11:49:07.198] at com.caucho.vfs.Path.openReadWrite(Path.java:789)
[11:49:07.198] at com.caucho.server.cluster.ClusterServer.openTCPPair(ClusterServer.java:367)
[11:49:07.198] at com.caucho.server.cluster.ClusterClient.open(ClusterClient.java:227)
[11:49:07.198] at com.caucho.server.cluster.ClusterStore.loadObject(ClusterStore.java:461)
[11:49:07.198] at com.caucho.server.cluster.ClusterStore.load(ClusterStore.java:420)
[11:49:07.198] at com.caucho.server.cluster.ClusterObject.load(ClusterObject.java:253)
[11:49:07.198] at com.caucho.server.session.SessionImpl.load(SessionImpl.java:692)
[11:49:07.198] at com.caucho.server.session.SessionManager.load(SessionManager.java:1217)
[11:49:07.198] at com.caucho.server.session.SessionManager.getSession(SessionManager.java:1079)
[11:49:07.198] at com.caucho.server.connection.AbstractHttpRequest.createSession(AbstractHttpRequest.java:1367)
[11:49:07.198] at com.caucho.server.connection.AbstractHttpRequest.getSession(AbstractHttpRequest.java:1179)
[11:49:07.198] at javax.servlet.http.HttpServletRequestWrapper.getSession(HttpServletRequestWrapper.java:236)
[11:49:07.198] at javax.servlet.http.HttpServletRequestWrapper.getSession(HttpServletRequestWrapper.java:236)
...
[11:49:08.292] java.net.ConnectException: Connection refused: connect
[11:49:08.292] at java.net.PlainSocketImpl.socketConnect(Native Method)
[11:49:08.292] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
[11:49:08.292] at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
[11:49:08.292] at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
[11:49:08.292] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
[11:49:08.292] at java.net.Socket.connect(Socket.java:507)
[11:49:08.292] at java.net.Socket.connect(Socket.java:457)
[11:49:08.292] at java.net.Socket.<init>(Socket.java:365)
[11:49:08.292] at java.net.Socket.<init>(Socket.java:178)
[11:49:08.292] at com.caucho.vfs.TcpStream.<init>(TcpStream.java:55)
[11:49:08.292] at com.caucho.vfs.TcpStream.openReadWrite(TcpStream.java:116)
[11:49:08.292] at com.caucho.vfs.TcpPath.openReadWriteImpl(TcpPath.java:145)
[11:49:08.292] at com.caucho.vfs.Path.openReadWrite(Path.java:789)
[11:49:08.292] at com.caucho.server.cluster.ClusterServer.openTCPPair(ClusterServer.java:367)
[11:49:08.292] at com.caucho.server.cluster.ClusterClient.open(ClusterClient.java:227)
[11:49:08.292] at com.caucho.server.cluster.ClusterStore.loadObject(ClusterStore.java:461)
[11:49:08.292] at com.caucho.server.cluster.ClusterStore.load(ClusterStore.java:430)
[11:49:08.292] at com.caucho.server.cluster.ClusterObject.load(ClusterObject.java:253)
[11:49:08.292] at com.caucho.server.session.SessionImpl.load(SessionImpl.java:692)
[11:49:08.292] at com.caucho.server.session.SessionManager.load(SessionManager.java:1217)
[11:49:08.292] at com.caucho.server.session.SessionManager.getSession(SessionManager.java:1079)
[11:49:08.292] at com.caucho.server.connection.AbstractHttpRequest.createSession(AbstractHttpRequest.java:1367)
[11:49:08.292] at com.caucho.server.connection.AbstractHttpRequest.getSession(AbstractHttpRequest.java:1179)
[11:49:08.292] at javax.servlet.http.HttpServletRequestWrapper.getSession(HttpServletRequestWrapper.java:236)
[11:49:08.292] at javax.servlet.http.HttpServletRequestWrapper.getSession(HttpServletRequestWrapper.java:236)
...
[11:49:08.292] create session adf8jGV2pOubjc8IyS
[11:49:08.292] remove session adf8jGV2pOubjc8IyS
...
[11:49:08.495] create session adf8jGV2pOubjc8IyS
[11:49:08.495] create session adf8jGV2pOubjc8IyS
[11:49:08.495] remove session adf8jGV2pOubjc8IyS

Notes
(0001122)
wet745   
05-12-06 05:57   
The mentioned problem seems to occur only when a single Resin process remains active. It seems not to occur when two Resin processes remain active. As a workaround, we now set up 4 Resin processes and try to keep 2 of them running all the time, e.g. during the installation of new releases.
(0001271)
ferg   
06-08-06 11:35   
Resin 3.0.19 and previous only back up persistent information to a single server.

So if the primary and backup for a session both go down, the session information will be lost.

Resin 3.0.20 adds the capability for having a primary, secondary, and tertiary server. Having two backups lets any two servers go down without session loss.

In addition, 3.0.20 localizes the session backups, so the secondary and tertiary server are within 8 servers of the primary. Essentially, this means any two servers in a span of 8 servers can go down without loss.

If you had 64 servers, for example, you could have servers 1, 9, 17, 25, 33, 41, 49, 57 all go down without loss.

server/01s-