Mantis Bugtracker
  

Viewing Issue Advanced Details Jump to Notes ] View Simple ] Issue History ] Print ]
ID Category Severity Reproducibility Date Submitted Last Update
0001088 [Resin] major random 05-09-06 03:23 06-08-06 11:35
Reporter wet745 View Status public  
Assigned To ferg
Priority normal Resolution fixed Platform
Status closed   OS
Projection none   OS Version
ETA none Fixed in Version 3.0.20 Product Version 3.0.18
  Product Build
Summary 0001088: Unsystematic failure of session clustering when some Resin processes are not active
Description We have a problem with the session clustering of Reson Professional 3.0.18 on Windows 2000 servers. It seems to work as long as all Resin processes in the cluster are active. When one or more Resin processes are offline, then it seems to fail unsystematically.

Example configuration: 3 Resin processes - A, B, C - are offline at the beginning, session databases have been deleted. Now start 2 of them - A and B -, the third - C - remains offline. Create a session with session attributes in one of the active processes, e.g. process A. Stop that process A, so that only process B remains active. Now, Resin B will serve the request with the process A cookie. In some cases, the session with its attributes will be available in the remaining process B, in other cases it won't.

The Resin log with loglevel "finer" gives the impression that the problem is caused by the cluster synchronisation that occurs when the session attributes are set in process A. It seems that arbitrary, one of the configured Resin processes is used as synchronisation target. If the target is active - synchronization against process B -, then it succeeds. If the target is offline - syncronization against process C -, than the synchonisation will
not happen and session attributes will be lost.
Steps To Reproduce
Additional Information Log excerpts:

Resin A

[11:47:34.664] ClusterStore[null] active
[11:47:35.758] java.net.ConnectException: Connection refused: connect
[11:47:35.758] at java.net.PlainSocketImpl.socketConnect(Native Method)
[11:47:35.758] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
[11:47:35.758] at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
[11:47:35.758] at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
[11:47:35.758] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
[11:47:35.758] at java.net.Socket.connect(Socket.java:507)
[11:47:35.758] at java.net.Socket.connect(Socket.java:457)
[11:47:35.758] at java.net.Socket.<init>(Socket.java:365)
[11:47:35.758] at java.net.Socket.<init>(Socket.java:178)
[11:47:35.758] at com.caucho.vfs.TcpStream.<init>(TcpStream.java:55)
[11:47:35.758] at com.caucho.vfs.TcpStream.openReadWrite(TcpStream.java:116)
[11:47:35.758] at com.caucho.vfs.TcpPath.openReadWriteImpl(TcpPath.java:145)
[11:47:35.758] at com.caucho.vfs.Path.openReadWrite(Path.java:789)
[11:47:35.758] at com.caucho.server.cluster.ClusterServer.openTCPPair(ClusterServer.java:367)
[11:47:35.758] at com.caucho.server.cluster.ClusterClient.open(ClusterClient.java:227)
[11:47:35.758] at com.caucho.server.cluster.StoreManager.start(StoreManager.java:317)
[11:47:35.758] at com.caucho.server.cluster.ClusterStore.start(ClusterStore.java:222)
[11:47:35.758] at com.caucho.server.cluster.Cluster.environmentStart(Cluster.java:573)
[11:47:35.758] at com.caucho.loader.EnvironmentClassLoader.start(EnvironmentClassLoader.java:318)
[11:47:35.758] at com.caucho.server.resin.ServletServer.start(ServletServer.java:943)
[11:47:35.758] at com.caucho.server.deploy.DeployController.startImpl(DeployController.java:587)
[11:47:35.758] at com.caucho.server.deploy.AbstractDeployControllerStrategy.start(AbstractDeployControllerStrategy.java:56)
[11:47:35.758] at com.caucho.server.deploy.DeployController.start(DeployController.java:483)
[11:47:35.758] at com.caucho.server.resin.ResinServer.start(ResinServer.java:478)
[11:47:35.758] at com.caucho.server.resin.Resin.init(Resin.java)
[11:47:35.758] at com.caucho.server.resin.Resin.main(Resin.java:623)
...
[11:47:41.274] tcp-store can't connect to server ClusterServer[id=servlett02-1 index=3 host=172.17.56.243:7007 cluster=]
...
[11:48:00.509] create session adf8jGV2pOubjc8IyS
...
[11:48:16.540] Database[/E:/ffng/server/servlett01-2/resin-pro-3.0.18/session]: SELECT access_time,data FROM srun_servlett01_2 WHERE id=?
[11:48:16.556] where-1: static null
[11:48:16.556] index-0: FromItem[srun_servlett01_2 AS srun_servlett01_2] (srun_servlett01_2.id = ?0)
[11:48:16.556] where-0: FromItem[srun_servlett01_2 AS srun_servlett01_2] null
[11:48:16.556] no local object loaded for default;adf8jGV2pOubjc8IyS
[11:48:16.556] reset session adf8jGV2pOubjc8IyS
...
[11:48:20.712] java.net.ConnectException: Connection refused: connect
[11:48:20.712] at java.net.PlainSocketImpl.socketConnect(Native Method)
[11:48:20.712] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
[11:48:20.712] at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
[11:48:20.712] at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
[11:48:20.712] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
[11:48:20.712] at java.net.Socket.connect(Socket.java:507)
[11:48:20.712] at java.net.Socket.connect(Socket.java:457)
[11:48:20.712] at java.net.Socket.<init>(Socket.java:365)
[11:48:20.712] at java.net.Socket.<init>(Socket.java:178)
[11:48:20.712] at com.caucho.vfs.TcpStream.<init>(TcpStream.java:55)
[11:48:20.712] at com.caucho.vfs.TcpStream.openReadWrite(TcpStream.java:116)
[11:48:20.712] at com.caucho.vfs.TcpPath.openReadWriteImpl(TcpPath.java:145)
[11:48:20.712] at com.caucho.vfs.Path.openReadWrite(Path.java:789)
[11:48:20.712] at com.caucho.server.cluster.ClusterServer.openTCPPair(ClusterServer.java:367)
[11:48:20.712] at com.caucho.server.cluster.ClusterClient.open(ClusterClient.java:227)
[11:48:20.712] at com.caucho.server.cluster.ClusterStore.saveServer(ClusterStore.java:646)
[11:48:20.712] at com.caucho.server.cluster.ClusterStore.store(ClusterStore.java:597)
[11:48:20.712] at com.caucho.server.cluster.ClusterObject.store(ClusterObject.java:405)
[11:48:20.712] at com.caucho.server.session.SessionImpl.flush(SessionImpl.java:813)
[11:48:20.712] at com.caucho.server.connection.AbstractHttpRequest.saveSession(AbstractHttpRequest.java:2291)
[11:48:20.712] at com.caucho.server.connection.AbstractHttpResponse.sendRedirect(AbstractHttpResponse.java:583)
...
[11:48:20.712] Database[/E:/ffng/server/servlett01-2/resin-pro-3.0.18/session]: UPDATE srun_servlett01_2 SET data=?, mod_time=?, access_time=? WHERE id=?
[11:48:20.712] where-1: static null
[11:48:20.712] index-0: FromItem[srun_servlett01_2 AS srun_servlett01_2] (srun_servlett01_2.id = ?3)
[11:48:20.712] where-0: FromItem[srun_servlett01_2 AS srun_servlett01_2] null
[11:48:20.712] Database[/E:/ffng/server/servlett01-2/resin-pro-3.0.18/session]: INSERT into srun_servlett01_2 (id,data,mod_time,access_time,expire_interval,owner,backup) VALUES(?,?,?,?,?,?,?)
[11:48:20.712] db table srun_servlett01_2 insert row xa:com.caucho.db.store.Transaction@577c5e
[11:48:20.712] Table[srun_servlett01_2:1] allocating block 3 row
[11:48:20.712] ReadBlock[Table[srun_servlett01_2:1],3] create
[11:48:20.712] AutoCommitWriteBlock[Table[srun_servlett01_2:1],3] create
[11:48:20.728] Table[srun_servlett01_2:1] allocating block 4 fragment
[11:48:20.728] ReadBlock[Table[srun_servlett01_2:1],4] create
[11:48:20.728] AutoCommitWriteBlock[Table[srun_servlett01_2:1],4] create
[11:48:20.728] write db-block ReadBlock[Table[srun_servlett01_2:1],4] [0, 65536]
[11:48:20.728] db-block remove AutoCommitWriteBlock[Table[srun_servlett01_2:1],4]
[11:48:20.728] btree insert at 2:24 value:3:0
[11:48:20.728] write db-block ReadBlock[Table[srun_servlett01_2:1],3] [0, 65536]
[11:48:20.728] db-block remove AutoCommitWriteBlock[Table[srun_servlett01_2:1],3]
[11:48:20.728] insert cluster: default;adf8jGV2pOubjc8IyS length:183
[11:48:20.728] [servlett01-2] self-store(0) adf8jGV2pOubjc8IyS length=183
[11:48:20.728] [servlett01-2:1] s 302 Found
[11:48:20.728] [servlett01-2:1] H Content-Language
[11:48:20.728] [servlett01-2:1] S de-CH
[11:48:20.728] [servlett01-2:1] H Location
[11:48:20.728] [servlett01-2:1] S https://finfox.ecofin.ch/s/pershome.xhtml [^]
[11:48:20.728] [servlett01-2:1] H Content-Length
[11:48:20.728] [servlett01-2:1] S 79
[11:48:20.728] [servlett01-2:1] G
[11:48:20.728] [1] chunk: 79
[11:48:20.728] [servlett01-2:1] D:data 79
[11:48:20.728] [servlett01-2:1] Q: quit channel
[11:48:20.728] [servlett01-2:1] complete request - keepalive
...
[11:48:49.307] close session adf8jGV2pOubjc8IyS
[11:48:49.307] remove session adf8jGV2pOubjc8IyS
[11:48:49.307] resin-file: destroy
[11:48:49.307] resin-jsp destroy

Resin B

[11:49:06.182] create session adf8jGV2pOubjc8IyS
[11:49:07.198] java.net.ConnectException: Connection refused: connect
[11:49:07.198] at java.net.PlainSocketImpl.socketConnect(Native Method)
[11:49:07.198] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
[11:49:07.198] at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
[11:49:07.198] at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
[11:49:07.198] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
[11:49:07.198] at java.net.Socket.connect(Socket.java:507)
[11:49:07.198] at java.net.Socket.connect(Socket.java:457)
[11:49:07.198] at java.net.Socket.<init>(Socket.java:365)
[11:49:07.198] at java.net.Socket.<init>(Socket.java:178)
[11:49:07.198] at com.caucho.vfs.TcpStream.<init>(TcpStream.java:55)
[11:49:07.198] at com.caucho.vfs.TcpStream.openReadWrite(TcpStream.java:116)
[11:49:07.198] at com.caucho.vfs.TcpPath.openReadWriteImpl(TcpPath.java:145)
[11:49:07.198] at com.caucho.vfs.Path.openReadWrite(Path.java:789)
[11:49:07.198] at com.caucho.server.cluster.ClusterServer.openTCPPair(ClusterServer.java:367)
[11:49:07.198] at com.caucho.server.cluster.ClusterClient.open(ClusterClient.java:227)
[11:49:07.198] at com.caucho.server.cluster.ClusterStore.loadObject(ClusterStore.java:461)
[11:49:07.198] at com.caucho.server.cluster.ClusterStore.load(ClusterStore.java:420)
[11:49:07.198] at com.caucho.server.cluster.ClusterObject.load(ClusterObject.java:253)
[11:49:07.198] at com.caucho.server.session.SessionImpl.load(SessionImpl.java:692)
[11:49:07.198] at com.caucho.server.session.SessionManager.load(SessionManager.java:1217)
[11:49:07.198] at com.caucho.server.session.SessionManager.getSession(SessionManager.java:1079)
[11:49:07.198] at com.caucho.server.connection.AbstractHttpRequest.createSession(AbstractHttpRequest.java:1367)
[11:49:07.198] at com.caucho.server.connection.AbstractHttpRequest.getSession(AbstractHttpRequest.java:1179)
[11:49:07.198] at javax.servlet.http.HttpServletRequestWrapper.getSession(HttpServletRequestWrapper.java:236)
[11:49:07.198] at javax.servlet.http.HttpServletRequestWrapper.getSession(HttpServletRequestWrapper.java:236)
...
[11:49:08.292] java.net.ConnectException: Connection refused: connect
[11:49:08.292] at java.net.PlainSocketImpl.socketConnect(Native Method)
[11:49:08.292] at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
[11:49:08.292] at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
[11:49:08.292] at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
[11:49:08.292] at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
[11:49:08.292] at java.net.Socket.connect(Socket.java:507)
[11:49:08.292] at java.net.Socket.connect(Socket.java:457)
[11:49:08.292] at java.net.Socket.<init>(Socket.java:365)
[11:49:08.292] at java.net.Socket.<init>(Socket.java:178)
[11:49:08.292] at com.caucho.vfs.TcpStream.<init>(TcpStream.java:55)
[11:49:08.292] at com.caucho.vfs.TcpStream.openReadWrite(TcpStream.java:116)
[11:49:08.292] at com.caucho.vfs.TcpPath.openReadWriteImpl(TcpPath.java:145)
[11:49:08.292] at com.caucho.vfs.Path.openReadWrite(Path.java:789)
[11:49:08.292] at com.caucho.server.cluster.ClusterServer.openTCPPair(ClusterServer.java:367)
[11:49:08.292] at com.caucho.server.cluster.ClusterClient.open(ClusterClient.java:227)
[11:49:08.292] at com.caucho.server.cluster.ClusterStore.loadObject(ClusterStore.java:461)
[11:49:08.292] at com.caucho.server.cluster.ClusterStore.load(ClusterStore.java:430)
[11:49:08.292] at com.caucho.server.cluster.ClusterObject.load(ClusterObject.java:253)
[11:49:08.292] at com.caucho.server.session.SessionImpl.load(SessionImpl.java:692)
[11:49:08.292] at com.caucho.server.session.SessionManager.load(SessionManager.java:1217)
[11:49:08.292] at com.caucho.server.session.SessionManager.getSession(SessionManager.java:1079)
[11:49:08.292] at com.caucho.server.connection.AbstractHttpRequest.createSession(AbstractHttpRequest.java:1367)
[11:49:08.292] at com.caucho.server.connection.AbstractHttpRequest.getSession(AbstractHttpRequest.java:1179)
[11:49:08.292] at javax.servlet.http.HttpServletRequestWrapper.getSession(HttpServletRequestWrapper.java:236)
[11:49:08.292] at javax.servlet.http.HttpServletRequestWrapper.getSession(HttpServletRequestWrapper.java:236)
...
[11:49:08.292] create session adf8jGV2pOubjc8IyS
[11:49:08.292] remove session adf8jGV2pOubjc8IyS
...
[11:49:08.495] create session adf8jGV2pOubjc8IyS
[11:49:08.495] create session adf8jGV2pOubjc8IyS
[11:49:08.495] remove session adf8jGV2pOubjc8IyS
Attached Files

- Relationships

- Notes
(0001122)
wet745
05-12-06 05:57

The mentioned problem seems to occur only when a single Resin process remains active. It seems not to occur when two Resin processes remain active. As a workaround, we now set up 4 Resin processes and try to keep 2 of them running all the time, e.g. during the installation of new releases.
 
(0001271)
ferg
06-08-06 11:35

Resin 3.0.19 and previous only back up persistent information to a single server.

So if the primary and backup for a session both go down, the session information will be lost.

Resin 3.0.20 adds the capability for having a primary, secondary, and tertiary server. Having two backups lets any two servers go down without session loss.

In addition, 3.0.20 localizes the session backups, so the secondary and tertiary server are within 8 servers of the primary. Essentially, this means any two servers in a span of 8 servers can go down without loss.

If you had 64 servers, for example, you could have servers 1, 9, 17, 25, 33, 41, 49, 57 all go down without loss.

server/01s-
 

- Issue History
Date Modified Username Field Change
05-09-06 03:23 wet745 New Issue
05-11-06 08:13 mindlink Issue Monitored: mindlink
05-12-06 05:57 wet745 Note Added: 0001122
06-08-06 11:35 ferg Note Added: 0001271
06-08-06 11:35 ferg Assigned To  => ferg
06-08-06 11:35 ferg Status new => closed
06-08-06 11:35 ferg Resolution open => fixed
06-08-06 11:35 ferg Fixed in Version  => 3.0.20


Mantis 1.0.0rc3[^]
Copyright © 2000 - 2005 Mantis Group
33 total queries executed.
28 unique queries executed.
Powered by Mantis Bugtracker