Mantis - Resin
Viewing Issue Advanced Details
3527 block random 05-21-09 09:48 06-08-09 19:49
iRideSnow  
ferg  
normal  
closed 3.1.9  
fixed  
none    
none 4.0.1  
0003527: Resin fails to restart in a timely fashion (perhaps because of active requests?)
Environment: Resin Pro 3.1.9, Windows 2003 Server 64-bit, Java 1.5_18 64-bit.

I'm seeing a bad problem with Resin "waiting" many minutes before it restarts. I haven't been able to get a thread dump or sample because it happens intermittently. But I am seeing something in the logs that might point in the right direction.

I don't know if this is a bug or something that can be fixed with a configuration change. Nevertheless, I have filed a bug.

This is happening in our development environment, which obviously gets very little traffic. We're planning on upgrading to Resin Pro 3.1.x in our production environment within the next couple of weeks, but obviously this is a big show-stopper for us.

Note: Our restart process is to use the windows "sc stop \\machine resin", followed by a series of "sc query \\machine resin" to make sure it has stopped, followed by an "sc start \\machine resin". With Resin Pro 3.0.21, this process worked flawlessly. It's only with Resin Pro 3.1.9 that we started having this problem.

What I usually see in the log files during a normal restart sequence is something like this. Notice the time stamps. Everything happens pretty quickly.

RESIN [16:06:57.104] {resin-destroy} Server[id=,cluster=app-tier] stopping
RESIN [16:07:01.151] {resin-destroy} Host[admin.caucho] stopping
RESIN [16:07:01.151] {resin-destroy} WebApp[http://admin.caucho/resin-jmx-service] [^] stopping
RESIN [16:07:01.166] {resin-destroy} Host[] stopping
RESIN [16:07:01.166] {resin-destroy} WebApp[http://localhost:80/resin-admin] [^] stopping
RESIN [16:07:01.166] {resin-destroy} WebApp[http://localhost:80] [^] stopping
RESIN [16:07:10.260] {main} Proxy Cache disk-size=1024M memory-size=64M
RESIN [16:07:10.275] {main} PingThread[] starting, checking []
RESIN [16:07:10.400] {main}
RESIN [16:07:10.400] {main} Windows 2003 5.2 amd64
RESIN [16:07:10.400] {main} Java(TM) 2 Runtime Environment, Standard Edition 1.5.0_18-b02, Cp1252, en
RESIN [16:07:10.400] {main} Java HotSpot(TM) 64-Bit Server VM 1.5.0_18-b02, 64, mixed mode, Sun Microsystems Inc.
RESIN [16:07:10.400] {main} user.name: SYSTEM
RESIN [16:07:10.400] {main} resin.home = d:\resin
RESIN [16:07:10.400] {main} resin.root = d:\resin
RESIN [16:07:10.400] {main} resin.conf = /d:/resin/conf/resin.conf

When there's a problem, I'm seeing this. Notice that it took 51 minutes (!!!) before it decided to restart. The only difference from above is that it says, "closing with 1 active requests".

RESIN [05:16:23.713] {resin-destroy} Server[id=,cluster=app-tier] stopping
RESIN [05:16:28.182] {resin-destroy} Host[admin.caucho] stopping
RESIN [05:16:28.182] {resin-destroy} WebApp[http://admin.caucho/resin-jmx-service] [^] stopping
RESIN [05:16:28.182] {resin-destroy} Host[] stopping
RESIN [05:16:28.182] {resin-destroy} WebApp[http://localhost:80/resin-admin] [^] stopping
RESIN [05:16:28.182] {resin-destroy} WebApp[http://localhost:80] [^] stopping
RESIN [05:16:42.713] {resin-destroy} WebApp[http://localhost:80] [^] closing with 1 active requests.
RESIN [06:07:43.978] {main} Proxy Cache disk-size=1024M memory-size=64M
RESIN [06:07:43.994] {main} PingThread[] starting, checking []
RESIN [06:07:44.087] {main}
RESIN [06:07:44.087] {main} Windows 2003 5.2 amd64
RESIN [06:07:44.087] {main} Java(TM) 2 Runtime Environment, Standard Edition 1.5.0_18-b02, Cp1252, en
RESIN [06:07:44.087] {main} Java HotSpot(TM) 64-Bit Server VM 1.5.0_18-b02, 64, mixed mode, Sun Microsystems Inc.
RESIN [06:07:44.087] {main} user.name: SYSTEM
RESIN [06:07:44.087] {main} resin.home = d:\resin
RESIN [06:07:44.087] {main} resin.root = d:\resin
RESIN [06:07:44.087] {main} resin.conf = /d:/resin/conf/resin.conf

Here's a second instance from a different server with the same configuration. This time the restart took 13 minutes. Again, it is closing with active requests (3 this time), along with a message about sessions.

RESIN [05:16:24.092] {resin-destroy} Server[id=,cluster=app-tier] stopping
RESIN [05:16:26.717] {resin-destroy} Host[admin.caucho] stopping
RESIN [05:16:26.733] {resin-destroy} WebApp[http://admin.caucho/resin-jmx-service] [^] stopping
RESIN [05:16:26.733] {resin-destroy} Host[] stopping
RESIN [05:16:26.733] {resin-destroy} WebApp[http://localhost:80/resin-admin] [^] stopping
RESIN [05:16:26.733] {resin-destroy} WebApp[http://localhost:80] [^] stopping
RESIN [05:16:41.624] {resin-destroy} WebApp[http://localhost:80] [^] closing with 3 active requests.
RESIN [05:16:41.624] {resin-destroy} SessionImpl[abcy2ORXBFUU8dSJLfJfs,] LRU while in use (use-count=2). Consider increasing session-count.
RESIN [05:29:35.471] {main} Proxy Cache disk-size=1024M memory-size=64M
RESIN [05:29:35.486] {main} PingThread[] starting, checking []
RESIN [05:29:35.705] {main}
RESIN [05:29:35.705] {main} Windows 2003 5.2 amd64
RESIN [05:29:35.721] {main} Java(TM) 2 Runtime Environment, Standard Edition 1.5.0_18-b02, Cp1252, en
RESIN [05:29:35.721] {main} Java HotSpot(TM) 64-Bit Server VM 1.5.0_18-b02, 64, mixed mode, Sun Microsystems Inc.
RESIN [05:29:35.721] {main} user.name: SYSTEM
RESIN [05:29:35.721] {main} resin.home = D:\resin
RESIN [05:29:35.721] {main} resin.root = D:\resin
RESIN [05:29:35.721] {main} resin.conf = /D:/resin/conf/resin.conf

Any ideas?


Notes
(0004069)
ferg   
06-08-09 19:49   
Changed watchdog logic to allow a 'start' while an old Resin instance is in the process of stopping.