Mantis - Resin
Viewing Issue Advanced Details
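ID: 3418
Severity: minor
Reproducibility: always
Date Submitted: 03-25-09 10:46
Last Update: 06-08-09 19:11
Reporter: phcollard
Assigned To: ferg
Priority: normal
Status: closed
Product Version: 3.1.8
Resolution: fixed
Projection: none
ETA: none
Fixed in Version: 4.0.1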
0003418: Watchdog startup synchronization issue
This is a reopen of issue 2840. I upgraded to 3.1.8 yesterday but I still experience the same problem. See message in watchdog log below.

http://bugs.caucho.com/view.php?id=2840

[2009/03/25 13:35:47.833] Watchdog stop: admin
[2009/03/25 13:35:47.848] WatchdogProcess[Watchdog[admin],1] stopping Resin
[2009/03/25 13:35:49.442] java.lang.IllegalStateException: Can't start new task because of old task 'WatchdogTask[Watchdog[admin]]'
[2009/03/25 13:35:49.442] at com.caucho.boot.Watchdog.start(Watchdog.java:296)
[2009/03/25 13:35:49.442] at com.caucho.boot.WatchdogManager.startServer(WatchdogManager.java:295)
[2009/03/25 13:35:49.442] at com.caucho.boot.WatchdogServlet.start(WatchdogServlet.java:75)
[2009/03/25 13:35:49.442] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2009/03/25 13:35:49.442] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[2009/03/25 13:35:49.442] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[2009/03/25 13:35:49.442] at java.lang.reflect.Method.invoke(Method.java:597)
[2009/03/25 13:35:49.442] at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:180)
[2009/03/25 13:35:49.442] at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:109)
[2009/03/25 13:35:49.442] at com.caucho.hessian.server.HessianServlet.service(HessianServlet.java:396)
[2009/03/25 13:35:49.442] at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:103)
[2009/03/25 13:35:49.442] at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:187)
[2009/03/25 13:35:49.442] at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:265)
[2009/03/25 13:35:49.442] at com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:436)
[2009/03/25 13:35:49.442] at com.caucho.server.port.TcpConnection.run(TcpConnection.java:682)
[2009/03/25 13:35:49.442] at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:730)
[2009/03/25 13:35:49.442] at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:649)
[2009/03/25 13:35:49.442] at java.lang.Thread.run(Thread.java:619)

Notes
(0004017)
iRideSnow   
05-14-09 10:59   
I'm using Resin Pro 3.1.9 and am seeing this same issue. We use the Windows sc (service controller) command to stop and start resin from our deployment machine.

Going to try a couple of hacks/workarounds, but it would be nice if this didn't happen at all.

Rob
(0004019)
iRideSnow   
05-14-09 12:22   
I built in a five second delay between the sc stop command (and successful confirmation via an sc query that the service is actually stopped) and the sc start command.

I am, of course, concerned that five seconds might not always be enough time. Is there any way to know if this is a legitimate hack? Re-upgrading to a completely new release (3.1.10) is going to be very tough on our current timetable. Is there any way I can get just the jar that contains a fixed version of the watchdog code that's failing?
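
For reference, the stop / confirm / delay / start sequence described in this note can be sketched roughly as follows. This is only an illustration under assumptions, not a tested deployment script: the service name "Resin", the function names, the one-second poll interval, and the timeout are all made up for the example. The only external commands used are `sc stop`, `sc query`, and `sc start`, and the check relies on `sc query` printing a STATE line that contains STOPPED once the service is down.

```python
import subprocess
import time


def sc(*args):
    """Run the Windows `sc` command and return its text output."""
    result = subprocess.run(["sc", *args], capture_output=True, text=True)
    return result.stdout


def restart_service(name, run=sc, sleep=time.sleep,
                    settle_seconds=5, poll_seconds=1, timeout_seconds=120):
    """Stop a service, wait until it reports STOPPED, pause, then start it.

    `run` and `sleep` are injectable (hypothetical parameters) so the
    sequence can be exercised without a real Windows service.
    """
    run("stop", name)
    waited = 0
    # Poll `sc query` until its STATE line reports STOPPED.
    while "STOPPED" not in run("query", name):
        if waited >= timeout_seconds:
            raise TimeoutError(f"{name} did not stop within {timeout_seconds}s")
        sleep(poll_seconds)
        waited += poll_seconds
    # Extra settle delay after the confirmed stop. The value is a guess;
    # the thread reports anywhere from 5 to 15 seconds being needed.
    sleep(settle_seconds)
    run("start", name)
```

On the deployment machine this would be called as `restart_service("Resin")`, with `settle_seconds` raised for heavily loaded sites. The delay is counted from the confirmed stop, not from the moment the stop command is issued.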
(0004020)
phcollard   
05-14-09 12:28   
Thanks for your note Rob.

Still no news from Caucho on this issue. I use a simple .bat script to restart Resin services. I estimated the delay by trial and error. If a site is not heavily loaded 5s may be enough, but I noticed that on our largest sites I had to set the delay to 15s.

I am not sure upgrading to 3.1.10 would be the solution: this issue is still "open", so presumably nobody at Caucho has addressed it for the newer release.
(0004021)
iRideSnow   
05-14-09 12:32   
Figured it would be good to update with the Resin Pro 3.1.9 stack trace, since it's different from the one above:

[2009/05/14 06:29:58.873] Watchdog stop:
[2009/05/14 06:29:58.873] WatchdogProcess[Watchdog[],1] stopping Resin
[2009/05/14 06:30:01.827] java.lang.IllegalStateException: Can't start new task because of old task 'WatchdogTask[Watchdog[]]'
[2009/05/14 06:30:01.827] at com.caucho.boot.Watchdog.start(Watchdog.java:296)
[2009/05/14 06:30:01.827] at com.caucho.boot.WatchdogManager.startServer(WatchdogManager.java:295)
[2009/05/14 06:30:01.827] at com.caucho.boot.WatchdogServlet.start(WatchdogServlet.java:75)
(0004022)
iRideSnow   
05-14-09 12:43   
My point about 3.1.10 is that it doesn't even exist yet. To wait for the Dev/QA cycle on that release to complete, considering they just recently released 3.1.9, definitely won't work for us.

15 (or more!) seconds is really bad. I'm looking at the watchdog code now. There's obviously a synchronization issue, but after only five minutes of looking at it I don't know this code well enough to be able to craft a solution. I'm going to keep looking though, see if I can figure something out.

Ugh. This could be pretty bad for us. :(
(0004023)
iRideSnow   
05-14-09 13:10   
phcollard (sorry, don't know your real name),

With the 15s delay you have built in, are you waiting for Windows to tell you that the resin service has stopped before starting that 15s timer? Or is the 15 seconds from the time that you tell it to stop? It can certainly take resin some variable time to stop.

I'm hoping that, since I am starting my 5 second timer *after* I have confirmed (via the sc query command) that the resin service has stopped, it should be enough. How is your stop/start configured? Do you have resin installed as a service and are you using the Windows SC command to control it?

Rob
(0004068)
ferg   
06-08-09 19:11   
Changed timing to wait for Resin to completely shut down using BAM/HMTP.
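
The general idea of waiting for a complete shutdown instead of relying on timing can be illustrated with a toy model. This is not Resin's watchdog code and does not use BAM/HMTP; the `Watchdog` class, its methods, and the `slow_shutdown` worker are hypothetical names invented for the sketch. It reproduces the "can't start new task because of old task" refusal and shows a `stop()` that returns only after the old task has fully exited.

```python
import threading
import time


class Watchdog:
    """Toy model of a watchdog that runs at most one task at a time."""

    def __init__(self):
        self._task = None
        self._stop_requested = threading.Event()

    def start(self, work):
        # Same kind of refusal as in the log: an old task is still alive.
        if self._task is not None and self._task.is_alive():
            raise RuntimeError("Can't start new task because of old task")
        self._stop_requested.clear()
        self._task = threading.Thread(target=work, args=(self._stop_requested,))
        self._task.start()

    def stop(self, timeout=None):
        # Signal the task, then block until it has really finished,
        # so a start() issued right after stop() cannot hit the old task.
        self._stop_requested.set()
        if self._task is not None:
            self._task.join(timeout)
            if self._task.is_alive():
                raise TimeoutError("old task is still shutting down")
            self._task = None


def slow_shutdown(stop_requested):
    """Example worker: runs until asked to stop, then takes time to exit."""
    stop_requested.wait()
    time.sleep(0.05)  # simulated shutdown work
```

With this shape, a caller that does `stop()` followed immediately by `start(...)` needs no fixed delay, because `stop()` itself does not return while the old task is still running.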