Mantis Bugtracker

ID: 0003418
Category: [Resin]
Severity: minor
Reproducibility: always
Date Submitted: 03-25-09 10:46
Last Update: 06-08-09 19:11
Reporter: phcollard
View Status: public
Assigned To: ferg
Priority: normal
Resolution: fixed
Status: closed
Product Version: 3.1.8
Summary: 0003418: Watchdog startup synchronization issue
Description: This reopens issue 2840. I upgraded to 3.1.8 yesterday, but I still experience the same problem. See the message in the watchdog log below.

Additional Information:
[2009/03/25 13:35:47.833] Watchdog stop: admin
[2009/03/25 13:35:47.848] WatchdogProcess[Watchdog[admin],1] stopping Resin
[2009/03/25 13:35:49.442] java.lang.IllegalStateException: Can't start new task because of old task 'WatchdogTask[Watchdog[admin]]'
[2009/03/25 13:35:49.442] at com.caucho.boot.Watchdog.start(
[2009/03/25 13:35:49.442] at com.caucho.boot.WatchdogManager.startServer(
[2009/03/25 13:35:49.442] at com.caucho.boot.WatchdogServlet.start(
[2009/03/25 13:35:49.442] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2009/03/25 13:35:49.442] at sun.reflect.NativeMethodAccessorImpl.invoke(
[2009/03/25 13:35:49.442] at sun.reflect.DelegatingMethodAccessorImpl.invoke(
[2009/03/25 13:35:49.442] at java.lang.reflect.Method.invoke(
[2009/03/25 13:35:49.442] at com.caucho.hessian.server.HessianSkeleton.invoke(
[2009/03/25 13:35:49.442] at com.caucho.hessian.server.HessianSkeleton.invoke(
[2009/03/25 13:35:49.442] at com.caucho.hessian.server.HessianServlet.service(
[2009/03/25 13:35:49.442] at com.caucho.server.dispatch.ServletFilterChain.doFilter(
[2009/03/25 13:35:49.442] at com.caucho.server.webapp.WebAppFilterChain.doFilter(
[2009/03/25 13:35:49.442] at com.caucho.server.dispatch.ServletInvocation.service(
[2009/03/25 13:35:49.442] at com.caucho.server.hmux.HmuxRequest.handleRequest(
[2009/03/25 13:35:49.442] at
[2009/03/25 13:35:49.442] at com.caucho.util.ThreadPool$Item.runTasks(
[2009/03/25 13:35:49.442] at com.caucho.util.ThreadPool$
[2009/03/25 13:35:49.442] at

- Notes
05-14-09 10:59 iRideSnow

I'm using Resin Pro 3.1.9 and am seeing the same issue. We use the Windows sc (service controller) command to stop and start Resin from our deployment machine.

I'm going to try a couple of hacks/workarounds, but it would be nice if this didn't happen at all.

05-14-09 12:22 iRideSnow

I built in a five-second delay between the sc stop command (plus a successful confirmation, via sc query, that the service has actually stopped) and the sc start command.

I am, of course, concerned that five seconds might not always be enough. Is there any way to know whether this is a legitimate workaround? Upgrading again to a completely new release (3.1.10) is going to be very tough on our current timetable. Is there any way I can get just the jar containing a fixed version of the watchdog code that's failing?

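The stop, confirm, wait, start sequence described in this note can be sketched as a small Python deployment script. This is a minimal sketch under assumptions, not the script used here: the service name `Resin`, the 5-second grace delay, the 120-second stop timeout and the 1-second polling interval are illustrative values, and the `run` and `sleep` parameters are hypothetical hooks that let the logic be exercised without a real Windows service. It assumes `sc query` reports the service state on a line such as `STATE : 1  STOPPED`.

```python
import re
import subprocess
import time
from typing import Callable

# Illustrative values (assumptions); tune them per site.
SERVICE = "Resin"
GRACE_DELAY_S = 5.0     # extra wait that starts only *after* STOPPED is reported
STOP_TIMEOUT_S = 120.0  # give up if the service never reaches STOPPED
POLL_S = 1.0            # interval between `sc query` calls


def sc(*args: str) -> str:
    """Run a Windows `sc` command and return its stdout (Windows only)."""
    return subprocess.run(["sc", *args], capture_output=True, text=True).stdout


def service_state(query_output: str) -> str:
    """Extract the state name (e.g. STOPPED, RUNNING) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", query_output)
    return match.group(1) if match else "UNKNOWN"


def restart(service: str = SERVICE,
            run: Callable[..., str] = sc,
            sleep: Callable[[float], None] = time.sleep) -> None:
    """Stop the service, confirm it stopped, wait a grace period, then start it."""
    run("stop", service)
    # Poll until Windows reports the service as fully stopped.
    waited = 0.0
    while service_state(run("query", service)) != "STOPPED":
        if waited >= STOP_TIMEOUT_S:
            raise TimeoutError(f"{service} did not stop within {STOP_TIMEOUT_S}s")
        sleep(POLL_S)
        waited += POLL_S
    # The grace timer begins only once the stop is confirmed.
    sleep(GRACE_DELAY_S)
    run("start", service)
```

Because the grace timer starts only after `sc query` reports `STOPPED`, this follows the ordering described above. Like any fixed delay, it narrows the window for the watchdog error but does not guarantee the old watchdog task is gone.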
05-14-09 12:28 phcollard

Thanks for your note Rob.

Still no news from Caucho on this issue. I use a simple .bat script to restart Resin services, and I estimated the delay by trial and error. If a site is not heavily loaded, 5s may be enough, but on our largest sites I had to set the delay to 15s.

I am not sure upgrading to 3.1.10 would be the solution, as this issue is still "open", which suggests nobody at Caucho has addressed it in the newer release.

05-14-09 12:32 iRideSnow

I figured it would be good to add the Resin Pro 3.1.9 stack trace, since it's different from the one above:

[2009/05/14 06:29:58.873] Watchdog stop:
[2009/05/14 06:29:58.873] WatchdogProcess[Watchdog[],1] stopping Resin
[2009/05/14 06:30:01.827] java.lang.IllegalStateException: Can't start new task because of old task 'WatchdogTask[Watchdog[]]'
[2009/05/14 06:30:01.827] at com.caucho.boot.Watchdog.start(
[2009/05/14 06:30:01.827] at com.caucho.boot.WatchdogManager.startServer(
[2009/05/14 06:30:01.827] at com.caucho.boot.WatchdogServlet.start(
05-14-09 12:43 iRideSnow

My point about 3.1.10 is that it doesn't even exist yet. Waiting for the dev/QA cycle on that release to complete, considering they only recently released 3.1.9, definitely won't work for us.

15 (or more!) seconds is really bad. I'm looking at the watchdog code now. There's obviously a synchronization issue, but after only five minutes of looking at it I don't know this code well enough to craft a solution. I'm going to keep looking, though, and see if I can figure something out.

Ugh. This could be pretty bad for us. :(
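The exception in both logs reads like a guard that refuses to start a new task while the previous one is still alive. The Python sketch below models that race under stated assumptions: `TaskSlot` is a hypothetical stand-in, not Resin's `Watchdog` class, `RuntimeError` plays the role of `IllegalStateException`, and a thread that takes a moment to exit stands in for a Resin process that is still shutting down.

```python
import threading
import time
from typing import Optional


class TaskSlot:
    """Hypothetical model of a watchdog that manages one task at a time."""

    def __init__(self) -> None:
        self._task: Optional[threading.Thread] = None
        self._stop_requested = threading.Event()

    @staticmethod
    def _run(stop_requested: threading.Event, shutdown_s: float) -> None:
        stop_requested.wait()   # "serve" until a stop is requested
        time.sleep(shutdown_s)  # then simulate a slow shutdown

    def start(self, shutdown_s: float = 0.2) -> None:
        # Guard mirroring the logged error: refuse to start while the previous
        # task is still alive, even if a stop has already been requested.
        if self._task is not None and self._task.is_alive():
            raise RuntimeError(
                f"Can't start new task because of old task {self._task.name!r}")
        self._stop_requested = threading.Event()
        self._task = threading.Thread(
            target=self._run, args=(self._stop_requested, shutdown_s),
            name="WatchdogTask", daemon=True)
        self._task.start()

    def stop(self) -> None:
        # Only *requests* the stop; it returns before the task has exited.
        self._stop_requested.set()
```

Because `stop()` returns as soon as the request is made, a `start()` issued immediately afterwards hits the guard, while the same call succeeds once the old thread has exited. In this model, that is why a fixed delay can work on lightly loaded sites yet need to be longer on busy ones.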
05-14-09 13:10 iRideSnow

phcollard (sorry, don't know your real name),

With the 15s delay you have built in, are you waiting for Windows to tell you that the Resin service has stopped before starting that 15s timer? Or do the 15 seconds run from the time you tell it to stop? Resin can certainly take a variable amount of time to stop.

I'm hoping that, since I start my 5-second timer *after* confirming (via the sc query command) that the Resin service has stopped, it should be enough. How is your stop/start configured? Do you have Resin installed as a service, and are you using the Windows sc command to control it?

06-08-09 19:11 ferg

Changed the timing to wait for Resin to shut down completely, using BAM/HMTP.
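The note does not show the code change, only that it relies on BAM/HMTP. The general idea of waiting for a complete shutdown before starting can be sketched in Python, assuming hypothetical names (`WaitingTaskSlot`, `shutdown_timeout_s`) and using a plain `Thread.join` with a timeout where the real fix reportedly uses BAM/HMTP messaging. Instead of failing immediately, `start()` blocks until the old task has exited, and raises only if it is still alive after the timeout.

```python
import threading
import time
from typing import Optional


class WaitingTaskSlot:
    """Hypothetical watchdog slot whose start() waits for the old task to exit."""

    def __init__(self, shutdown_timeout_s: float = 30.0) -> None:
        self._task: Optional[threading.Thread] = None
        self._stop_requested = threading.Event()
        self._shutdown_timeout_s = shutdown_timeout_s

    @staticmethod
    def _run(stop_requested: threading.Event, shutdown_s: float) -> None:
        stop_requested.wait()   # "serve" until a stop is requested
        time.sleep(shutdown_s)  # then simulate a slow shutdown

    def start(self, shutdown_s: float = 0.2) -> None:
        old = self._task
        if old is not None:
            # Block until the old task has fully exited instead of racing it.
            old.join(self._shutdown_timeout_s)
            if old.is_alive():
                raise TimeoutError(f"old task {old.name!r} did not shut down in time")
        self._stop_requested = threading.Event()
        self._task = threading.Thread(
            target=self._run, args=(self._stop_requested, shutdown_s),
            name="WatchdogTask", daemon=True)
        self._task.start()

    def stop(self) -> None:
        # Requests the stop; start() is what waits for it to complete.
        self._stop_requested.set()

    def is_running(self) -> bool:
        return self._task is not None and self._task.is_alive()
```

The timeout keeps a hung shutdown from blocking a restart forever. Choosing it is a trade-off similar to the fixed delays discussed above, but the wait is only paid when the old task is actually slow to exit.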

- Issue History
Date Modified  | Username  | Field            | Change
03-25-09 10:46 | phcollard | New Issue        |
05-14-09 10:59 | iRideSnow | Note Added       | 0004017
05-14-09 12:22 | iRideSnow | Note Added       | 0004019
05-14-09 12:28 | phcollard | Note Added       | 0004020
05-14-09 12:32 | iRideSnow | Note Added       | 0004021
05-14-09 12:43 | iRideSnow | Note Added       | 0004022
05-14-09 13:10 | iRideSnow | Note Added       | 0004023
06-08-09 19:11 | ferg      | Note Added       | 0004068
06-08-09 19:11 | ferg      | Assigned To      | => ferg
06-08-09 19:11 | ferg      | Status           | new => closed
06-08-09 19:11 | ferg      | Resolution       | open => fixed
06-08-09 19:11 | ferg      | Fixed in Version | => 4.0.1
