Mantis Bugtracker
  

Viewing Issue Advanced Details
ID: 0000061
Category: [Resin]
Severity: minor
Reproducibility: always
Date Submitted: 03-30-05 00:00
Last Update: 01-25-06 11:28
Reporter: sam
View Status: public
Assigned To: ferg
Priority: normal
Resolution: fixed
Status: closed
Projection: none
ETA: none
Fixed in Version: 3.0.18
Product Build: 3.0.12
Summary: 0000061: wrapper.pl kill enhancement
Description: RSN-52
(rep by B Bernstein)

We've recently found a small problem with the "stop" part of
wrapper.pl, and I'd like to see if there are any suggestions on how to
deal with it.

We found that a third-party component that we were using in a servlet
was freezing up our webapp. Of course we're working on getting that
corrected, but in the meantime, it became impossible to kill the Java
Resin process while it was in that hung state.

We use the Perl wrapper to launch Resin, so when we sent the "stop"
command, it ran its normal code, which kills the Java child process. In
our case, it was timing out after 60 seconds, and then we had the
orphan process sitting around after that. We then needed to manually
"kill -9" that orphan.

What I'd really like to have it do as a fallback (if kill fails) is to
then call:

    kill(-9, $child)

Alternatively, have the wrapper return an error if the kill failed, so
that our script can do something about it.
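A rough Perl sketch of that second option, where the stop path reports failure through its exit status instead of exiting as if everything worked. The helpers `is_alive` and `stop_child` are hypothetical (they are not in wrapper.pl), and the sketch assumes the wrapper is the parent of the Java process, so it reaps the child with `waitpid` rather than trusting `kill(0, ...)`, which also succeeds on an exited-but-unreaped (zombie) child.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: true while $pid is still running. If $pid is our own
# child it is reaped here, so a zombie does not count as alive; for any other
# pid we fall back to probing with signal 0.
sub is_alive {
    my ($pid) = @_;
    my $r = waitpid($pid, WNOHANG);
    return 0 if $r == $pid;          # our child, exited and now reaped
    return 1 if $r == 0;             # our child, still running
    return kill(0, $pid) ? 1 : 0;    # not our child (or already reaped)
}

# Hypothetical helper: send TERM, wait up to $timeout seconds, and return
# 1 if the child is gone or 0 if it survived, so the caller can turn the
# result into an exit status instead of exiting "successfully".
sub stop_child {
    my ($child, $timeout) = @_;
    kill('TERM', $child);
    for (1 .. $timeout) {
        return 1 unless is_alive($child);
        sleep(1);
    }
    return is_alive($child) ? 0 : 1;
}

# Possible use at the end of the stop path (assumed, not from wrapper.pl):
#   exit(stop_child($child, 60) ? 0 : 1);
```

With a non-zero exit status, a calling script that only knows the wrapper's pid can still tell that the child was left behind.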

Our script that calls the wrapper to stop or restart the process does its
own checking and waits until the procid goes away, but it only knows
about the wrapper's procid, so when the wrapper fails to kill its
child, it just exits and never lets any external processes know that it
failed. So, our calling script thinks it was successful.

We're using Resin 2.1.x, but I see that the same code is in 3.0.x
versions of the Perl wrapper.

Here is something like what I have in mind for that code excerpt (not
tested):

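    if ($child > 0) {
        $def_kill_time = $kill_time;

        # let it die gracefully within $kill_time seconds (60 by default)
        while ($kill_time-- > 0 and kill(0, $child)) {
            sleep(1);
        }

        # re-check liveness directly instead of trusting the counter, which
        # can read 0 even when the child exited on the last iteration
        if (kill(0, $child)) {
            print("Resin proc $child didn't die, sending TERM to process group $child\n");
            # in Perl a negative signal number targets the process group,
            # which assumes the child leads its own group
            kill(-15, $child);
            # send TERM once, then poll with kill(0, ...) as in the first loop
            $kill_time = $def_kill_time;
            while ($kill_time-- > 0 and kill(0, $child)) {
                sleep(1);
            }
        }

        if (kill(0, $child)) {
            print("Resin proc $child STILL didn't die, trying kill -9 $child\n");
            # SIGKILL to the process group, as suggested above
            kill(-9, $child);
        }
    }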


Additional Information: Unix

- Notes
(0000064)
user98
03-30-05 00:00

It would also be good if the shutdown sequence blocked the invoking shell
until the process has died. The current practice for our operations staff is
either:

   A: To type "httpd.sh stop" followed by a lot of "ps f -A | grep java"
      to find out when the JVM has shut down.

   B: Write an elaborate shell script that finds all the child pids of
      wrapper.pl and loops through killing them until they're all gone.
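Option B can also be written in Perl instead of shell. This is a sketch, not something wrapper.pl does: it assumes a POSIX-style `ps` that accepts `-A -o pid= -o ppid=` (the trailing `=` suppresses the header), reads the process table once, collects every descendant of the wrapper's pid, and then blocks with no timeout until they have all exited. `descendant_pids`, `kill_tree` and `wait_until_gone` are hypothetical names. Because `kill 0` still succeeds on a zombie, the wait only finishes once each process has been reaped by its parent (or by init).

```perl
use strict;
use warnings;

# Every descendant of $root, found by reading the whole process table once.
sub descendant_pids {
    my ($root) = @_;
    my %children;    # ppid => [ child pids ]
    open(my $ps, '-|', 'ps', '-A', '-o', 'pid=', '-o', 'ppid=')
        or die "cannot run ps: $!";
    while (my $line = <$ps>) {
        my ($pid, $ppid) = split(' ', $line);
        next unless defined $ppid;
        push(@{ $children{$ppid} }, $pid);
    }
    close($ps);

    # Breadth-first walk down the tree starting at $root.
    my @found;
    my @queue = ($root);
    while (@queue) {
        my $parent = shift(@queue);
        for my $kid (@{ $children{$parent} || [] }) {
            push(@found, $kid);
            push(@queue, $kid);
        }
    }
    return @found;
}

# Signal every descendant of $root; returns the pids that were signalled.
sub kill_tree {
    my ($root, $signal) = @_;
    my @pids = descendant_pids($root);
    kill($signal, @pids) if @pids;
    return @pids;
}

# Block, with no timeout, until none of @pids responds to signal 0.
sub wait_until_gone {
    my (@pids) = @_;
    sleep(1) while grep { kill(0, $_) } @pids;
}

# Possible use, assuming $wrapper_pid was read from wherever it is kept:
#   my @pids = kill_tree($wrapper_pid, 'TERM');
#   wait_until_gone($wrapper_pid, @pids);
```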
 
(0000065)
sam
03-30-05 00:00

(rep by B Bernstein)

I agree that it would be nice if this happened directly in the wrapper.
Currently we use our own elaborate script for controlling Resin,
including a loop that watches the Resin pid after sending a stop. But
since it's possible for the parent to die and leave orphan children,
that script was failing for us in some situations: the wrapper's pid
would disappear once the wrapper timed out after 60 seconds, even
though its Java child was still running.

We would prefer to have our controller just hang indefinitely, so that
we know there's a problem, rather than have it exit neatly after 60
seconds only to find out later that we have orphan processes eating
resources. Of course, we'd really prefer that after the timeout the
wrapper try an increasing level of kill (whatever that would be),
followed eventually by kill -9.
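The "increasing level of kill" idea generalizes the three-stage code proposed in the description. A minimal Perl sketch, assuming the wrapper is the Java process's parent: `escalate_kill` and its list of [signal, seconds] steps are hypothetical, and a step whose wait is `undef` blocks forever, which gives the "hang so we know there's a problem" behaviour.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# True while $pid is running; reaps our own exited child so that a zombie
# is not mistaken for a live process.
sub is_alive {
    my ($pid) = @_;
    my $r = waitpid($pid, WNOHANG);
    return 0 if $r == $pid;          # our child, exited and now reaped
    return 1 if $r == 0;             # our child, still running
    return kill(0, $pid) ? 1 : 0;    # not our child (or already reaped)
}

# Hypothetical escalation policy: each step is [signal, seconds to wait].
# Returns the signal that finally stopped the process, or false if it
# survived every step. A wait of undef means "wait forever".
sub escalate_kill {
    my ($pid, @steps) = @_;
    for my $step (@steps) {
        my ($signal, $wait) = @$step;
        kill($signal, $pid);
        my $waited = 0;
        while (is_alive($pid)) {
            last if defined $wait && $waited >= $wait;
            sleep(1);
            $waited++;
        }
        return $signal unless is_alive($pid);
    }
    return;
}

# Example policy mirroring the 60-second timeout, then kill -9:
#   escalate_kill($child, ['TERM', 60], ['KILL', 10]);
```

Keeping the ladder as data means an operator could add intermediate signals, or set the last wait to `undef`, without touching the loop.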
 

- Issue History
Date Modified  | Username | Field            | Change
03-30-05 00:00 | sam      | New Issue        |
01-25-06 11:28 | ferg     | Assigned To      | => ferg
01-25-06 11:28 | ferg     | Status           | acknowledged => closed
01-25-06 11:28 | ferg     | Resolution       | open => fixed
01-25-06 11:28 | ferg     | version          | 3.0.12 =>
01-25-06 11:28 | ferg     | Fixed in Version | => 3.0.18

