Anonymous | Login | Signup for a new account | 01-05-2025 08:53 PST |
Main | My View | View Issues | Change Log | Docs |
Viewing Issue Advanced Details [ Jump to Notes ] | [ View Simple ] [ Issue History ] [ Print ] | ||||||||
ID | Category | Severity | Reproducibility | Date Submitted | Last Update | ||||
0005286 | [Resin] | block | random | 11-21-12 02:33 | 01-09-13 12:10 | ||||
Reporter | paul | View Status | public | ||||||
Assigned To | ferg | ||||||||
Priority | normal | Resolution | fixed | Platform | |||||
Status | closed | OS | |||||||
Projection | none | OS Version | |||||||
ETA | none | Fixed in Version | 4.0.32 | Product Version | 3.1.9 | ||||
Product Build | |||||||||
Summary | 0005286: Threads can become 'stuck' forever | ||||||||
Description |
Noticed a few times that the thread count on a server (in a 3 server cluster) can become 'stuck'. Looking through the resin.log reveals the error message [2012-11-09 04:56:41.419] java.io.IOException: failed to add EPOLL for fd=512 [2012-11-09 04:56:41.419] [2012-11-09 04:56:41.419] at com.caucho.server.port.JniSelectManager.removeNative(Native Method) [2012-11-09 04:56:41.419] at com.caucho.server.port.JniSelectManager.remove(JniSelectManager.java:476) [2012-11-09 04:56:41.419] at com.caucho.server.port.JniSelectManager.run(JniSelectManager.java:376) [2012-11-09 04:56:41.419] at java.lang.Thread.run(Thread.java:662) Taking a Threaddump reveals a lot of threads (up to 200 sometimes) all with a stacktrace similar to "hmux-192.168.0.3:6800-20399$1859226681" - Thread t@40 java.lang.Thread.State: RUNNABLE at com.caucho.server.port.JniSelectManager.addNative(Native Method) at com.caucho.server.port.JniSelectManager.keepalive(JniSelectManager.java:229) at com.caucho.server.port.TcpConnection.keepalive(TcpConnection.java:448) at com.caucho.server.port.TcpConnection.run(TcpConnection.java:739) at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:743) at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:662) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None Looking at the native code, guessing that the failure causes the method to exit without first releasing the mutex that it took, and thus leaves any subsequent callers stuck waiting for the same mutex to become available. |
||||||||
Steps To Reproduce | |||||||||
Additional Information | Checked the JNI src for 3.1.9Pro, 3.1.12Pro and 3.1.13Pro and all seem to be the same. | ||||||||
Attached Files | |||||||||
|
There are no notes attached to this issue. |
Mantis 1.0.0rc3[^]
Copyright © 2000 - 2005 Mantis Group
27 total queries executed. 25 unique queries executed. |