Viewing Issue Simple Details
ID      | Category | Severity | Reproducibility | Date Submitted | Last Update
0002558 | [Resin]  | minor    | always          | 03-27-08 08:14 | 06-12-08 09:35

Reporter        | ferg
Assigned To     | ferg
View Status     | public
Priority        | normal
Resolution      | no change required
Status          | closed
Product Version | 3.1.3

Summary: 0002558: cluster store corruption issue
Description:

(rep by Andrew Fritz) Both of the servers in our cluster stopped responding at the same time, and java started using 100% of all CPU resources. Upon killing one server, the other began responding almost immediately. Restarting the dead server resulted in MANY exceptions (all roughly the same):

[09:12:53.567] java.lang.IllegalStateException: Can't yet support data over 64M
[09:12:53.567]   at com.caucho.db.store.Inode.readFragmentAddr(Inode.java:972)
[09:12:53.567]   at com.caucho.db.store.Inode.remove(Inode.java:832)
[09:12:53.567]   at com.caucho.db.store.Transaction.writeData(Transaction.java:568)
[09:12:53.567]   at com.caucho.db.sql.QueryContext.unlock(QueryContext.java:517)
[09:12:53.567]   at com.caucho.db.sql.RowIterateExpr.nextBlock(RowIterateExpr.java:86)
[09:12:53.567]   at com.caucho.db.sql.Query.nextBlock(Query.java:713)
[09:12:53.567]   at com.caucho.db.sql.Query.nextTuple(Query.java:690)
[09:12:53.567]   at com.caucho.db.sql.DeleteQuery.execute(DeleteQuery.java:81)
[09:12:53.567]   at com.caucho.db.jdbc.PreparedStatementImpl.execute(PreparedStatementImpl.java:345)
[09:12:53.567]   at com.caucho.db.jdbc.PreparedStatementImpl.executeUpdate(PreparedStatementImpl.java:313)
[09:12:53.567]   at com.caucho.server.cluster.FileBacking.clearOldObjects(FileBacking.java:260)
[09:12:53.567]   at com.caucho.server.cluster.ClusterStore.clearOldObjects(ClusterStore.java:358)
[09:12:53.567]   at com.caucho.server.cluster.StoreManager.handleAlarm(StoreManager.java:637)
[09:12:53.567]   at com.caucho.server.cluster.StoreManager.start(StoreManager.java:386)
[09:12:53.567]   at com.caucho.server.cluster.ClusterStore.start(ClusterStore.java:196)
[09:12:53.567]   at com.caucho.server.cluster.Cluster.environmentStart(Cluster.java:928)
[09:12:53.567]   at com.caucho.loader.EnvironmentClassLoader.start(EnvironmentClassLoader.java:475)
[09:12:53.567]   at com.caucho.server.cluster.Server.start(Server.java:1149)
[09:12:53.567]   at com.caucho.server.cluster.Cluster.startServer(Cluster.java:719)
[09:12:53.567]   at com.caucho.server.cluster.ClusterServer.startServer(ClusterServer.java:455)
[09:12:53.567]   at com.caucho.server.resin.Resin.start(Resin.java:694)
[09:12:53.567]   at com.caucho.server.resin.Resin.initMain(Resin.java:1114)
[09:12:53.567]   at com.caucho.server.resin.Resin.main(Resin.java:1316)

This exception appeared many times, but everything appears to be working again. I found one reference relating this to cluster store corruption, possibly caused by locking issues. Since our fence came down (allowing public access, vs. beta-group-only access), spiders have been hitting our site pretty hard, which could result in a lot more lock contention (several thousand hits on a server in rapid succession). Not sure if this might be related. Any idea what the root cause of this hang-up was, or what I can do to prevent it in the future?
Additional Information:

Attached Files:
Mantis 1.0.0rc3
Copyright © 2000 - 2005 Mantis Group