Mantis - Resin
Viewing Issue Advanced Details | |||||
ID: | Category: | Severity: | Reproducibility: | Date Submitted: | Last Update: |
2558 | | minor | always | 03-27-08 08:14 | 06-12-08 09:35 |
Reporter: | ferg | Platform: | |||
Assigned To: | ferg | OS: | |||
Priority: | normal | OS Version: | |||
Status: | closed | Product Version: | 3.1.3 | ||
Product Build: | | Resolution: | no change required | ||
Projection: | none | ||||
ETA: | none | Fixed in Version: | |||
Summary: | 0002558: cluster store corruption issue | ||||
Description: |
(Reported by Andrew Fritz)

Both of the servers in our cluster stopped responding at the same time, and Java started using 100% of all CPU resources. Upon killing one server, the other began responding almost immediately. Restarting the dead server resulted in MANY exceptions (all roughly the same):

[09:12:53.567] java.lang.IllegalStateException: Can't yet support data over 64M
[09:12:53.567] at com.caucho.db.store.Inode.readFragmentAddr(Inode.java:972)
[09:12:53.567] at com.caucho.db.store.Inode.remove(Inode.java:832)
[09:12:53.567] at com.caucho.db.store.Transaction.writeData(Transaction.java:568)
[09:12:53.567] at com.caucho.db.sql.QueryContext.unlock(QueryContext.java:517)
[09:12:53.567] at com.caucho.db.sql.RowIterateExpr.nextBlock(RowIterateExpr.java:86)
[09:12:53.567] at com.caucho.db.sql.Query.nextBlock(Query.java:713)
[09:12:53.567] at com.caucho.db.sql.Query.nextTuple(Query.java:690)
[09:12:53.567] at com.caucho.db.sql.DeleteQuery.execute(DeleteQuery.java:81)
[09:12:53.567] at com.caucho.db.jdbc.PreparedStatementImpl.execute(PreparedStatementImpl.java:345)
[09:12:53.567] at com.caucho.db.jdbc.PreparedStatementImpl.executeUpdate(PreparedStatementImpl.java:313)
[09:12:53.567] at com.caucho.server.cluster.FileBacking.clearOldObjects(FileBacking.java:260)
[09:12:53.567] at com.caucho.server.cluster.ClusterStore.clearOldObjects(ClusterStore.java:358)
[09:12:53.567] at com.caucho.server.cluster.StoreManager.handleAlarm(StoreManager.java:637)
[09:12:53.567] at com.caucho.server.cluster.StoreManager.start(StoreManager.java:386)
[09:12:53.567] at com.caucho.server.cluster.ClusterStore.start(ClusterStore.java:196)
[09:12:53.567] at com.caucho.server.cluster.Cluster.environmentStart(Cluster.java:928)
[09:12:53.567] at com.caucho.loader.EnvironmentClassLoader.start(EnvironmentClassLoader.java:475)
[09:12:53.567] at com.caucho.server.cluster.Server.start(Server.java:1149)
[09:12:53.567] at com.caucho.server.cluster.Cluster.startServer(Cluster.java:719)
[09:12:53.567] at com.caucho.server.cluster.ClusterServer.startServer(ClusterServer.java:455)
[09:12:53.567] at com.caucho.server.resin.Resin.start(Resin.java:694)
[09:12:53.567] at com.caucho.server.resin.Resin.initMain(Resin.java:1114)
[09:12:53.567] at com.caucho.server.resin.Resin.main(Resin.java:1316)

This exception appeared many times, but everything appears to be working again. I found one reference suggesting that this is cluster store corruption, possibly related to locking issues.

Since our fence came down (allowing public access instead of beta-group-only access), spiders have been hitting our site pretty hard, which could result in a lot more lock contention (several thousand hits on a server in rapid succession). Not sure if this might be related.

Any idea what the root cause of this hang was, or what I can do to prevent it in the future? |
Steps To Reproduce: | |||||
Additional Information: | |||||
Relationships | |||||
Attached Files: |
Notes | |||||