Mantis - Resin
Viewing Issue Advanced Details
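ID: 6088
Severity: minor
Reproducibility: always
Date Submitted: 08-22-17 08:38
Last Update: 09-06-17 16:26
Reporter: wileysaw
Assigned To: ferg
Priority: normal
Status: closed
Product Version: 4.0.49
Resolution: fixed
Projection: none
ETA: none
Fixed in Version: 4.0.54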
0006088: Dynamic server not recovering after a stop in a cluster
[Environment]

Resin Version : 4.0.49

We have a clustered environment that consists of 4 servers (triad and
dynamic):

APserver1(app-0)
APserver2(app-1)
APserver3(app-2)
APserver4(dyn-0)

[Issue]

Fifteen minutes after APserver4 (dyn-0) is stopped, the following
message is logged:

[17-08-18 11:21:24.652] {resin-105} ScalingManager[] removing dynamic
server TriadServer[dyn-app-0-172.26.210.4:6830,1,172.26.210.4:6830]
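
To spot this condition in a log, the removal message can be matched with a regular expression. The following is a minimal sketch, not part of Resin: the class name `ScalingLogScan` and the method `parseRemoval` are hypothetical, and it assumes each log entry is on a single, unwrapped line in the same format as the message quoted above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Resin API): recognizes the ScalingManager
// removal message quoted in this report.
public class ScalingLogScan {
  // Group 1: log timestamp. Group 2: the TriadServer[...] description.
  // Assumes the log entry has not been wrapped across lines.
  private static final Pattern REMOVED = Pattern.compile(
      "^\\[(\\d{2}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3})\\] \\{[^}]+\\} "
      + "ScalingManager\\[\\] removing dynamic server (TriadServer\\[.*\\])$");

  /** Returns {timestamp, server} if the line is a removal message, otherwise empty. */
  public static Optional<String[]> parseRemoval(String line) {
    Matcher m = REMOVED.matcher(line.trim());
    if (!m.matches()) {
      return Optional.empty();
    }
    return Optional.of(new String[] { m.group(1), m.group(2) });
  }

  public static void main(String[] args) {
    String line = "[17-08-18 11:21:24.652] {resin-105} ScalingManager[] removing dynamic "
        + "server TriadServer[dyn-app-0-172.26.210.4:6830,1,172.26.210.4:6830]";
    parseRemoval(line).ifPresent(r -> System.out.println(r[0] + " removed " + r[1]));
  }
}
```

A match only shows that the dynamic server was dropped from the pod; it does not by itself show whether a later restart of that server succeeded.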

If APserver4 (dyn-0) is started after this message has been logged, it
does not become Active in resin-admin; it remains in the FAIL state.
In addition, the following error is logged when APserver4 (dyn-0) is
started:

{{{
[17-08-18 11:24:16.814] {resin-45} com.caucho.bam.InternalServerException: BamError[type=cancel,group=internal-server-error,text=com.caucho.bam.TimeoutException: QueryItem[5,FirstMethodCallback[SimpleActorSender[QueryActorFilter[SkeletonActorFilter[cluster-router-app-main@baa.app.admin.resin,com.caucho.cloud.bam.ClusterRouteActor]]],cluster-router-app-main@baa.app.admin.resin,CallPayload[localGet]]]]
  at com.caucho.bam.BamError.createException(BamError.java:430)
  at com.caucho.bam.proxy.ReplyFutureCallback.get(ReplyFutureCallback.java:104)
  at com.caucho.distcache.cluster.CacheMnodeActor.get(CacheMnodeActor.java:209)
  at com.caucho.distcache.cluster.ClusterCacheEngine.get(ClusterCacheEngine.java:244)
  at com.caucho.distcache.cluster.ClusterCacheEngine.get(ClusterCacheEngine.java:59)
  at com.caucho.server.distcache.DistCacheEntry.loadExpiredValue(DistCacheEntry.java:1095)
  at com.caucho.server.distcache.DistCacheEntry.reloadValue(DistCacheEntry.java:1077)
  at com.caucho.server.distcache.DistCacheEntry.loadMnodeValue(DistCacheEntry.java:1035)
  at com.caucho.server.distcache.DistCacheEntry.get(DistCacheEntry.java:910)
  at com.caucho.server.distcache.DistCacheEntry.get(DistCacheEntry.java:166)
  at com.caucho.server.distcache.CacheImpl.get(CacheImpl.java:318)
  at com.caucho.cloud.globalcache.GlobalCacheManager.get(GlobalCacheManager.java:172)
  at com.caucho.cloud.globalcache.AbstractGlobalCache.get(AbstractGlobalCache.java:97)
  at com.caucho.cloud.elastic.ScalingManager$PeerPod.getScalingPod(ScalingManager.java:432)
  at com.caucho.cloud.elastic.ScalingManager.updatePod(ScalingManager.java:282)
  at com.caucho.cloud.elastic.ScalingAlarm.handleAlarm(ScalingAlarm.java:57)
  at com.caucho.util.Alarm.handleAlarm(Alarm.java:523)
  at com.caucho.util.Alarm.run(Alarm.java:495)
  at com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173)
  at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118)
}}}

Notes
(0006788)
ferg   
09-06-17 16:26   
cloud/127a