Mantis - Resin
Viewing Issue Advanced Details
ID: 0006236
Severity: minor
Reproducibility: always
Date Submitted: 04-30-19 11:10
Last Update: 05-06-19 16:50
Reporter: ferg
Priority: normal
Status: closed
Resolution: fixed
Product Version: none
Target Version: none
Fixed in Version: 4.0.62
0006236: dynamic server conflict
(rep by Chris Daniel)
 We are in the process of upgrading our production environment. We have upgraded one of our production environments from .37 to .61.

A production environment includes 1 file server, 40 app servers, 3 web servers, and 6 batch servers, including app and web triads.
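For reference, a triad like the one in the logs below would be declared as the first three static servers of the app cluster in resin.xml. This is only a sketch assembled from the server ids and addresses in the log messages; the surrounding configuration (clusters, ports, elastic settings) in the actual environment is not known:

```xml
<!-- Sketch only: triad topology inferred from the HeartbeatHealthCheck log below. -->
<resin xmlns="http://caucho.com/ns/resin">
  <cluster id="app">
    <!-- The first three static servers form the triad. -->
    <server id="app-0" address="192.168.10.100" port="6800"/>
    <server id="app-1" address="192.168.10.101" port="6800"/>
    <server id="app-2" address="192.168.10.102" port="6800"/>
    <!-- Dynamic (elastic) app servers join this cluster at runtime
         and receive generated ids such as dyn-app-0-<address>:<port>. -->
  </cluster>
</resin>
```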

Looking through the logs, we are seeing issues after upgrading from .37 to .61.

It seems as though our dynamic app servers are having issues connecting back to our triad servers, which is breaking our war deployment because the war is not being unpacked on some of the dynamic app servers.

There have been a variety of different log messages on the dynamic app servers, so I'm not sure whether they all pertain to this case or whether there are multiple separate issues.

From the dynamic app servers we are getting the following types of log events; the majority report no active heartbeat from our triad cluster:


{resin-90} java.lang.IllegalStateException: future timeout 30000ms
    at com.caucho.bam.proxy.ReplyFutureCallback.get(ReplyFutureCallback.java:106)
    at com.caucho.distcache.cluster.CacheMnodeActor.get(CacheMnodeActor.java:209)
    at com.caucho.distcache.cluster.ClusterCacheEngine.get(ClusterCacheEngine.java:244)
    at com.caucho.distcache.cluster.ClusterCacheEngine.get(ClusterCacheEngine.java:59)
    at com.caucho.server.distcache.DistCacheEntry.loadExpiredValue(DistCacheEntry.java:1095)
    at com.caucho.server.distcache.DistCacheEntry.reloadValue(DistCacheEntry.java:1077)
    at com.caucho.server.distcache.DistCacheEntry.loadMnodeValue(DistCacheEntry.java:1035)
    at com.caucho.server.distcache.DistCacheEntry.get(DistCacheEntry.java:910)
    at com.caucho.server.distcache.DistCacheEntry.get(DistCacheEntry.java:166)
    at com.caucho.server.distcache.CacheImpl.get(CacheImpl.java:317)
    at com.caucho.cloud.globalcache.GlobalCacheManager.get(GlobalCacheManager.java:176)
    at com.caucho.cloud.globalcache.AbstractGlobalCache.get(AbstractGlobalCache.java:97)
    at com.caucho.cloud.elastic.ScalingManager$PeerPod.getScalingPod(ScalingManager.java:439)
    at com.caucho.cloud.elastic.ScalingManager.updatePod(ScalingManager.java:283)
    at com.caucho.cloud.elastic.ScalingManager$ScalingCacheListener.onCacheChange(ScalingManager.java:464)
    at com.caucho.cloud.globalcache.GlobalCacheManager.notifyListeners(GlobalCacheManager.java:278)
    at com.caucho.cloud.globalcache.GlobalCacheManager.access$000(GlobalCacheManager.java:60)
    at com.caucho.cloud.globalcache.GlobalCacheManager$ClusterCacheListener.onUpdated(GlobalCacheManager.java:444)
    at com.caucho.server.distcache.CacheImpl$UpdatedListener.onUpdated(CacheImpl.java:1385)
    at com.caucho.server.distcache.CacheImpl.entryUpdate(CacheImpl.java:1142)
    at com.caucho.server.distcache.CacheImpl.access$200(CacheImpl.java:86)
    at com.caucho.server.distcache.CacheImpl$1.run(CacheImpl.java:1120)
    at com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173)
    at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118)


[19-04-29 14:55:17.357] {resin-1383} HeartbeatHealthCheck[WARNING:no active heartbeat from ClusterServer[id=app-0,192.168.10.100:6800], no active heartbeat from ClusterServer[id=app-1,192.168.10.101:6800], no active heartbeat from ClusterServer[id=app-2,192.168.10.102:6800]]
FAIL: hmtp-Kaa-to-waa ReplyPayload[CacheUpdateWithSource[,value=3c7a4a144375cbf7,flags=6,version=1556567723222,lease=18,source=StreamSource[com.caucho.vfs.TempOutputStream@185f5185,null]]]
FAIL: hmtp-Kaa-to-waa ReplyPayload[CacheUpdateWithSource[,value=ea68574b65080293,flags=6,version=1556567745976,lease=-1,source=StreamSource[com.caucho.vfs.TempOutputStream@605d26bd,null]]]
FAIL: hmtp-Kaa-to-waa ReplyPayload[CacheUpdateWithSource[,value=7bdf1f3ba19acf1b,flags=6,version=1556556741357,lease=-1,source=null]]
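The "future timeout 30000ms" failure is the generic pattern of a blocking wait on a reply future that the peer never answers. A minimal sketch of that pattern (hypothetical names; this is not Resin's ReplyFutureCallback implementation, only an illustration of how a missing reply becomes an IllegalStateException):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReplyFutureSketch {
    // Hypothetical analogue of a reply future: the caller blocks until a peer
    // answers, and converts a timeout into IllegalStateException, mirroring
    // the "future timeout 30000ms" message in the trace above.
    static <T> T getReply(CompletableFuture<T> future, long timeoutMs) {
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new IllegalStateException("future timeout " + timeoutMs + "ms", e);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A reply that never arrives, e.g. when the triad does not answer
        // the distributed-cache call.
        CompletableFuture<String> noReply = new CompletableFuture<>();
        try {
            getReply(noReply, 100);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "future timeout 100ms"
        }
    }
}
```

In the reported environment the wait is 30 seconds, so each stuck cache lookup on a dynamic server stalls the calling thread for the full timeout before failing.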


On our triad server we are seeing these logs indicating conflicting ids for some of our dynamic app servers:

[19-04-29 15:03:28.554] {scaling@aaa.app.admin.resin-676} WarningService: java.lang.RuntimeException: com.caucho.config.ConfigException: 'CloudServer[dyn-app-0-192.168.10.122:6830,12,192.168.10.122:6830]' with id='dyn-app-0-192.168.10.122:6830' does not match new id 'dyn-app-0-192.168.10.121:6830'
[19-04-29 15:03:28.554] {scaling@aaa.app.admin.resin-676} java.lang.RuntimeException: com.caucho.config.ConfigException: 'CloudServer[dyn-app-0-192.168.10.122:6830,12,192.168.10.122:6830]' with id='dyn-app-0-192.168.10.122:6830' does not match new id 'dyn-app-0-192.168.10.121:6830'
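The ConfigException suggests two dynamic servers (192.168.10.121 and 192.168.10.122) being assigned the same cluster slot (index 12), so the slot's recorded id no longer matches the joining server's id. A hypothetical sketch of that kind of registry check (not Resin's actual CloudServer code), just to make the failure mode concrete:

```java
import java.util.HashMap;
import java.util.Map;

public class ServerRegistrySketch {
    // Hypothetical registry keyed by cluster slot index. If a dynamic server
    // joins a slot already held under a different id (here, a different
    // address), the join is rejected -- analogous to the ConfigException
    // in the triad log above.
    private final Map<Integer, String> serverIdsByIndex = new HashMap<>();

    public void register(int index, String id) {
        String existing = serverIdsByIndex.putIfAbsent(index, id);
        if (existing != null && !existing.equals(id)) {
            throw new IllegalStateException(
                "'" + existing + "' does not match new id '" + id + "'");
        }
    }
}
```

Under this reading, registering "dyn-app-0-192.168.10.122:6830" at index 12 and then "dyn-app-0-192.168.10.121:6830" at the same index reproduces the mismatch message, which is consistent with the note below that the problem appears only with many dynamic servers competing for slots.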


Notes
(0006892)
nam   
05-06-19 10:09   
(rep by Chris Daniel)

This issue only occurs when there is a "large" number of dynamic servers.