Mantis Bugtracker
  

Viewing Issue Simple Details
ID: 0006236
Category: [Resin]
Severity: minor
Reproducibility: always
Date Submitted: 04-30-19 11:10
Last Update: 05-06-19 16:50
Reporter: ferg
View Status: public
Assigned To:
Priority: normal
Resolution: fixed
Status: closed
Product Version:
Summary: 0006236: dynamic server conflict
Description (rep by Chris Daniel)
We are in the process of upgrading our production environment. We have upgraded one of our production environments from .37 to .61.

A production environment includes 1 file server, 40 app servers, 3 web servers, and 6 batch servers, including app and web triads.

Looking through the logs, we are having issues after upgrading from .37 to .61.

It seems that our dynamic app servers are having trouble connecting back to our triad servers, and this is breaking our war deployment because the war is not being unpacked on some of the dynamic app servers.

We are seeing a variety of different log messages on the dynamic app servers, so I'm not sure whether they all pertain to this case or whether there are multiple other issues.

From the dynamic app servers we are getting the following types of log events; the majority report that there is no active heartbeat from our triad cluster:


{resin-90} java.lang.IllegalStateException: future timeout 30000ms
    at com.caucho.bam.proxy.ReplyFutureCallback.get(ReplyFutureCallback.java:106)
    at com.caucho.distcache.cluster.CacheMnodeActor.get(CacheMnodeActor.java:209)
    at com.caucho.distcache.cluster.ClusterCacheEngine.get(ClusterCacheEngine.java:244)
    at com.caucho.distcache.cluster.ClusterCacheEngine.get(ClusterCacheEngine.java:59)
    at com.caucho.server.distcache.DistCacheEntry.loadExpiredValue(DistCacheEntry.java:1095)
    at com.caucho.server.distcache.DistCacheEntry.reloadValue(DistCacheEntry.java:1077)
    at com.caucho.server.distcache.DistCacheEntry.loadMnodeValue(DistCacheEntry.java:1035)
    at com.caucho.server.distcache.DistCacheEntry.get(DistCacheEntry.java:910)
    at com.caucho.server.distcache.DistCacheEntry.get(DistCacheEntry.java:166)
    at com.caucho.server.distcache.CacheImpl.get(CacheImpl.java:317)
    at com.caucho.cloud.globalcache.GlobalCacheManager.get(GlobalCacheManager.java:176)
    at com.caucho.cloud.globalcache.AbstractGlobalCache.get(AbstractGlobalCache.java:97)
    at com.caucho.cloud.elastic.ScalingManager$PeerPod.getScalingPod(ScalingManager.java:439)
    at com.caucho.cloud.elastic.ScalingManager.updatePod(ScalingManager.java:283)
    at com.caucho.cloud.elastic.ScalingManager$ScalingCacheListener.onCacheChange(ScalingManager.java:464)
    at com.caucho.cloud.globalcache.GlobalCacheManager.notifyListeners(GlobalCacheManager.java:278)
    at com.caucho.cloud.globalcache.GlobalCacheManager.access$000(GlobalCacheManager.java:60)
    at com.caucho.cloud.globalcache.GlobalCacheManager$ClusterCacheListener.onUpdated(GlobalCacheManager.java:444)
    at com.caucho.server.distcache.CacheImpl$UpdatedListener.onUpdated(CacheImpl.java:1385)
    at com.caucho.server.distcache.CacheImpl.entryUpdate(CacheImpl.java:1142)
    at com.caucho.server.distcache.CacheImpl.access$200(CacheImpl.java:86)
    at com.caucho.server.distcache.CacheImpl$1.run(CacheImpl.java:1120)
    at com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173)
    at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118)


[19-04-29 14:55:17.357] {resin-1383} HeartbeatHealthCheck[WARNING:no active heartbeat from ClusterServer[id=app-0,192.168.10.100:6800], no active heartbeat from ClusterServer[id=app-1,192.168.10.101:6800], no active heartbeat from ClusterServer[id=app-2,192.168.10.102:6800]]
FAIL: hmtp-Kaa-to-waa ReplyPayload[CacheUpdateWithSource[,value=3c7a4a144375cbf7,flags=6,version=1556567723222,lease=18,source=StreamSource[com.caucho.vfs.TempOutputStream@185f5185,null]]]
FAIL: hmtp-Kaa-to-waa ReplyPayload[CacheUpdateWithSource[,value=ea68574b65080293,flags=6,version=1556567745976,lease=-1,source=StreamSource[com.caucho.vfs.TempOutputStream@605d26bd,null]]]
FAIL: hmtp-Kaa-to-waa ReplyPayload[CacheUpdateWithSource[,value=7bdf1f3ba19acf1b,flags=6,version=1556556741357,lease=-1,source=null]]
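
For anyone triaging similar logs, the HeartbeatHealthCheck warning above can be broken down mechanically. This is only a rough sketch in Python: the regular expression is inferred from the single log line quoted in this report, and the function name `servers_without_heartbeat` is made up for illustration; it is not part of Resin.

```python
import re

# Pattern inferred from the one HeartbeatHealthCheck line quoted in this
# report (an assumption, not a documented Resin log format).
HEARTBEAT_RE = re.compile(
    r"no active heartbeat from ClusterServer\[id=([^,\]]+),([^\]]+)\]"
)

def servers_without_heartbeat(line):
    """Return (server_id, address) pairs named in a heartbeat warning line."""
    return HEARTBEAT_RE.findall(line)

sample = (
    "[19-04-29 14:55:17.357] {resin-1383} HeartbeatHealthCheck[WARNING:"
    "no active heartbeat from ClusterServer[id=app-0,192.168.10.100:6800], "
    "no active heartbeat from ClusterServer[id=app-1,192.168.10.101:6800], "
    "no active heartbeat from ClusterServer[id=app-2,192.168.10.102:6800]]"
)

for server_id, address in servers_without_heartbeat(sample):
    print(server_id, address)
```

Run against the quoted line, this lists all three triad members (app-0, app-1, app-2), which matches the observation that the dynamic server has lost the heartbeat to the whole triad rather than to a single member.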


On our triad server we are seeing these log messages, which report conflicting ids for some of our dynamic app servers:

WarningService: java.lang.RuntimeException: com.caucho.config.ConfigException: 'CloudServer[dyn-app-0-192.168.10.122:6830,12,192.168.10.122:6830]' with id='dyn-app-0-192.168.10.122:6830' does not match new id 'dyn-app-0-192.168.10.121:6830'
[19-04-29 15:03:28.554] {scaling@aaa.app.admin.resin-676} WarningService: java.lang.RuntimeException: com.caucho.config.ConfigException: 'CloudServer[dyn-app-0-192.168.10.122:6830,12,192.168.10.122:6830]' with id='dyn-app-0-192.168.10.122:6830' does not match new id 'dyn-app-0-192.168.10.121:6830'
[19-04-29 15:03:28.554] {scaling@aaa.app.admin.resin-676} java.lang.RuntimeException: com.caucho.config.ConfigException: 'CloudServer[dyn-app-0-192.168.10.122:6830,12,192.168.10.122:6830]' with id='dyn-app-0-192.168.10.122:6830' does not match new id 'dyn-app-0-192.168.10.121:6830'
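
To see how many distinct id conflicts a triad log contains, the ConfigException lines can be collapsed into unique pairs. The following is a minimal sketch in Python, assuming the message format is exactly as quoted above; `find_id_conflicts` is a hypothetical helper name, not a Resin tool.

```python
import re

# Pattern inferred from the ConfigException text quoted in this report
# (an assumption, not a documented Resin log format).
CONFLICT_RE = re.compile(
    r"'CloudServer\[[^\]]*\]' with id='(?P<old>[^']+)' "
    r"does not match new id '(?P<new>[^']+)'"
)

def find_id_conflicts(lines):
    """Return the set of (registered_id, new_id) pairs found in the lines."""
    conflicts = set()
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            conflicts.add((match.group("old"), match.group("new")))
    return conflicts

message = (
    "java.lang.RuntimeException: com.caucho.config.ConfigException: "
    "'CloudServer[dyn-app-0-192.168.10.122:6830,12,192.168.10.122:6830]' "
    "with id='dyn-app-0-192.168.10.122:6830' "
    "does not match new id 'dyn-app-0-192.168.10.121:6830'"
)
sample = [
    "WarningService: " + message,
    "[19-04-29 15:03:28.554] {scaling@aaa.app.admin.resin-676} " + message,
]

for old_id, new_id in sorted(find_id_conflicts(sample)):
    print(f"registered {old_id} conflicts with new {new_id}")
```

For the lines quoted here, the repeated messages collapse into a single pair: the registered id ends in 192.168.10.122:6830 while the new id ends in 192.168.10.121:6830, both under the same dyn-app-0 prefix.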

Additional Information
Attached Files

- Relationships

- Notes
(0006892)
nam
05-06-19 10:09

(rep by Chris Daniel)

This issue only occurs when there is a "large" number of dynamic servers.
 

- Issue History
Date Modified Username Field Change
04-30-19 11:10 ferg New Issue
05-06-19 10:09 nam Note Added: 0006892
05-06-19 16:50 ferg Status new => closed
05-06-19 16:50 ferg Resolution open => fixed
05-06-19 16:50 ferg Fixed in Version  => 4.0.62


Mantis 1.0.0rc3
Copyright © 2000 - 2005 Mantis Group