Mantis - Resin
Viewing Issue Advanced Details
6034 minor always 01-31-17 14:30 03-03-17 13:07
ferg  
ferg  
normal  
closed 4.0.49  
fixed  
none    
none 4.0.51  
0006034: issue with database corruption
(rep by Nikola Sepentulevski)
We have noticed resin randomly crashing and sessions dropping.

 

We run resin in a docker container which is clusters between 3 hosts.

 

Attached (console-host-3.private.net.log.13012017.tar) is a stack trace in our logs that we could see, unfortunately we miss out on raw dumps because when resin crashes the docker container is rebuilt from an image. We have syslog configured which sends the logs to a different server. At the time of the issue, memory on that server was at 75%. Attached is a graph which shows the memory usage around the time of the crash (capture.JPG)

 

A couple of hours later we noticed the following stack trace on one of our different hosts console-host-3:

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: CONFIG: [17-01-13 13:14:48.958] {resin-58013} java.lang.IllegalStateException: Table[data:3,/resin/resin-data/console-host-2.private.net/distcache/data.db]:

inode block 0x3827e000 len=8192 has invalid block code (0) expected (2)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.blob.Inode.corrupted(Inode.java:1270)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.blob.Inode.validateBlockAddr(Inode.java:946)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.blob.Inode.remove(Inode.java:865)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.table.BlobColumn.deleteData(BlobColumn.java:221)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.table.Table.delete(Table.java:1163)

Jan 13 13:14:48 console-host-2.private.et dockerd[1559]: at com.caucho.db.table.TableIterator.delete(TableIterator.java:512)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.sql.DeleteQuery.execute(DeleteQuery.java:87)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.jdbc.PreparedStatementImpl.execute(PreparedStatementImpl.java:367)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.jdbc.PreparedStatementImpl.executeUpdate(PreparedStatementImpl.java:337)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.server.distcache.DataStore$DeleteAlarm.deleteOrphans(DataStore.java:731)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.server.distcache.DataStore$DeleteAlarm.handleAlarm(DataStore.java:685)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.util.Alarm.handleAlarm(Alarm.java:523)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.util.Alarm.run(Alarm.java:495)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173)

Jan 13 13:14:48 console-host-2.private.t dockerd[1559]: at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: java.lang.IllegalStateException: Table[data:3,/resin/resin-data/console-host-2.private.netregistry.net/distcache/data.db]: inode block 0x3827e000 len=8192 has invalid block code (0) expected (2)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.blob.Inode.corrupted(Inode.java:1270)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.blob.Inode.validateBlockAddr(Inode.java:946)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.blob.Inode.remove(Inode.java:865)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.table.BlobColumn.deleteData(BlobColumn.java:221)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.table.Table.delete(Table.java:1163)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.table.TableIterator.delete(TableIterator.java:512)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.sql.DeleteQuery.execute(DeleteQuery.java:87)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.jdbc.PreparedStatementImpl.execute(PreparedStatementImpl.java:367)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.db.jdbc.PreparedStatementImpl.executeUpdate(PreparedStatementImpl.java:337)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.server.distcache.DataStore$DeleteAlarm.deleteOrphans(DataStore.java:731)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.server.distcache.DataStore$DeleteAlarm.handleAlarm(DataStore.java:685)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.util.Alarm.handleAlarm(Alarm.java:523)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.util.Alarm.run(Alarm.java:495)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173)

Jan 13 13:14:48 console-host-2.private.net dockerd[1559]: at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118)

 

Stack trace attached as “console-host-2.private.netregistry.net.log.13012017.tar.gz” and screen shot of the memory usage at the time attached as “console-host-2.JPG”

 

We can see this happen on a regular basis. We are running the latest release of Resin.

 

Please let me know if there are further details required.


Notes
(0006739)
ferg   
02-06-17 12:06   
[Details]
        Below is the error output in "jvm-app-0.log" starts from around 20:55 on 1st Feb. 2017.

        ---
        Table[log_data:7,c:\app\resin-pro-4.0.48\resin-data\app-0\log\log_data.db]: inode block len=34000 has non-zero offset 0x726573696e3a7479
        java.lang.IllegalStateException: Table[log_data:7,c:\app\resin-pro-4.0.48\resin-data\app-0\log\log_data.db]: inode block len=34000 has non-zero offset 0x726573696e3a7479
        [17-02-01 20:55:09.777] {resin-3619} WarningService: Shutdown: caucho.database.corrupt
        ---

        After this log output, resin restarted itself.

        We presume it restarted since the corruption of "resin-data" occured.

        Also "resin-data" keeps corrupting on next day and the day after, too.

        ---
        [17-02-02 20:01:09.557] {resin-7573} Table[log_data:7,c:\app\resin-pro-4.0.48\resin-data\app-0\log\log_data.db]: inode block len=48000 has non-zero offset 0x202d205468652064
        [17-02-03 22:07:09.646] {resin-12769} Table[log_data:7,c:\app\resin-pro-4.0.48\resin-data\app-0\log\log_data.db]: inode block len=52000 has non-zero offset 0x66696c65222c0a20