Multiple wallets getting corrupted at once

We are investigating what caused many Horizen wallets to become unusable.

The error is beyond automatic salvage.

Apr 07 17:46:56 chi06.ez-mn.com zencashd[17944]: zencashd: wallet/wallet.cpp:694: void CWallet::IncrementNoteWitnesses(const CBlockIndex*, const CBlock*, ZCIncrementalMerkleTree&): Assertion `(nd->witnessHeight == -1) || (nd->witnessHeight == pindex->nHeight - 1)' failed.
Apr 07 17:46:57 chi06.ez-mn.com systemd[1]: zencash@1698.service: Main process exited, code=dumped, status=6/ABRT
Apr 07 17:46:57 chi06.ez-mn.com systemd[1]: zencash@1698.service: Failed with result 'core-dump'.
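To see how widespread the failure is, log lines like the ones above can be scanned for the assertion and for the units that ended in a core dump. The following is only a minimal sketch of that idea: the function name `summarize` and the regular expressions are our own illustration, written against the three sample lines, and would likely need adjusting for other log formats.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt above.
SAMPLE_LOG = """\
Apr 07 17:46:56 chi06.ez-mn.com zencashd[17944]: zencashd: wallet/wallet.cpp:694: void CWallet::IncrementNoteWitnesses(const CBlockIndex*, const CBlock*, ZCIncrementalMerkleTree&): Assertion `(nd->witnessHeight == -1) || (nd->witnessHeight == pindex->nHeight - 1)' failed.
Apr 07 17:46:57 chi06.ez-mn.com systemd[1]: zencash@1698.service: Main process exited, code=dumped, status=6/ABRT
Apr 07 17:46:57 chi06.ez-mn.com systemd[1]: zencash@1698.service: Failed with result 'core-dump'.
"""

# Hypothetical patterns, derived only from the sample lines above.
TIMESTAMP = r"\w{3} \d{2} \d{2}:\d{2}:\d{2}"
ASSERT_RE = re.compile(
    TIMESTAMP + r" (?P<host>\S+) zencashd\[\d+\]: "
    r".*IncrementNoteWitnesses.*Assertion .* failed\."
)
CRASH_RE = re.compile(
    TIMESTAMP + r" (?P<host>\S+) systemd\[\d+\]: "
    r"(?P<unit>zencash@\d+\.service): Failed with result 'core-dump'"
)


def summarize(log_text):
    """Count assertion failures per host and collect units that core-dumped."""
    assertions = Counter()
    crashed_units = set()
    for line in log_text.splitlines():
        match = ASSERT_RE.search(line)
        if match:
            assertions[match.group("host")] += 1
            continue
        match = CRASH_RE.search(line)
        if match:
            crashed_units.add((match.group("host"), match.group("unit")))
    return {"assertions": assertions, "crashed_units": crashed_units}


if __name__ == "__main__":
    report = summarize(SAMPLE_LOG)
    for host, count in report["assertions"].items():
        print(f"{host}: {count} assertion failure(s)")
    for host, unit in sorted(report["crashed_units"]):
        print(f"{host}: {unit} ended with a core dump")
```

Run against the combined logs of every server, the same function would give a rough per-host count of affected nodes.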

As a result, almost all of our superServers had at least some Horizen nodes that were recovered by our AI. No resource exhaustion was observed, even though the failures occurred simultaneously across all superServers that host a significant number of Horizen nodes. The event started at about 10:00 AM and caused multiple Horizen nodes to fail. Some of those nodes entered a failure loop and have still not recovered properly.

Since no resource exhaustion was observed in the past 48 hours (all servers had free CPU, RAM and disk to process any operation issued by any daemon from any project), no losses are noticeable in any hosted project other than Horizen.

Currently, we are trying to restore the wallet backups taken at node deployment. If this approach does not help, we will consider discarding the affected wallets and providing the nodes with new wallets holding some ZEN so that service can resume.

Our SLA will not cover this event as it is related to the project code/network.

