Downtime

AMS-SSDVZ5 RAID Failure

Dec 27 at 08:30am GMT
Affected services
Amsterdam (NL)

Resolved
Dec 29 at 10:29pm GMT

We continue to process all requests same-day and have had positive feedback from clients who have been able to recover data from the copies we provided.

Please ensure all requests are made by 7th Jan 2022. After that date, we will destroy all data retained from the previous node.

Updated
Dec 29 at 12:02am GMT

We have been working all evening to ensure clients get a copy of the available data quickly, and have completed 95%+ of the requests that came through. Some clients have posted positive feedback on the state of their copies and were able to recover important data.

Updated
Dec 28 at 05:11pm GMT

Unfortunately, the restore attempts for your affected VPS(s) on the AMS-SSDVZ-5 node have been unsuccessful.

What you should do now:
- We have set up a new node, and you can now reinstall a fresh OS from the control panel. Your VM will show as offline in the client area until you reinstall; attempting to boot it without reinstalling will not work.

Partial (corrupt) data is available:
- The data we were able to retrieve is partially corrupted, so the VMs do not boot successfully. Our test VMs from the node contain partial data, and we believe most (but not all) clients will be able to retrieve some of the data we copied over. To request a copy of this (if we have it available), please open a ticket. Please be patient as we work through the requests, as we will need to mount and compress the VM disk images.

A failure this severe is very rare and unfortunate; we have never previously experienced one. This is a friendly reminder that you must retain your own backups at all times, and if you are hosting critical data, it is well advised to keep multiple off-site recovery points. The majority of clients on this node have been with us for several years, and we have emailed on numerous occasions (e.g. during the OpenVZ 6 to 7 and HDD to SSD migrations) that you must retain your own backups and that we do not take backups.

We thank you for being patient and apologise sincerely for the inconvenience this may have caused. 

Updated
Dec 28 at 03:59pm GMT

We have begun migrating the data we recovered yesterday to the new node; we will then attempt to restart the VMs.

Updated
Dec 28 at 02:42pm GMT

The failed drives have been replaced and we are now reinstalling the node.

Updated
Dec 27 at 09:14pm GMT

We have been able to access some data and recover approximately 2 TB. We will not know the state of your VM's ploop disk image until we begin restoring tomorrow evening.

As a result, we have paused creating new VMs; we hope that enough data is in a recoverable state to bring your VM back online. If you have your own backups and would like to get back online ASAP, please submit a ticket to have a new VM created with the same IPs.

Once again we apologise for any inconvenience this may have caused and thank you for your patience.

Updated
Dec 27 at 02:18pm GMT

We are attempting to recover data from the array and may yet be successful. We have paused creating new VMs.

If you have your own backups and need to be online ASAP, please submit a ticket so we can create a new VM with the same IPs.

Updated
Dec 27 at 10:31am GMT

We have begun creating new VMs and emailing clients their new details, as we believe the data is not recoverable. Should this change tomorrow, we will inform clients immediately; however, you should begin planning to restore from your own available backups. We will also be extending all affected VMs' due dates by one month.

This is our first major incident in several years; we apologise for any inconvenience it may have caused.

Created
Dec 27 at 08:30am GMT

We are currently investigating the failure of two drives on the same RAID 10 span. Because each span in RAID 10 is a mirrored pair, losing both drives in one span destroys that portion of the array, and we believe this means the data is lost.

Unfortunately, due to COVID restrictions and the holidays, our datacentre has informed us that no engineer is available to investigate on site and replace the failed drives until tomorrow.

This is the only node down in NL.