The Samsung 980 NVMe PCIe 3.0 SSD (the plain model, not the Pro, with TLC NAND) started to show critical medium errors.
The problem with SSDs is that once these errors start showing up, you never know when the fatal error will actually hit and stop the system.
As I always keep a replacement available for each drive in the server, it took me 5 minutes to swap the drive once the still-accessible data had been migrated, then 1 hour to go through the backups and restore the inaccessible data, and the server was back online for regular operations.
Contacting Samsung support for the SSD was another matter. I had to go through 3 different support sites until I found the right one. After that, it was quite fast: the RMA was set up, and the drive was picked up and replaced within 5 days.
So I again have a spare drive here in case it is needed.
Remember to always have a monitoring system in place that you actually check, so that you are warned early and have time to react!
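As an illustration of such a check, here is a minimal sketch in Python. It assumes smartmontools 7.0 or later is installed (for the `-j` JSON output of `smartctl`) and that the script runs with enough privileges to query the drive. The function names `find_health_problems` and `check_device`, and the device path `/dev/nvme0` in the usage note, are placeholders of mine, not part of the setup described above. It only looks at three fields of the NVMe health log, so treat it as a starting point rather than a complete monitoring solution.

```python
import json
import subprocess


def find_health_problems(report):
    """Return a list of warnings found in a parsed `smartctl -j -a` report."""
    log = report.get("nvme_smart_health_information_log", {})
    problems = []

    # Any non-zero critical warning flag means the drive itself reports trouble.
    critical = log.get("critical_warning", 0)
    if critical != 0:
        problems.append(f"critical_warning flag set: {critical:#04x}")

    # Media errors count unrecovered data integrity errors.
    media_errors = log.get("media_errors", 0)
    if media_errors > 0:
        problems.append(f"media errors: {media_errors}")

    # Spare capacity dropping below the vendor threshold is an early warning.
    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    if spare is not None and threshold is not None and spare < threshold:
        problems.append(f"available spare {spare}% below threshold {threshold}%")

    return problems


def check_device(device):
    """Query one device with smartctl and return its list of warnings."""
    # smartctl signals disk problems through non-zero exit codes,
    # so the exit status is deliberately not treated as a failure here.
    result = subprocess.run(
        ["smartctl", "-j", "-a", device],
        capture_output=True,
        text=True,
    )
    return find_health_problems(json.loads(result.stdout))
```

Calling `check_device("/dev/nvme0")` from a cron job or systemd timer and mailing any non-empty result would be one way to get the early warning. The important part remains that someone actually reads the alert.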