System Outage

Major incident Front-end Portal API Rest API HTTP API
2021-10-27 04:11 BST · 5 hours, 14 minutes

Updates

Post-mortem

Summary

At 04:11 BST, the VOODOO platform began experiencing issues processing SMS, which the monitoring platforms detected. Initial investigation showed that the database cluster was struggling to process read and write requests in a timely manner.

Timeline of Events

27/10 - 04:11 BST - Monitoring platform detected an increase in the process queue which triggered alerts to the VOODOO Technical Team
27/10 - 04:15 BST - VOODOO and UKFast engineers started investigating issues related to the alerts
27/10 - 05:15 BST - Senior UKFast on-call engineers brought online, attempted to restart each node within the cluster
27/10 - 06:10 BST - Decision made to start a full cluster restart
27/10 - 07:40 BST - Two nodes online within the cluster. Data integrity checks started
27/10 - 08:35 BST - Data integrity checks completed, snapshot backup initiated, started bringing services online and monitoring performance.
27/10 - 09:21 BST - All services online, continuous monitoring of the services by engineers for the remainder of the day
27/10 - 10:00 BST - VOODOO and UKFast started investigating the events leading up to the initial alerts
27/10 - 14:00 BST - Changes made to the nightly scripts to increase the period between scripts running
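
The final timeline entry describes increasing the period between the nightly scripts. This post-mortem does not say how those scripts are scheduled or what they do, so the following is only a minimal sketch of the general idea, assuming the scripts are started one after another with a fixed gap. The script names, start time and 45-minute gap are hypothetical and chosen purely for illustration.

```python
from datetime import datetime, timedelta


def stagger_start_times(first_start, script_names, gap):
    """Give each script a start time `gap` later than the previous one."""
    return {
        name: first_start + i * gap
        for i, name in enumerate(script_names)
    }


# Hypothetical script names and times; not taken from the incident report.
schedule = stagger_start_times(
    first_start=datetime(2021, 10, 28, 1, 0),
    script_names=["archive_messages", "rebuild_reports", "purge_logs"],
    gap=timedelta(minutes=45),
)

for name, start in schedule.items():
    print(f"{start:%H:%M}  {name}")
```

A wider gap gives each script more time to finish before the next one starts, which reduces the chance of several heavy jobs reading from and writing to the database cluster at the same time.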

Further Actions

VOODOO is conducting a detailed investigation into the events leading up to the initial alerts triggered at 04:11 BST, and is reviewing the events and actions that followed. Any further remedial actions identified will be implemented in order of priority, and a root cause analysis will be completed.

Should you have any questions on the above, please do not hesitate to contact your account manager or open a ticket via support@voodoosms.com.

November 4, 2021 · 14:53 BST
Issue

We’re experiencing an elevated level of errors and are currently looking into the issue.

November 4, 2021 · 14:53 BST
