Degraded Performance

Incident Report for Iron.io

Postmortem

Overview

On April 6th, at around 12:33 pm PST, we noticed an increase in authentication failures and DNS resolution failures.

What went wrong

We had a dramatic increase of queries being sent to our authentication database which resulted in an automated failover process to start. During the time it took to failover and promote, some API endpoints and our UI were affected.

What we're doing to prevent this from happening again

We're currently adding tests to ensure we're able to handle such authentication traffic increases moving forward. We're also looking at solutions to prevent any disruption of service from happening when database failovers are in progress.

Resolution time

The incident was resolved at approximately 12:49 pm PST.

Posted Apr 06, 2018 - 09:59 PDT

Resolved

Resolved. 16 minutes total time of disruption. Post-mortem following.

Posted Apr 06, 2018 - 09:34 PDT

Monitoring

We have resolved the issue and all systems are operational. We will continue to monitor status.

Posted Apr 05, 2018 - 12:20 PDT

Investigating

We are currently investigating this issue.

Posted Apr 05, 2018 - 11:45 PDT