On April 6th, at around 12:33 pm PST, we noticed an increase in authentication failures and DNS resolution failures.
What went wrong
We had a dramatic increase of queries being sent to our authentication database which resulted in an automated failover process to start. During the time it took to failover and promote, some API endpoints and our UI were affected.
What we're doing to prevent this from happening again
We're currently adding tests to ensure we're able to handle such authentication traffic increases moving forward. We're also looking at solutions to prevent any disruption of service from happening when database failovers are in progress.
The incident was resolved at approximately 12:49 pm PST.