IronWorker Degraded Performance

Incident Report for Iron.io

Postmortem

Overview

On May 13th, at 03:29 UTC, we began routine database upgrades. During the upgrade, we noticed errors in our logs indicating that certain queries were failing to complete.

What went wrong

After investigating the errors, we found data anomalies in our Production data set that didn’t exist in our Staging data set. This difference resulted in slow queries and errors that cascaded into service interruptions for a subset of our customers.

What we're doing to prevent this from happening again

Going forward, we’re taking steps to keep our Staging data set fully in sync with our Production data set. Had the two data sets matched exactly, the problem would have been caught in Staging and would not have disrupted service.

Resolution time

The incident was resolved at 11:49 UTC.

Posted May 14, 2019 - 16:51 PDT

Resolved

The migration has completed and service has returned to normal.

Posted May 14, 2019 - 16:50 PDT

Update

Migration is still in progress. This is taking more time than expected but we're monitoring it closely.

Posted May 14, 2019 - 15:20 PDT

Update

We are continuing to work on a fix for this issue.

Posted May 14, 2019 - 14:12 PDT

Identified

Due to a database upgrade issue, a portion of our IronWorker customers are experiencing issues with certain API commands. We've identified the issue and are in the process of resolving it.

Posted May 14, 2019 - 14:10 PDT

This incident affected: IronWorker Dedicated and IronWorker Public.