The system has now almost fully caught up. We are continuing to scan for any residual jobs that may not have run; all jobs should have run already or be queued to run shortly.
Thank you for your patience as AWS recovered their core services.
We will be evaluating options for running core IaaS outside of AWS.
Posted Feb 28, 2017 - 15:16 PST
Monitoring
Job processing is almost fully up to speed again. It may take a while to get through the backlog of jobs.
Posted Feb 28, 2017 - 13:54 PST
Update
We are now seeing recovery of IronWorker and are working through the backlog of jobs.
Posted Feb 28, 2017 - 13:32 PST
Update
We are seeing jobs go through again. None should be lost, but jobs have been queuing up since the issues started this morning.
Posted Feb 28, 2017 - 12:59 PST
Update
Update from AWS below. We are working quickly to restore our own services as well:
Update at 12:52 PM PST: We are seeing recovery for S3 object retrievals, listing and deletions. We continue to work on recovery for adding new objects to S3 and expect to start seeing improved error rates within the hour.
Posted Feb 28, 2017 - 12:58 PST
Update
Unfortunately, the issue has now cascaded to over 45 AWS services, causing unrecoverable issues upstream. At this point we have to wait on AWS; afterward we will begin a full multi-cloud initiative.
Posted Feb 28, 2017 - 12:48 PST
Update
We are considering bypassing S3, but even then, Docker Hub is down, which would block any upstream updating of code packages, as they are all built with Docker.