The AWS team has fixed the root cause, and we have confirmed stability in our underlying systems.
As a recap:
From 11:18 AM to 12:46 PM, 3% to 5% of requests to our Subgraph APIs errored (5xx) or timed out during intermittent periods lasting up to 3 minutes at a time. The impact was concentrated on a number of specific endpoints, so certain customers may have seen up to 50% of their requests erroring or timing out.
After 12:46 PM, under 0.01% of requests were affected, and customer impact was greatly reduced. The number of failed API requests across our whole system dropped to single digits in any 5-minute window, but failures still occurred.
By 3:40 PM, we no longer saw any errors.
As of 1:50 PM PT, only a small fraction of requests (<0.01%) are failing, down from a peak of 5% during the worst of the instability.
The team is still working with AWS on additional mitigations to eliminate the remaining failures.
The issue is still occurring intermittently, specifically for Subgraph APIs.
Customers using Mirror to push data to their own databases, as well as customers using goldsky-hosted cross-chain APIs, are unaffected and should not see any downtime.
We've identified the AWS service in question and are actively working with AWS to fix the issue.
This issue affects only the query layer on some subgraphs, as well as our dashboard. Indexing is not affected.
AWS is experiencing a rolling outage, which is causing intermittent timeouts on some subgraphs.