Subgraph APIs Degraded

Updates

Resolved
June 14, 2023 at 10:40 PM
The AWS team has fixed the root cause, and we have confirmed stability in our underlying systems.

As a recap:

From 11:18 AM to 12:46 PM, 3% to 5% of requests to our Subgraph APIs errored (5xx) or timed out, in intermittent bursts lasting up to 3 minutes at a time. The errors were concentrated on specific endpoints, so some customers may have seen up to 50% of their requests error or time out.

After 12:46 PM, fewer than 0.01% of requests were affected, greatly reducing customer impact. The number of failed API requests across our whole system dropped to single digits in any 5-minute window, but failures still occurred.

By 3:40 PM, we no longer saw any errors.
Monitoring
June 14, 2023 at 10:07 PM
As of 1:50 PM PT, only a small fraction (<0.01%) of requests are failing, compared to 5% at peak instability.

The team is still working with AWS on additional mitigations to resolve the remaining failing requests.
Update
June 14, 2023 at 8:09 PM
The issue is still occurring intermittently, specifically for Subgraph APIs.

Customers using Mirror to push data to their own databases, or using Goldsky-hosted cross-chain APIs, are unaffected and should not see any downtime.
Identified
June 14, 2023 at 7:51 PM
We've identified the AWS service in question and are actively working with AWS to fix the issue.

This issue only affects the query layer on some subgraphs, as well as our dashboard. Indexing is not affected.
Investigating
June 14, 2023 at 6:58 PM
AWS is experiencing a rolling outage, which is causing intermittent timeouts on some subgraphs.

Goldsky - Subgraph APIs Degraded – Incident details
