After a thorough investigation, the root cause and impact are now clear, and new processes and alerts are being set up to prevent this from happening again.
During an infrastructure upgrade intended to harden our security posture, some pipelines failed to restart, so their data was not delivered to the sinks.
This affected fewer than 5% of customer projects for between 5 and 17 minutes. Within those projects, pipelines deployed before Monday were affected. No data was lost, and the affected pipelines resumed from where they left off once the issue was resolved.
As a result, the incident has been downgraded from Major Outage to Partial Outage.
The new processes and monitors will be in place by the end of the week to prevent similar issues.