Updated
SUMMARY
Earlier today, we performed an update to our deployment model, which involved migrating the services responsible for Autocomplete, Search, and the Shopping Assistant to a new target group in our load balancer. The migration was validated and confirmed to be working as expected.
At 21:30 CET, a scheduled internal component that is part of our circuit breaking subsystem, executed its routine adjustment of request rate limits based on current usage. This component had a dependency on the previous, hardcoded load balancer configuration and was not updated to reflect the migration. As a result, it began reporting inaccurate request rate values.
The corrupted rate data caused the circuit breaker to enter an inconsistent state, triggering it to open the circuit and reject connections for a subset of clients. This manifested as elevated error rates on endpoints communicating with the search cluster.
Following investigation, we identified the root cause, and full service was restored for all affected clients at 22:26 CET.
Resolved
The issue has now been resolved and the error rates are back to nominal. The issue has been caused by a fault in the internal circuit breaker component. A detailed internal postmortem will follow.
Created
We are investigating reports of increased error rates since 21:30 CEST.
Resolved