SUMMARY
On March 11, 2025, from 05:50 UTC to 23:18 UTC, Talk customers across all Pods experienced elevated rates of dropped calls.
TIMELINE
March 11, 2025 11:26 PM UTC | March 11, 2025 04:26 PM PT
We are happy to report that the intermittent dropped call issue in Talk is now resolved. Thanks for your patience while we worked with our provider on this issue.
March 11, 2025 03:22 PM UTC | March 11, 2025 08:22 AM PT
We are monitoring the situation with our telephony partner and we are unfortunately seeing intermittent spikes in call drops. We apologize for the inconvenience. Please be assured we will continue to monitor the situation and update as we know more.
March 11, 2025 09:23 AM UTC | March 11, 2025 02:23 AM PT
We are pleased to inform you that our Talk partner has confirmed that the issues with dropped calls have been fully resolved as of 08:46 UTC.
March 11, 2025 07:32 AM UTC | March 11, 2025 12:32 AM PT
Our Talk partner has now resolved the issue that could cause dropped calls for our customers. We will continue to monitor the situation and provide you with further updates as they become available.
March 11, 2025 06:59 AM UTC | March 10, 2025 11:59 PM PT
Our service provider has reported elevated dropped call rates in Talk across all pods. They have implemented a fix and are monitoring for full recovery. We will provide the next update when we have more to share.
POST-MORTEM
Root Cause Analysis
This incident was caused by an out-of-memory issue in an upstream service, which triggered a cascading failure. The initial brief timeouts escalated to significant errors, prompting an investigation by our Talk partner’s engineers.
Resolution
To address this issue, our Talk partner’s engineers implemented remediation measures, including replacing the affected hosts and rolling back beta features that relied on the problematic upstream service. These actions helped restore normal service and mitigate the impact on customers.
Remediation Items
- Enhance monitoring and alerting systems to detect upstream service issues more effectively.
- Improve the resilience of the platform to prevent cascading failures in the future.
- Conduct a thorough review of dependencies on upstream services to identify potential risks.
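As an illustration of the resilience item above, one common way to keep a failing upstream dependency from cascading is a circuit breaker: after repeated failures, callers stop sending requests for a cool-down period instead of piling up timeouts. The sketch below is a minimal, generic example and does not describe our Talk partner's actual implementation. The names CircuitBreaker, CircuitOpenError, failure_threshold, and reset_after_s are hypothetical and chosen for illustration only.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker guarding calls to an upstream service."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after_s = reset_after_s          # cool-down before a trial call
        self._clock = clock                         # injectable clock, useful in tests
        self._failures = 0
        self._opened_at = None                      # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                # Fail fast instead of adding load to a struggling upstream.
                raise CircuitOpenError("upstream calls are temporarily suspended")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the breaker and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In this sketch, a caller would wrap each upstream request in breaker.call(...) and treat CircuitOpenError as a signal to degrade gracefully, for example by skipping an optional feature. Suitable thresholds and cool-down values depend on the service and are assumptions here.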
FOR MORE INFORMATION
For current system status information about Zendesk and specific impacts to your account, visit our system status page. You can follow this article to be notified when our post-mortem report is published. If you have additional questions about this incident, contact Zendesk customer support.