SUMMARY
On March 5, 2025, from 06:44 UTC to 09:45 UTC, Support customers on Pod 25 experienced delayed inbound emails and outbound ticket notifications.
TIMELINE
March 05, 2025 10:06 AM UTC | March 05, 2025 02:06 AM PT
We are happy to report that the email issue impacting Pod 25 has been resolved. Outbound emails and notifications have been successfully reprocessed. Inbound mail was only temporarily affected and will be processed when the sender’s mail server retries delivery. Thanks again for your patience while we worked through this issue.
March 05, 2025 09:03 AM UTC | March 05, 2025 01:03 AM PT
We have identified the root cause of the email delays on Pod 25 and are working to reprocess the email backlog. New emails are now starting to be relayed correctly; however, some stuck emails need further work by our team. Thanks for your patience while we work through this issue. Next update in 1 hour.
March 05, 2025 08:20 AM UTC | March 05, 2025 12:20 AM PT
Our engineers continue to work at the highest priority on the issue impacting email delivery on Pod 25. We don't have an ETA for a fix at this stage but will continue to update you on our progress. Next update in 30 minutes.
March 05, 2025 07:51 AM UTC | March 04, 2025 11:51 PM PT
We have identified an issue with email processing on Pod 25 that may result in delayed inbound emails and outbound ticket notifications in Support. Inbound email delays may result in delayed ticket creation. We are working on resolving the issue and will provide the next update when we have more to share.
POST-MORTEM
Root Cause Analysis
This incident was caused by a capacity and configuration error in a new email content scanning and filtering service that was being tested. The error delayed email processing.
Resolution
To resolve this issue, the team removed the scanner's service tiering, which stabilized the service. They also updated the code to allow longer timeouts on readiness probes and shorter timeouts on scanner requests.
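As an illustration only, the idea behind a shorter scanner request timeout can be sketched as follows. This is not the production code. The function name `scan_with_timeout`, the `"timed_out"` verdict, and the 0.05-second default are hypothetical, and the report does not state the actual timeout values or what happens to a message whose scan times out. The sketch shows only that a bounded wait keeps a slow scanner from holding up the mail pipeline.

```python
import concurrent.futures
import time
from typing import Callable

# Hypothetical value; the report does not state the real timeout.
SCANNER_REQUEST_TIMEOUT_S = 0.05


def scan_with_timeout(
    scan: Callable[[bytes], str],
    message: bytes,
    timeout_s: float = SCANNER_REQUEST_TIMEOUT_S,
) -> str:
    """Run scan(message), giving up after timeout_s seconds.

    Returns the scanner's verdict, or "timed_out" if no verdict
    arrived in time. What the caller does with "timed_out" is a
    policy decision outside this sketch.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(scan, message)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "timed_out"
    finally:
        # Do not wait for a slow scan to finish; that would defeat the timeout.
        executor.shutdown(wait=False)
```

With a bound like this, the time a message spends waiting on the scanner is capped at roughly `timeout_s`, regardless of how slowly the scanner responds.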
Remediation Items
- Re-tier the scanner service to ensure it is appropriately classified within the service tiers.
- Adjust timeouts for scanners to prevent delays in processing.
- Review and improve dashboards for monitoring service health to ensure timely identification of issues.
- Define inbound Service Level Objectives (SLOs) for SMTP transactions to better manage expectations and performance.
- Broadcast the importance of accurate service tiering within the organization to prevent similar issues in the future.
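For readers unfamiliar with SLOs, a latency-based objective for SMTP transactions could be evaluated roughly as in the sketch below. The threshold and target values are hypothetical examples, not figures from this report, and the function names are illustrative.

```python
from typing import Sequence


def slo_compliance(latencies_s: Sequence[float], threshold_s: float) -> float:
    """Fraction of transactions that completed within threshold_s seconds."""
    if not latencies_s:
        return 1.0  # No traffic in the window: treat as compliant.
    within = sum(1 for latency in latencies_s if latency <= threshold_s)
    return within / len(latencies_s)


def meets_slo(
    latencies_s: Sequence[float], threshold_s: float, target: float
) -> bool:
    """True if the share of fast-enough transactions reaches the target."""
    return slo_compliance(latencies_s, threshold_s) >= target


# Hypothetical sample: four SMTP transactions, one of them slow.
sample = [0.2, 0.4, 1.0, 6.0]
compliance = slo_compliance(sample, threshold_s=5.0)  # 3 of 4 within 5 s
```

An objective expressed this way gives dashboards a single number to alert on when inbound processing starts to slow down.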
FOR MORE INFORMATION
For current system status information about Zendesk and specific impacts to your account, visit our system status page. You can follow this article to be notified when our post-mortem report is published. If you have additional questions about this incident, contact Zendesk customer support.