![]() ![]() MTTD (Mean Time to Detect) - How long did it take for someone to realize there is a problem? The starting point is an event that may not be specifically logged, but inferred from other observations. Latency (response time to user requests) percentiles Transaction throughput per hour/day/week/month/quarter/year The speed to detect and respond to anomalies is a key part of the “Operational Efficiency” pillar of Well-Architected cloud “best practice” implementation and evaluation frameworks by Amazon, Microsoft, and Google.Ī sample Acceptance Criteria statement for work on Chaos Engineering is “confidence in our production deployments” despite the complexity that they represent.Īvailability (unplanned downtime per year/month/week/day/hour).
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |