[ ] When the code is checked in
[ ] When the code is checked in and tests all pass
[ ] When the code is checked in, tested and deployed
[X] When it runs and the KPIs say so
We need to know what our code does when it runs.
That means the capability to continuously measure business value and code behaviour in production.
items.sort_by { |i| i.name }
items.sort { |a,b| a.name <=> b.name }
def sort_by(&blk)
  sleep(100) # FIXME
  super(&blk)
end

def sort(&blk)
  raise Exception.new("Oh Noes!") # :-)
end
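A sabotaged implementation like the ones above can still pass a plain correctness test; only a measurement of runtime behaviour exposes it. A minimal sketch, using Ruby's standard Benchmark module (the items and threshold are illustrative, not from the talk):

```ruby
require "benchmark"

# A correctness check alone would miss a sabotaged sort_by that sleeps;
# timing the call exposes it immediately.
items = [3, 1, 2]

elapsed = Benchmark.realtime { items.sort }
sorted  = items.sort

puts sorted.inspect  # → [1, 2, 3] -- the test passes either way
puts elapsed < 1.0   # → true for the real sort; false for the sleeping one
```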
measure
measure
System performance affects success.
Mind the gap between what customers say they want and what they actually expect.
measure
Monitoring is more important than testing,
because knowing the territory is more important than knowing the map.
Once you are able to measure, you should benchmark your solution:
nominal latency/throughput under nominal load
aka
put your app on the treadmill and make it run at normal speed.
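A benchmark at "normal speed" can be sketched in plain Ruby with the standard Benchmark module; `handle_request` here is a hypothetical stand-in for your app's request path, not part of the talk:

```ruby
require "benchmark"

# Hypothetical stand-in for the real request handler.
def handle_request
  (1..1_000).reduce(:+)
end

# Drive the handler at a fixed, nominal amount of work...
requests = 10_000
elapsed  = Benchmark.realtime { requests.times { handle_request } }

# ...and record the two nominal numbers: throughput and latency.
throughput  = requests / elapsed                 # req/sec under nominal load
avg_latency = (elapsed / requests) * 1_000.0     # ms per request

puts format("throughput: %.0f req/sec", throughput)
puts format("avg latency: %.4f ms", avg_latency)
```

The absolute numbers depend on the machine; what matters is recording them so later runs have a baseline to compare against.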
…make it run fast enough to meet its SLAs.
…make it run faster and faster until it falls off the treadmill.
Notice the speed at which it happens.
Mitigate the consequences and automate recovery.
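The "faster and faster until it falls" step is a load ramp: increase offered load in steps and note where latency blows up. A toy sketch of that idea (the fixed-capacity model and all names are illustrative):

```ruby
# Toy saturation model: latency explodes as offered load approaches
# a fixed capacity, mimicking a service falling off the treadmill.
CAPACITY = 100

def latency_at(load)
  return Float::INFINITY if load >= CAPACITY
  1.0 / (CAPACITY - load)
end

# Ramp the load in steps of 10 and record the first step that saturates.
breaking_point = (10..CAPACITY).step(10).find { |load| latency_at(load).infinite? }
puts "service saturates at load #{breaking_point}"  # → service saturates at load 100
```

Against a real service the ramp would drive actual traffic, but the shape of the experiment is the same: step the load, watch the latency curve, note the knee.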
…cut one of its legs while it's running and observe.
Again: mitigate the consequences and automate recovery.
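"Cutting one of its legs" is failure injection: make a dependency fail on purpose and verify the recovery path keeps the system answering. A minimal sketch, with hypothetical names (`flaky_dependency`, `with_recovery`) standing in for a real dependency and its circuit-breaker/fallback wrapper:

```ruby
# Hypothetical dependency that we can force to fail.
def flaky_dependency(healthy:)
  raise "connection refused" unless healthy
  "live data"
end

# Automated recovery: degrade to a fallback instead of dying.
def with_recovery
  yield
rescue StandardError
  "cached fallback"
end

puts with_recovery { flaky_dependency(healthy: true) }   # → live data
puts with_recovery { flaky_dependency(healthy: false) }  # → cached fallback
```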
You can't own your code if you don't own its metrics.
Nothing can beat the level of control you get from your monitoring.
Examples taken from CodaHale/metrics lib:
metrics.gauge("orders") { orders.size }

val counter = metrics.counter("connection")
counter.inc()
counter.dec()

val requests = metrics.meter("requests", SECONDS)
requests.mark()

val histogram = metrics.histogram("response-size")
histogram.update(response.completions.size)

val timer = metrics.timer("requests", MILLISECONDS, SECONDS)
timer.time { handle(req, resp) }
At ~300 req/sec, our p90 latency jumps from 13ms to 453ms.
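A p90 like the one quoted is just a percentile over recorded latencies. A minimal nearest-rank sketch in Ruby (the sample latencies are illustrative, not the talk's data):

```ruby
# Nearest-rank percentile: the value at position ceil(p/100 * n) in the
# sorted samples.
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct / 100.0 * sorted.length).ceil - 1
  sorted[rank]
end

latencies_ms = [9, 11, 12, 12, 13, 13, 14, 15, 200, 453]
puts percentile(latencies_ms, 90)  # → 200
```

Note how a couple of slow outliers drag the tail percentiles far away from the median; that is exactly why p90/p99 are watched rather than averages.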
Are you a slightly better engineer after this talk?