Tuesday, February 3, 2015

Learn the Rules so you can figure out who's breaking them

99.99 percent [of subatomic interactions] are explainable ... Spending your time exploring each particle trail will lead you to conclude that all the particles obey known physics, and there's nothing left to discover ... Most stars are boring; advances come from studying the weirdies - the quasars, the pulsars, the gravitational lenses - that don't seem to fit into the models that you've grown up with ... Collect raw data and throw away the expected. What remains challenges your theories.
- Clifford Stoll, in The Cuckoo's Egg
The Cuckoo's Egg is an amazing book - it's a memoir, a techno-thriller, and a manifesto all in one. It recounts how, in the mid-1980s, an astronomer-turned-sysadmin at UC Berkeley stumbled across a hacker in his system, and in tracking down the intruder, ended up exposing a ring of hackers selling secrets to the KGB in one of the first real cases of cyber-espionage! It's a must-read for anyone who's ever going to write code, and just a darn good read for everyone else.

The quote above (from chapter 3) is an insightful one because it tells us, in essence, that the really useful details are in the edge cases. Stoll's first hint that something was up was a $0.75 discrepancy in an accounting system. No one suspected foul play, and in most scenarios, the bean-counters would just write this off and be done with it. But Stoll and his co-workers decided to dig, and found a rabbit hole so deep it literally did go all the way through to the other side of the world.

It's a principle that technology professionals need to keep in mind at all times. Yes, you will encounter that case eventually. The numbers will get that big. That reference will find some way to be null. There will be someone with 25 dependents. It's not enough to make sure it works for the most typical cases. Of course that's where to start coding, and where to focus the majority of the testing. But you can't ignore the weirdo edge cases - finding and understanding them can expose hidden problems, as well as opportunities. A lot of revenue can be lost by systems that are leaky around the edges, and a lot of unrealized revenue can be tapped by looking in the same places. Don't assume there aren't any outliers, because there always are. Find them, and either bring them in line, or exploit the opportunity they present you!
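The edge cases above - the numbers getting that big, the reference finding a way to be null, the person with 25 dependents - are exactly what a test suite should probe alongside the typical cases. A minimal sketch in Python (the function, rates, and rules here are all hypothetical, purely for illustration):

```python
def withholding_rate(income, dependents):
    """Toy withholding-rate calculation: a flat base rate reduced per
    dependent. The rules are made up; the edge-case handling is the point."""
    if income is None or dependents is None:
        # That reference WILL find some way to be null.
        raise ValueError("income and dependents are required")
    if income < 0 or dependents < 0:
        raise ValueError("values must be non-negative")
    rate = 0.20 - 0.01 * dependents
    # Someone WILL have 25 dependents; don't let the rate go negative.
    return max(rate, 0.0)

# The typical case - where you start coding and focus most of the testing:
assert abs(withholding_rate(50_000, 2) - 0.18) < 1e-9

# The weirdo edge cases - where the hidden problems live:
assert withholding_rate(50_000, 25) == 0.0   # 25 dependents: floors at zero
assert withholding_rate(10**12, 0) == 0.20   # the numbers got that big
```

Each `assert` here encodes a decision about an outlier that would otherwise be made silently, at runtime, in production.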

And the thing about outliers is that they can always pop back up. You think you've fixed that bug, only to see more examples of it weeks after the release. You write off a 'harmless' variation each month, only to realize it added up to a pretty big loss by year's end. Again, always assume there are outliers, and put checks and balances in your systems to find them. Sometimes this means making sure your test suite is robust enough; other times it means having an independent audit system.
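An independent audit in this spirit re-derives a figure from the raw data and compares it against what the system of record claims, flagging any mismatch rather than writing it off. A minimal sketch (the function name and figures are invented for illustration; real money math would use `decimal`, not floats):

```python
def audit_discrepancy(ledger_total, transactions):
    """Independently re-total the raw transactions and report the
    difference from the recorded ledger total. Zero tolerance on
    purpose: even a $0.75 discrepancy is worth a look."""
    recomputed = round(sum(transactions), 2)
    return round(ledger_total - recomputed, 2)

# The ledger says one thing; the raw data says another. Don't write it off.
diff = audit_discrepancy(1000.75, [250.00, 500.00, 250.00])
assert diff == 0.75  # this is the Stoll moment: go figure out why
```

The key design choice is that the audit does not reuse the system's own totaling code - an independent path is what lets it catch bugs (or intruders) that the primary path shares.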

This is especially important these days. I'm not suggesting that most weird events can be explained by a hacker, but you never know. Anyone who reads the news knows you can't be too careful. Don't assume your security is good enough! If something seems suspicious, figure it out! Outliers can just be bugs, but they can also be attacks.

Very few people are actually lazy, but we can all be lulled into a false sense of security. Just because the fire alarm isn't going off doesn't mean there aren't any fire hazards. We don't need to be paranoid, or rule-bound, but we do need to be vigilant and thorough. How one maintains such an attitude is probably the subject for a whole book and in the end is different for everyone, but it's a skill that technology professionals must develop.