As I flew back from Japan recently, coming in on final approach into San Francisco, I could see the Blue Angels leaving trails of smoke as they practiced for their upcoming air show, part of Fleet Week 2016, an annual San Francisco tradition. It is rather wonderful that such a historically liberal city puts out the welcome mat in a big way each year to celebrate the sailors and Marines who work every day to defend San Francisco's ability to be so progressive.
Watching them warm up and do flybys past the ever-growing Salesforce Tower and the iconic Golden Gate Bridge got me wondering how they prepare and train for such a precision display. It turns out, coincidentally, that there is a wonderful documentary on the Smithsonian Channel and another on YouTube. Watching them, you get a sense of the level of routine required to achieve this consistency and precision. Before each performance, the team does a site inspection and practice fly-bys to get a feel for the space and determine the specifics of the air show, even if it is the same as the year before, and the year before that.
On the day of the show they gather in a room and do a mock walk-through, with each member describing out loud exactly what they will do and the timing of each maneuver; this mentally creates “muscle memory” to avoid human error. After each event, they hold a candid and open critique session in which every person must openly admit the mistakes they made; if they own their errors, there is no issue. Anyone who tries to duck them gets called out rather forcefully. No one is exempt; even the CO can be called out.
This process of continuous improvement and monitoring is something we can learn from and apply to data center network monitoring and operations. How often do we test and validate our assumptions? How frequently do we make certain that our networks are configured and operating as we intended? A key goal of a network design is a structure that is resilient to failures of links, nodes, and hopefully even humans. How often do we re-test to validate that these design principles are actually being achieved?
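To make the idea of re-testing design intent concrete, here is a minimal sketch, using entirely hypothetical device and link names, of an intent check: compare the links the fabric was designed to have against the links actually observed, so that drift is caught continuously rather than discovered during an outage.

```python
# Hypothetical design intent for a small leaf-spine fabric.
INTENDED_LINKS = {
    ("spine1", "leaf1"), ("spine1", "leaf2"),
    ("spine2", "leaf1"), ("spine2", "leaf2"),
}

def validate_links(observed_links):
    """Return (missing, unexpected) links relative to the design intent."""
    observed = set(observed_links)
    missing = INTENDED_LINKS - observed      # designed but not seen: lost redundancy
    unexpected = observed - INTENDED_LINKS   # seen but not designed: possible miscabling
    return missing, unexpected

# Example: one spine-leaf link is down, and someone has patched
# leaf1 directly to leaf2 outside the design.
missing, unexpected = validate_links([
    ("spine1", "leaf1"), ("spine1", "leaf2"),
    ("spine2", "leaf1"), ("leaf1", "leaf2"),
])
print("missing:", missing)        # the lost spine2-leaf2 link
print("unexpected:", unexpected)  # the out-of-design leaf1-leaf2 link
```

Run on a schedule, a check like this turns the question "is the network still built the way we intended?" from an annual audit into a continuous answer.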
Mature software development organizations have adopted these principles: they have structured development workflows in which test plans are automated and re-run constantly against the mainline codebase. Each check-in and change is validated before being synced to the top of tree. Developers encode their testing and review best practices into automated tooling, reducing the manual effort that leads to human error.
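The same "validate every change" gate applies naturally to network configuration. As a hedged sketch, with a toy config format and made-up rules, here is the kind of lint check a CI pipeline could run on each check-in, rejecting a candidate device config that violates a basic practice before it can reach the mainline:

```python
import re

def lint_config(config_text):
    """Return a list of violations for a toy device config (hypothetical rules)."""
    violations = []
    if "password cisco" in config_text:
        violations.append("default password in use")
    if not re.search(r"^ntp server ", config_text, re.MULTILINE):
        violations.append("no NTP server configured")
    return violations

def test_candidate_config():
    candidate = "hostname leaf1\nntp server 10.0.0.1\n"
    # CI gate: the change merges only if this assertion passes.
    assert lint_config(candidate) == []

test_candidate_config()
print("config lint passed")
```

The specific rules matter less than the workflow: every proposed change runs through the same automated scrutiny, every time, just as every software commit does.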
At Apstra, we are working to build into the infrastructure itself the same capabilities that have produced highly reliable software: systems with significantly fewer defects, and workflows that deliver and reinforce best practices. As companies talk about 'infrastructure as code', shouldn't we apply the same best practices that have made software reliable and secure?
Lastly, as the Blue Angels soar over the city by the bay this weekend, think about what it takes to make that happen: tremendously skilled aviators who hone their skills daily and test themselves on every flight under the utmost scrutiny from their equally talented peers. But it is also every person behind them you don't see as often: the mechanics, the crew chiefs, the maintenance staff, the logisticians, the air traffic teams, the flight planners.
Information technology likewise takes a wide variety of roles and skills: network operators, engineers, managers, software engineers, and CIOs. As with the Blue Angels, the failure of any one can bring everything else in the stack down. So while cheering the pilots and learning from their precision, let's also hoist one for every infrastructure professional who has to deliver with that same precision every day, so that the critical business services relying on the infrastructure are delivered successfully.