Oct 3, 2017

AOS 2.0: Underlay – Overlay Integration for the Data Center


The introduction of overlays in the early days of SDN enabled organizations to bridge the gap between the dynamic nature of their business policies and the static nature of their network. At the same time, overlays introduced significant challenges, which limited their adoption in the enterprise.

Responding to customer requests in this area, Apstra is today announcing AOS® 2.0. Leveraging recent advances in network operating system APIs and switch silicon support for VXLAN, AOS 2.0 delivers the first intent-based integrated underlay and overlay solution for the data center network.

Around a decade ago, when SDN discussions first began, switches had no APIs, and to deliver dynamic policy there was no choice but to bypass network engineering teams and extend an overlay on top of the physical network. This approach created a number of problems that limited the adoption of the technology:

  1. Underlays and overlays are opaque to each other. Because the underlay and overlay are completely decoupled, it is that much harder for IT teams to debug networking problems. Was it caused by the overlay? The underlay? Through which links or interfaces do the packets pertaining to this particular overlay tunnel flow?
  1. Organizational processes break with decoupled underlays and overlays. Overlays made it unclear who was really responsible for network services. The network engineering team? The compute team? The cloud team? Compute teams are often driven to buy and operate an overlay without the participation of network teams. That could mean that two network operators in the same data center don’t really work together, or even acknowledge each other. Worse, the networking team is often blamed for problems, frequently without evidence. The network is the most critical asset in the data center; one operational team should be empowered and responsible, not two.
  1. Overlays don’t easily work with bare metal devices. While most workloads are virtualized, there is a lot of bare metal out there: storage, databases, and many devices and appliances. The common solution is to build a gateway that encapsulates and decapsulates flows between the overlay and a bare metal segment. I am bullish on the use of an overlay, but quite honestly, a gateway for bare metal is a hack which only became necessary because of the unnatural fracture between underlay and overlay.

Introducing AOS 2.0:

  1. Integrated overlay/underlay: Leveraging the innovative AOS state repository and intent modeling technology, all the state pertaining to the physical underlay, its topology, its logical entities, and its virtual networks, along with all related telemetry, is stored in the AOS distributed data store and represented in a graph that captures all the pertinent relationships. As a result, AOS 2.0 provides powerful visibility into network state, both physical and virtual, through its process of closed-loop, continuous validation of state against intent. In short, with AOS 2.0, the underlay/overlay correlation problem that has plagued first-generation SDN solutions becomes a thing of the past.
  1. All under the control of the network engineering team! AOS 2.0 enables a network infrastructure built on a modern Leaf/Spine architecture with multi-vendor, state-of-the-art equipment, which features an L3 underlay and stitches L2 services as an overlay, both within the rack and across racks. Organizations can then deliver L2 connectivity for their applications, and enforce policies and security zones across their various application service tiers, all under the control of network teams.
  1. Natural support for bare metal servers. With AOS 2.0, configuring a virtual network spanning two separate racks is done automatically using an API or our Web interface: “Please create a virtual network connecting these endpoints”. Under the hood, AOS 2.0 (1) configures VLANs connecting endpoints to Top of Rack (TOR) switches, (2) configures VXLAN tunnels between TOR switches that belong to different racks, and (3) validates in real time that the virtual network was properly configured (e.g., by continuously ensuring that VTEP routes are seen in routing tables for all devices). This approach applies to both virtual and bare-metal endpoints and doesn’t require gateways.
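
To make the three steps in the last item concrete, here is a minimal Python sketch of how a single "connect these endpoints" intent could be expanded into VLAN assignments, VXLAN tunnels, and expected VTEP routes, and then checked against observed routing tables. It is an illustration only, not the AOS API: the `Endpoint` class, the `plan_virtual_network` and `validate` functions, the device names, and the addresses are all hypothetical, and the sketch assumes one TOR switch per rack with a known VTEP address.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical endpoint record; not an AOS data type."""
    name: str  # server or VM name
    tor: str   # TOR switch the endpoint attaches to (one TOR per rack assumed)
    port: str  # TOR interface facing the endpoint


def plan_virtual_network(vlan_id, vni, endpoints, vtep_ip):
    """Expand one intent into per-switch actions.

    vtep_ip maps each TOR name to its VTEP address (assumed to be known).
    """
    # Step 1: place every endpoint-facing port in the VLAN.
    vlan_ports = sorted((ep.tor, ep.port, vlan_id) for ep in endpoints)

    # Step 2: one VXLAN tunnel per pair of distinct TORs in the network.
    tors = sorted({ep.tor for ep in endpoints})
    tunnels = [(a, b, vni) for a, b in combinations(tors, 2)]

    # Input for step 3: the remote VTEP routes each TOR should see.
    expected_routes = {
        tor: {vtep_ip[other] for other in tors if other != tor}
        for tor in tors
    }
    return {
        "vlan_ports": vlan_ports,
        "tunnels": tunnels,
        "expected_routes": expected_routes,
    }


def validate(plan, observed_routes):
    """Step 3: return expected-but-missing VTEP routes, keyed by TOR."""
    missing = {}
    for tor, expected in plan["expected_routes"].items():
        absent = expected - observed_routes.get(tor, set())
        if absent:
            missing[tor] = sorted(absent)
    return missing


# Example: a bare-metal database and a VM in two different racks.
endpoints = [
    Endpoint("db-1", "tor-a", "eth1"),
    Endpoint("vm-7", "tor-b", "eth4"),
]
vtep_ip = {"tor-a": "10.0.0.1", "tor-b": "10.0.0.2"}
plan = plan_virtual_network(100, 10100, endpoints, vtep_ip)

healthy = {"tor-a": {"10.0.0.2"}, "tor-b": {"10.0.0.1"}}
broken = {"tor-a": set(), "tor-b": {"10.0.0.1"}}
print(validate(plan, healthy))
print(validate(plan, broken))
```

In this sketch the bare-metal server and the VM are handled identically, since both are simply ports on a TOR switch, which is why no gateway appears anywhere in the plan. Running `validate` repeatedly against fresh routing-table snapshots would approximate the continuous check described above.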

In addition, AOS 2.0 leverages the same AOS core to provide the same unique AOS advantages:

  1. Intent-Based, Vendor-Agnostic, and Closed-Loop: The vendor-independent approach of AOS is taken to another level with our multi-vendor implementation of VXLAN-based virtual networks. Owing to the Intent-Based approach of AOS, arcane vendor-specific configurations of VXLAN are abstracted away from network users. And owing to its closed-loop continuous validation, complicated vendor-specific troubleshooting procedures are eliminated. The result is unprecedented hardware vendor choice and interoperability across both underlay and overlay.
  1. Fully Automated: AOS 2.0 gets us closer to the vision of a self-operating and autonomous network infrastructure. It leverages the extensible foundation of AOS to deliver end-to-end automation of all phases in the life cycle of network services across the underlay and the overlay: design, build, deploy, and validate. This includes Day 0 Design and Initial Provisioning, Day 1 Builds, and Day 2 Operational Changes and Troubleshooting capabilities. With unique system-wide commit capabilities for change operations, and sophisticated continuous validation and troubleshooting capabilities through intent-based analytics, AOS 2.0 delivers the most powerful autonomous operation capabilities available today.

With AOS 2.0’s new enterprise-class features (including RBAC, HTTPS, and Headless Operations), organizations can confidently start the process of migrating from legacy L2 data center infrastructures to modern Leaf/Spine infrastructures with a fully automated and integrated L3 underlay and L2 overlay, all under the control of networking teams.

AOS 2.0 demonstrates that we are rapidly extending AOS capabilities. Customer-driven feature velocity is a key part of our vision, enabled by the extensible AOS architecture. This brings our customers expanded device support, as well as advanced intent-based analytics, which are coming as part of turn-key applications in future releases. Contact us to learn what AOS 2.0 can do for your network infrastructure and organization. A new era has begun, and we’re not looking back!

Mansour Karam

President, Founder