This post describes how to transform a 3-tier (core-distribution-access) network into a multi-vendor L3 CLOS using the Apstra Operating System (AOS®). For most enterprises this is a big step forward, with benefits that include complete lifecycle automation.
The majority of data center operators have been running a legacy 3-tier network architecture for years. It works and that’s what people know. They’re also aware that L3 CLOS network architectures offer far greater benefits and that they need to get on board. The problem is how. Most engineers only know 3-tier, and they need to keep their network fully operational during any transition and beyond.
AOS drives the process of 3-tier to L3 CLOS transformation.
AOS guides engineers through the process of building a specification of intent for their new network. This is an important part of running a network. If you don’t know what a network should be doing, you can’t know if it is operating correctly.
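To make "specification of intent" a little more concrete, here is a minimal Python sketch. It is not the AOS data model or API; `FabricIntent` and `expected_links` are hypothetical names invented for illustration. The idea is only that once the intent is declared, what the network should look like can be derived from it rather than remembered.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class FabricIntent:
    """Hypothetical declaration of intent; NOT an AOS data structure."""
    spines: tuple
    leaves: tuple
    links_per_pair: int = 1  # parallel links between each leaf and each spine


def expected_links(intent: FabricIntent) -> set:
    """In a leaf-spine CLOS every leaf connects to every spine."""
    return {
        (leaf, spine, n)
        for leaf, spine in product(intent.leaves, intent.spines)
        for n in range(intent.links_per_pair)
    }


intent = FabricIntent(spines=("spine1", "spine2"), leaves=("leaf1", "leaf2"))
links = expected_links(intent)
print(len(links))  # 4: two leaves x two spines x one link each
```

Anything the running network shows that is not in this derived set (or is missing from it) is, by definition, not what the designer asked for.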
AOS assists the engineer in developing a state-of-the-art L3 CLOS design. AOS prevents engineers from making beginner mistakes, and the process ensures that engineers learn L3 CLOS principles and best practices along the way.
AOS transitions from design to implementation by provisioning switches from almost any network vendor. Don’t know different CLIs? No problem. Not a BGP or VRF expert? Gotcha covered. AOS provisions each switch directly through an API to eliminate human error — the major cause of outages. All generated configurations are available for examination, so engineers can easily understand what is being done.
AOS begins the process of continuous network validation. It checks the correctness of the configuration and ensures that the switches are doing what the designer intended. Configuration drift and inconsistent state across switches don’t happen throughout the lifecycle of the network.
AOS begins collecting operational telemetry normalized across multi-vendor switches. Through this telemetry and a powerful query capability, engineers have visibility and insight into what is going on. They also learn L3 CLOS with far less effort than from a traditional bottom-up or box-by-box method.
Once operational, AOS continuously compares the actual network state with its intended state forevermore. Any disconnect between engineering and operations is eliminated.
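The comparison of intended state with actual state can be pictured as a simple diff. The sketch below is an assumption about the general shape of such a check, not a description of how AOS implements it; the function name `find_drift` and the key format are made up for the example.

```python
def find_drift(intended: dict, actual: dict) -> dict:
    """Return {key: (intended_value, actual_value)} for every mismatch.

    Keys present on only one side are reported with None for the other.
    """
    drift = {}
    for key in intended.keys() | actual.keys():
        want = intended.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical state, keyed by (switch, item).
intended = {
    ("leaf1", "bgp:spine1"): "established",
    ("leaf1", "bgp:spine2"): "established",
    ("leaf1", "mtu:swp1"): 9216,
}
actual = {
    ("leaf1", "bgp:spine1"): "established",
    ("leaf1", "bgp:spine2"): "idle",      # session is down
    ("leaf1", "mtu:swp1"): 9216,
    ("leaf1", "vlan:999"): "present",     # configured by hand, never intended
}

drift = find_drift(intended, actual)
for key in sorted(drift):
    print(key, drift[key])
```

Run on a schedule or on every telemetry update, a check of this kind catches both failures (a BGP session that dropped) and drift (a VLAN nobody declared).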
AOS is self-documenting. Network changes are made by augmenting the declaration of intent and an associated design and blueprint. This has an important ancillary benefit of creating a completely self-documenting environment: a network journal that forms a single source of truth over time. Bye bye Excel spreadsheets.
Infomercial insertion: We call a network deployed with AOS a Self-Operating Network™. We aspire for such networks to configure themselves, run themselves, fix themselves, document themselves and, over time, defend themselves. AOS is a lot like a self-driving car; it’s intent-based (tell it where to go), closed-loop (constantly checking and correcting), and vendor-agnostic (don’t care who built the tires).
The AOS-led transition from a 3-tier network into an L3 CLOS is compelling.
Running L2 workloads across a 3-tier network causes complexity.
Many apps, and VM mobility, demand L2 adjacency throughout the 3-tier fabric. The resulting use of spanning tree and mc-lags may lead to convoluted logical topologies and stranded bandwidth, and may make it necessary to deploy expensive modular switches. Hand-crafted VLANs reduce agility and can lead to a very large blast radius in the event of problems. And with L2 3-tier networks, common problems often must be addressed box-by-box by very knowledgeable people.
L2 complexity may be largely eliminated by AOS in a fully automated L3 CLOS. In an L3 CLOS, L2 workloads may operate natively within each rack. L2 workloads may also be enabled across racks by using VXLAN to interconnect leaves over the fabric. In this way all workloads, whether L2 or L3, continue to work within an L3 CLOS as they did in the 3-tier network. However, in the CLOS they benefit from the following advantages.
- Gobs of bandwidth are freed through multi-pathing and parallelism
- Spanning tree and mc-lags in the fabric are eliminated, simplifying everything
- Problems, mean-time-to-insight, and blast radius are all reduced
- Operations and troubleshooting are much easier
- Devices are much, much cheaper (e.g. fixed-port switches vs modular switches)
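The bandwidth point in the list above is easy to put numbers on. The sketch below rests on a deliberately simplified assumption: spanning tree leaves one uplink forwarding and blocks the rest, while ECMP in an L3 CLOS forwards on all of them. It ignores mc-lag, which can keep more than one uplink active, and the port counts are illustrative rather than taken from any real deployment.

```python
def usable_uplink_gbps(uplinks: int, link_gbps: int, multipath: bool) -> int:
    """Usable uplink capacity of one access or leaf switch.

    Simplifying assumption: without multipath (plain spanning tree) only one
    uplink forwards; with multipath (ECMP) every uplink forwards.
    """
    if uplinks < 1:
        raise ValueError("a switch needs at least one uplink")
    active = uplinks if multipath else 1
    return active * link_gbps


# Hypothetical switch with four 40G uplinks.
stp = usable_uplink_gbps(uplinks=4, link_gbps=40, multipath=False)
ecmp = usable_uplink_gbps(uplinks=4, link_gbps=40, multipath=True)
print(stp, ecmp)  # 40 160
```

Same hardware, same cables; under these assumptions the difference is purely in how many of the links are allowed to carry traffic.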
The entire network fabric becomes vendor-agnostic and hardware-agnostic. Engineers can feel free to interchange products from Cisco, Arista, Cumulus, Juniper, Snaproute, HP and Dell at any time. For example, an engineer might select Broadcom-based switches for normal compute racks, and choose switches with deeper buffers to interconnect storage devices.
Operations based upon closed-loop telemetry allow monitoring to step up to a new level. AOS detects real-time problems by continuously comparing network state information, across all vendor devices, with the intent of the designer. (Nest thermostats are a good example of a system that runs on closed-loop telemetry.)
The network becomes manageable as a system, not as a pile of independently configured switches, each operated box-by-box.
AOS provides a safe transition
In 1967, Sweden changed from a drive-on-the-left-side country to a drive-on-the-right-side country in a single overnight changeover. This is a good example of a transition that is best done all at once, without a phased migration. After all, a phased migration might involve some cars driving on the left and others driving on the right; or some cars driving on both the right and the left. Sweden’s transition worked flawlessly.
Networks are not so well suited to flip-the-switch transitions. Transitioning a 3-tier fabric to an L3 CLOS might be better served with a thoughtful migration managed by AOS.
Here’s a suggestion on how engineers, with AOS, might perform a transition:
Step 1: Build a small, powerful, L3 CLOS.
This requires the deployment of two pairs of inexpensive leaf and spine switches. In a 10G/40G environment you can buy these switches off the net for around $5-6K each at list price. Note that this simple fabric is likely to be more powerful than most 3-tier fabrics even when using large modular switches in an L2 environment (lots of bandwidth in those big switches, but with spanning-tree it’s largely stranded).
AOS will guide engineers to configure the L3 CLOS switches and deploy them regardless of their vendor-specific equipment choices. The collection of closed-loop telemetry starts right away. Take the time and get used to having automated monitoring and operations 24/7/365.
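For a feel of what such a small fabric needs in the way of planning, here is a rough Python sketch that lays out a two-leaf, two-spine CLOS. It assumes one common L3 CLOS convention (eBGP with a private ASN per switch and a /31 subnet per leaf-spine link); the address block, ASN base, and the `plan_fabric` function are illustrative assumptions, not what AOS generates.

```python
import ipaddress
from itertools import product


def plan_fabric(spines, leaves, p2p_block="10.0.0.0/24", base_asn=65000):
    """Assign one private ASN per switch and one /31 per leaf-spine link.

    All values are illustrative defaults, not a recommendation.
    """
    asns = {name: base_asn + i for i, name in enumerate([*spines, *leaves])}
    subnets = ipaddress.ip_network(p2p_block).subnets(new_prefix=31)
    links = []
    for leaf, spine in product(leaves, spines):
        subnet = next(subnets)  # raises StopIteration if the block runs out
        links.append({
            "leaf": leaf,
            "spine": spine,
            "subnet": subnet,
            "spine_ip": subnet[0],  # first address of the /31
            "leaf_ip": subnet[1],   # second address of the /31
        })
    return asns, links


asns, links = plan_fabric(spines=["spine1", "spine2"], leaves=["leaf1", "leaf2"])
for link in links:
    print(link["leaf"], link["leaf_ip"], "<->", link["spine_ip"], link["spine"])
```

Four links, four /31s, four ASNs: small enough to check by eye, which is the point of starting with a fabric this size.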
Step 2: Migrate the core switches and distribution switches from the 3-tier fabric into the CLOS.
Visually, these switches may be seen as hanging under the leaves of the L3 CLOS.
This migration should be transparent to both distribution switches and core switches (called border routers within an L3 CLOS). There may be some manageable detail required in this step. Call us and we’ll describe how it works.
AOS will help design, provision, configure, validate and operate it all while catching mistakes. CLI problems — gone. BGP peering mistakes — over. AOS will even tell you if you plug cables or transceivers into the wrong ports.
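Catching a cable plugged into the wrong port comes down to comparing the cabling plan with what each switch actually sees on the other end of its links. The sketch below assumes neighbor information is already available (for example from a discovery protocol such as LLDP) as a plain dictionary; the `check_cabling` function and the port names are hypothetical and say nothing about how AOS collects or checks this data.

```python
def check_cabling(intended: dict, observed: dict) -> list:
    """Compare the planned neighbor on each port with the one actually seen.

    Both dicts map (switch, port) -> (neighbor_switch, neighbor_port).
    """
    problems = []
    for (switch, port), want in sorted(intended.items()):
        have = observed.get((switch, port))
        if have != want:
            seen = ":".join(have) if have else "nothing"
            problems.append(
                f"{switch}:{port} expected {':'.join(want)}, saw {seen}"
            )
    return problems


# Hypothetical plan, and an observation where two uplinks were swapped.
intended = {
    ("leaf1", "swp49"): ("spine1", "swp1"),
    ("leaf1", "swp50"): ("spine2", "swp1"),
}
observed = {
    ("leaf1", "swp49"): ("spine2", "swp1"),
    ("leaf1", "swp50"): ("spine1", "swp1"),
}

problems = check_cabling(intended, observed)
for line in problems:
    print(line)
```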
Step 3: Migrate compute nodes from access switches to leaf switches.
Tell AOS you want to add a rack (or several racks) to the L3 CLOS. AOS will tell you how to do it and if/when you need to add a new spine. Then you are ready to migrate the compute nodes.
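One plausible way to reason about when another spine is needed is oversubscription: the ratio of server-facing bandwidth on a leaf to its uplink bandwidth. The sketch below is back-of-the-envelope arithmetic under stated assumptions (one uplink from each leaf to each spine, a target ratio chosen by the designer). It is not the rule AOS applies, and `spines_for_ratio` and `free_spine_ports` are names made up for this example.

```python
import math


def spines_for_ratio(server_gbps_per_leaf, uplink_gbps, max_oversubscription):
    """Minimum number of spines that keeps a leaf within the target ratio.

    Assumes exactly one uplink from the leaf to each spine.
    """
    return math.ceil(server_gbps_per_leaf / (uplink_gbps * max_oversubscription))


def free_spine_ports(ports_per_spine, leaves):
    """Leaf-facing ports left on each spine (one port per leaf per spine)."""
    return ports_per_spine - leaves


# Hypothetical racks: 48 x 10G servers with 40G uplinks, 3:1 target.
print(spines_for_ratio(480, 40, 3))     # 4
# A denser rack (48 x 25G) with the same uplinks needs more spines.
print(spines_for_ratio(1200, 40, 3))    # 10
# Hypothetical 32-port spines with 20 leaves attached.
print(free_spine_ports(32, 20))         # 12
```

Adding racks consumes spine ports; adding denser racks, or tightening the target ratio, is what pushes the spine count up.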
We are excited to be using AOS to help enterprises transition their legacy 3-tier network into an L3 CLOS. The value in terms of cost, simplicity, agility, capacity, and operational effectiveness is substantial. When coupled with intent-based operations, closed-loop telemetry, and vendor-agnosticism, the value becomes profound.
We’d be very interested in your views regarding AOS’ automated transformation and operations capabilities. Please let us know what you think in the comments section below.