Chapter 9: SD-WAN and Cloud Networking Automation
9.1 Introduction
The modern enterprise network extends far beyond the traditional data center, encompassing geographically dispersed branches, remote workers, and a rapidly expanding footprint in public and private clouds. Software-Defined Wide Area Networking (SD-WAN) and Cloud Networking have emerged as foundational technologies to manage this complexity, offering agility, performance, and cost optimization. However, realizing their full potential necessitates robust automation.
This chapter delves into NetDevOps strategies for automating the deployment, configuration, and day-2 operations of SD-WAN and cloud network environments. We will explore how to leverage Ansible, Python, and Infrastructure as Code (IaC) principles to streamline workflows across various vendors, with a particular focus on Cisco SD-WAN and major cloud providers like AWS and Azure.
What this chapter covers:
- Understanding the architecture and control planes of SD-WAN and cloud networks.
- Leveraging APIs (RESTCONF, gRPC, cloud-specific APIs) for automation.
- Automating Cisco SD-WAN deployments using Ansible and Python.
- Provisioning and managing cloud network infrastructure with Terraform and Python.
- Implementing IaC for both SD-WAN and cloud components.
- Addressing security, performance, and troubleshooting in automated SD-WAN and cloud environments.
Why it’s important:
Manual configuration of SD-WAN and cloud networks is slow, error-prone, and unable to scale with the demands of modern business. Automation reduces operational overhead, ensures configuration consistency, accelerates deployment cycles, and improves network reliability and security. Mastering these skills is essential for any network engineer operating in today’s hybrid cloud landscape.
What you’ll be able to do after reading this chapter:
- Design and implement automation solutions for SD-WAN device onboarding and policy deployment.
- Automate the provisioning of cloud Virtual Private Clouds (VPCs) or Virtual Networks (VNETs) and associated services.
- Integrate SD-WAN with cloud networking environments through automated VPN tunnels and routing.
- Apply NetDevOps principles to manage network infrastructure as code across hybrid environments.
- Identify and mitigate common security and performance challenges in automated SD-WAN and cloud deployments.
9.2 Technical Concepts: Architectures and Automation Interfaces
9.2.1 SD-WAN Architecture and Components
SD-WAN decouples the control plane from the data plane, centralizing management and policy enforcement. Key components include:
- Orchestration Plane: Typically a central management console (e.g., Cisco vManage, FortiManager) for design, provisioning, monitoring, and troubleshooting. It provides a graphical interface and, critically for automation, a set of APIs.
- Control Plane: Responsible for establishing and maintaining the SD-WAN overlay topology, distributing routing information, and enforcing security policies (e.g., Cisco vSmart Controller).
- Data Plane: Network devices (e.g., Cisco vEdge/cEdge routers, FortiGate appliances) that forward traffic based on policies received from the control plane and establish secure tunnels over the underlay network.
- Analytics Plane: Collects telemetry data for performance monitoring, visibility, and optimization.
Underlay vs. Overlay: The underlay is the physical or logical network infrastructure (e.g., MPLS, Internet broadband, 5G) that provides connectivity. The overlay is the virtual network built on top of the underlay, typically using tunnels (e.g., IPsec for encrypted transport, or GRE where encryption is not required) to connect sites and provide application-aware routing.
Diagram: Cisco SD-WAN Architecture
@startuml
skinparam handwritten true
skinparam shadowing false
skinparam style strictuml
cloud "Internet/MPLS Underlay" as Underlay {
}
package "SD-WAN Control Plane" {
node "vManage NMS" as vManage {
interface "REST API" as VMANAGE_API
}
node "vSmart Controller" as vSmart
node "vBond Orchestrator" as vBond
}
package "Branch/Data Center" {
node "vEdge/cEdge Router 1" as vEdge1
}
package "Cloud On-Ramp" {
node "vEdge/cEdge Router N" as vEdgeN
}
vManage -[hidden]up- vBond
vManage -[hidden]up- vSmart
VMANAGE_API -down- vManage
vManage --> vSmart : Policy push (NETCONF)
vSmart <--> vBond : Discovery
vEdge1 <--> vBond : Initial Authentication
vEdgeN <--> vBond : Initial Authentication
vEdge1 <--> vSmart : Control Plane (OMP)
vEdgeN <--> vSmart : Control Plane (OMP)
vEdge1 <--> vEdgeN : Data Plane (IPsec/DTLS)
vEdge1 <--> Underlay : Transport VPN
vEdgeN <--> Underlay : Transport VPN
Underlay <--> vBond : Discovery Reachability
Underlay <--> vSmart : Control Channel
Underlay <--> vManage : Management Channel
@enduml
9.2.2 Cloud Networking Concepts
Cloud networking provides virtualized network services within a public cloud provider’s infrastructure. Key concepts include:
- Virtual Private Cloud (VPC) / Virtual Network (VNET): An isolated, private section of a cloud provider’s network, where users can launch resources.
- Subnets: Divisions within a VPC/VNET, typically associated with Availability Zones for high availability.
- Route Tables: Control traffic routing within and out of subnets.
- Security Groups / Network Security Groups (NSG): Stateful firewalls that control inbound and outbound traffic for instances or subnets.
- Transit Gateway / Hub-and-Spoke VNET Peering: Centralized connectivity hubs to interconnect multiple VPCs/VNETs, simplifying routing and reducing the number of point-to-point connections.
- Direct Connect / ExpressRoute: Dedicated, private network connections from on-premises to cloud.
- VPN Gateway: Enables encrypted connections over the public internet between on-premises networks and cloud VPCs/VNETs.
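The relationship between a VPC/VNET address block and its subnets can be checked before anything is provisioned. The following is a minimal sketch using only Python's standard `ipaddress` module; the function names `plan_subnets` and `fits_inside` are hypothetical helpers introduced for illustration, and the CIDR values are examples.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, new_prefix: int, count: int) -> list:
    """Return the first `count` subnets of size /new_prefix inside vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields candidate blocks lazily; islice stops after `count`.
    return [str(net) for net in islice(vpc.subnets(new_prefix=new_prefix), count)]


def fits_inside(subnet_cidr: str, vpc_cidr: str) -> bool:
    """True if subnet_cidr is fully contained in vpc_cidr."""
    subnet = ipaddress.ip_network(subnet_cidr)
    return subnet.subnet_of(ipaddress.ip_network(vpc_cidr))


if __name__ == "__main__":
    print(plan_subnets("10.100.0.0/16", 24, 3))
    print(fits_inside("10.100.1.0/24", "10.100.0.0/16"))
```

A check like this is cheap to run in a CI pipeline ahead of a Terraform plan, so that overlapping or out-of-range subnets are caught before any API call is made.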
Diagram: Hybrid Cloud Networking with Transit Gateway
nwdiag {
internet [shape = cloud];
network "On-Premises Network" {
address = "10.1.0.0/16"
router1 [address = "10.1.0.1"];
server1 [address = "10.1.1.10"];
}
network "AWS Region Network" {
cloud_router [label = "AWS Transit Gateway"];
network "VPC A" {
address = "10.10.0.0/16"
web_tier_a [address = "10.10.1.10"];
app_tier_a [address = "10.10.2.10"];
}
network "VPC B" {
address = "10.20.0.0/16"
web_tier_b [address = "10.20.1.10"];
db_tier_b [address = "10.20.3.10", shape = database];
}
}
internet -- router1;
router1 -- "On-Premises Network";
router1 -- cloud_router [label = "IPsec VPN"];
cloud_router -- "VPC A";
cloud_router -- "VPC B";
web_tier_a -- app_tier_a;
web_tier_b -- db_tier_b;
}
9.2.3 Control Plane vs. Data Plane Automation
- Control Plane Automation: Focuses on configuring the orchestrators and controllers. This often involves interacting with REST APIs (e.g., vManage REST API, AWS EC2 API, Azure Resource Manager API) to define network policies, provision virtual devices, manage users, and orchestrate services. Data models like YANG are crucial here for structured configuration.
- RFC References: RFC 7950 (YANG 1.1), RFC 8641 (YANG-Push), RFC 6241 (NETCONF), RFC 8040 (RESTCONF).
- Data Plane Automation: In traditional networks, this means configuring individual devices (routers, switches) via CLI, NETCONF, or RESTCONF. In SD-WAN, the data plane devices (vEdges/cEdges) often receive their configuration and policies from the control plane. Automation here might involve pushing specific localized configs, troubleshooting, or collecting telemetry. In cloud, it’s about provisioning virtual appliances (e.g., firewalls) and configuring their interfaces and routing.
9.2.4 API-Driven Management: NETCONF, RESTCONF, gRPC, and Cloud APIs
Modern network devices and controllers increasingly expose programmatic interfaces.
- NETCONF (Network Configuration Protocol): An XML-based protocol designed for managing network devices. It provides mechanisms to install, manipulate, and delete configuration data. It uses YANG models to define the structure of configuration and state data.
- RFC: RFC 6241 (NETCONF Protocol), RFC 6242 (Using NETCONF over SSH).
- RESTCONF: A REST-like API that operates over HTTP(S), providing a simpler, stateless interface to access YANG-modeled data. It’s often preferred for web-based applications and general automation due to its ubiquity and simplicity.
- RFC: RFC 8040 (RESTCONF Protocol).
- gRPC (gRPC Remote Procedure Call): A high-performance, open-source RPC framework that can use Protocol Buffers for structured data serialization. It’s gaining traction for high-volume telemetry and low-latency control plane interactions, often with YANG-based data models.
- Key Benefit: Bidirectional streaming, efficiency.
- Cloud-Specific APIs: Each major cloud provider (AWS, Azure, GCP) offers its own comprehensive set of RESTful APIs to manage all aspects of their infrastructure. These are typically accessed via SDKs (e.g., boto3 for AWS, Azure SDK for Python) or CLI tools.
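To make the RESTCONF interface concrete, here is a small sketch that builds a RESTCONF data-resource URL and the matching JSON headers. It assumes the common default API root `/restconf` (RFC 8040 lets a device advertise a different root), and the helper names `restconf_url` and `restconf_headers` are hypothetical. No request is sent; the output would be handed to an HTTP client of your choice.

```python
from typing import Optional
from urllib.parse import quote

# Media type defined by RFC 8040 for JSON-encoded YANG data.
RESTCONF_JSON = "application/yang-data+json"


def restconf_url(host: str, resource: str, key: Optional[str] = None,
                 port: int = 443) -> str:
    """Build a data-resource URL; list keys are percent-encoded."""
    url = f"https://{host}:{port}/restconf/data/{resource}"
    if key is not None:
        # A "/" inside a key (e.g. an interface name) must become %2F.
        url += "=" + quote(key, safe="")
    return url


def restconf_headers() -> dict:
    """Headers requesting and sending JSON-encoded YANG data."""
    return {"Accept": RESTCONF_JSON, "Content-Type": RESTCONF_JSON}


if __name__ == "__main__":
    print(restconf_url("198.51.100.10",
                       "ietf-interfaces:interfaces/interface",
                       "GigabitEthernet0/0/1"))
```

Encoding the key matters in practice: an unescaped slash in an interface name would be read as a path separator and the request would address the wrong resource.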
Diagram: Automation Interactions with SD-WAN and Cloud APIs
digraph automation_apis {
rankdir=LR;
node [shape=box];
subgraph cluster_automation {
label = "Automation Platform";
bgcolor = lightblue;
Python_Scripts [label="Python Scripts"];
Ansible_Playbooks [label="Ansible Playbooks"];
Terraform_Config [label="Terraform Config"];
}
subgraph cluster_sdwan {
label = "SD-WAN Controller (e.g., Cisco vManage)";
bgcolor = lightgreen;
vManage_API [label="vManage REST API"];
vSmart_NETCONF [label="vSmart NETCONF/gRPC"];
cEdge_NETCONF [label="cEdge NETCONF/RESTCONF"];
}
subgraph cluster_cloud {
label = "Cloud Provider (e.g., AWS)";
bgcolor = lightcoral;
AWS_API [label="AWS EC2/VPC API"];
AWS_SDK [label="AWS SDK (e.g., boto3)"];
}
Python_Scripts -> vManage_API [label="HTTPS/REST"];
Ansible_Playbooks -> vManage_API [label="HTTPS/REST"];
Ansible_Playbooks -> cEdge_NETCONF [label="SSH/NETCONF"];
Python_Scripts -> AWS_SDK [label="SDK Calls"];
Terraform_Config -> AWS_API [label="Provider API"];
vManage_API -> vSmart_NETCONF [label="Orchestration"];
vSmart_NETCONF -> cEdge_NETCONF [label="Policy Push"];
AWS_SDK -> AWS_API [label="Internal API Call"];
}
9.2.5 IPsec Tunnel Header Structure (Simplified)
SD-WAN heavily relies on IPsec for secure data plane connectivity. Understanding its basic structure is helpful for troubleshooting.
packetdiag {
colwidth = 32
0-31: SPI (Security Parameters Index)
32-63: Sequence Number
64-95: Payload Data (variable length; e.g., the encrypted inner IP packet)
...
variable: Authentication Data (if used)
}
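When inspecting a packet capture, the two fixed ESP fields shown above can be read directly from the first eight bytes of the ESP payload. This is an illustrative sketch using Python's `struct` module; the function name `parse_esp_header` and the sample bytes are made up for the example, and everything after the first eight bytes is treated as opaque because it is normally encrypted.

```python
import struct


def parse_esp_header(esp_bytes: bytes) -> dict:
    """Extract the SPI and sequence number from the start of an ESP packet."""
    if len(esp_bytes) < 8:
        raise ValueError("ESP header needs at least 8 bytes")
    # Two unsigned 32-bit integers in network (big-endian) byte order.
    spi, sequence = struct.unpack("!II", esp_bytes[:8])
    return {
        "spi": spi,
        "sequence": sequence,
        # Remaining bytes: payload, padding, and integrity data (opaque here).
        "remaining_bytes": len(esp_bytes) - 8,
    }


if __name__ == "__main__":
    sample = struct.pack("!II", 0x1000, 42) + b"\x00" * 16
    print(parse_esp_header(sample))
```

Comparing the SPI seen on the wire with the SPI the device reports for a tunnel is a quick way to confirm that captured traffic belongs to the tunnel being debugged.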
9.3 Configuration Examples (Multi-vendor)
9.3.1 Cisco SD-WAN (cEdge) - Manual Configuration for Verification
While the goal is automation, understanding the underlying device configuration provides context. In Cisco SD-WAN, vManage uses templates to push configurations. Here, we show a simplified cEdge configuration for an IPsec tunnel that would be managed by vManage, along with verification commands. Note: Configuring a cEdge directly from the CLI usually means the device is running in autonomous mode, or that the settings are local-only and not controlled by vManage templates. The following illustrates the outcome of a vManage template.
! cEdge Router Configuration (Illustrative - primarily managed by vManage templates)
! Define a VPN interface for transport
interface GigabitEthernet0/0/0.100
encapsulation dot1Q 100
ip dhcp client
negotiation auto
no shut
vpn 0 ! Transport VPN
! Interface in Service VPN for LAN segment
interface GigabitEthernet0/0/1
ip address 10.1.1.1 255.255.255.0
no shut
vpn 10 ! Service VPN
! Overlay Management Protocol (OMP)
omp
no shutdown
graceful-restart
advertise networks
advertise connected
advertise static
advertise bgp
no shutdown
! System Global Configuration
system
system-ip 192.0.2.10
site-id 100
hostname cEdge-Branch1
vpn 0
interface GigabitEthernet0/0/0.100
tunnel-interface
encapsulation ipsec
color biz-internet
! ipsec properties would be configured by vManage based on templates
Verification Commands on cEdge:
show sdwan control connections
show sdwan omp peers
show sdwan omp routes
show sdwan ipsec tunnels
show ip route vrf 10
Expected Output (Snippet):
cEdge-Branch1# show sdwan control connections
PEER PEER
PRIVATE PRIVATE
VPN TYPE PEER IP PEER ID SITE ID DOMAIN ID STATE UPTIME PORT PUBLIC IP PUBLIC PORT LOCAL COLOR REMOTE COLOR VSMART-REDUNDANCY-GROUP
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 vsmart 192.0.2.1 10.0.0.1 1 1 up 0:00:15:30 12346 192.0.2.1 12346 biz-internet default -
cEdge-Branch1# show sdwan omp peers
PEER PEER PEER REFRESH TIME
PEER TYPE PEER ID SITE ID DOMAIN ID OVERLOAD STATE UPTIME CONTROL (SECONDS)
----------------------------------------------------------------------------------------------------------
vsmart 10.0.0.1 1 1 No up 0:00:15:30 0 530
9.3.2 AWS Cloud Networking - VPC with VPN Gateway
This example uses Terraform to provision an AWS VPC, subnets, an Internet Gateway, route tables, a Customer Gateway, a Virtual Private Gateway, and the IPsec VPN connection between them.
# main.tf for AWS VPC and VPN Gateway
# Requires AWS provider configured elsewhere (e.g., providers.tf or environment vars)
variable "aws_region" {
description = "AWS region for deployment"
type = string
default = "us-east-1"
}
variable "vpc_cidr" {
description = "CIDR block for the VPC"
type = string
default = "10.100.0.0/16"
}
variable "public_subnet_cidr" {
description = "CIDR block for the public subnet"
type = string
default = "10.100.1.0/24"
}
variable "private_subnet_cidr" {
description = "CIDR block for the private subnet"
type = string
default = "10.100.2.0/24"
}
variable "onprem_public_ip" {
description = "Public IP of the on-premises VPN device"
type = string
# SECURITY WARNING: Replace with actual IP, do not expose sensitive IPs in production IaC
default = "203.0.113.5" # Example IP
}
# 1. Create VPC
resource "aws_vpc" "sdwan_cloud_vpc" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "SDWAN-Cloud-VPC"
}
}
# 2. Create Internet Gateway
resource "aws_internet_gateway" "sdwan_igw" {
vpc_id = aws_vpc.sdwan_cloud_vpc.id
tags = {
Name = "SDWAN-Cloud-IGW"
}
}
# 3. Create Public Subnet
resource "aws_subnet" "sdwan_public_subnet" {
vpc_id = aws_vpc.sdwan_cloud_vpc.id
cidr_block = var.public_subnet_cidr
availability_zone = "${var.aws_region}a" # Use first AZ
map_public_ip_on_launch = true # Instances in this subnet get public IPs
tags = {
Name = "SDWAN-Cloud-Public-Subnet"
}
}
# 4. Create Private Subnet
resource "aws_subnet" "sdwan_private_subnet" {
vpc_id = aws_vpc.sdwan_cloud_vpc.id
cidr_block = var.private_subnet_cidr
availability_zone = "${var.aws_region}a" # Use first AZ
tags = {
Name = "SDWAN-Cloud-Private-Subnet"
}
}
# 5. Create Public Route Table
resource "aws_route_table" "sdwan_public_rt" {
vpc_id = aws_vpc.sdwan_cloud_vpc.id
tags = {
Name = "SDWAN-Cloud-Public-RT"
}
}
# 6. Route for Internet Gateway
resource "aws_route" "sdwan_public_internet_route" {
route_table_id = aws_route_table.sdwan_public_rt.id
destination_cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.sdwan_igw.id
}
# 7. Associate Public Subnet with Public Route Table
resource "aws_route_table_association" "sdwan_public_rt_assoc" {
subnet_id = aws_subnet.sdwan_public_subnet.id
route_table_id = aws_route_table.sdwan_public_rt.id
}
# 8. Create Private Route Table (for VPN connectivity)
resource "aws_route_table" "sdwan_private_rt" {
vpc_id = aws_vpc.sdwan_cloud_vpc.id
tags = {
Name = "SDWAN-Cloud-Private-RT"
}
}
# 9. Associate Private Subnet with Private Route Table
resource "aws_route_table_association" "sdwan_private_rt_assoc" {
subnet_id = aws_subnet.sdwan_private_subnet.id
route_table_id = aws_route_table.sdwan_private_rt.id
}
# 10. Create Customer Gateway (On-Premises VPN device representation)
resource "aws_customer_gateway" "sdwan_cgw" {
bgp_asn = "65000" # Example ASN, match on-premises
ip_address = var.onprem_public_ip
type = "ipsec.1"
tags = {
Name = "SDWAN-Branch-CGW"
}
}
# 11. Create Virtual Private Gateway (Cloud VPN endpoint)
resource "aws_vpn_gateway" "sdwan_vgw" {
vpc_id = aws_vpc.sdwan_cloud_vpc.id
tags = {
Name = "SDWAN-Cloud-VGW"
}
}
# 12. Create VPN Connection (IPsec tunnel)
resource "aws_vpn_connection" "sdwan_vpn" {
vpn_gateway_id = aws_vpn_gateway.sdwan_vgw.id
customer_gateway_id = aws_customer_gateway.sdwan_cgw.id
type = "ipsec.1"
static_routes_only = false # Use dynamic routing (BGP)
tags = {
Name = "SDWAN-Branch-to-Cloud-VPN"
}
}
# 13. Propagate routes from VPN Gateway to Private Route Table
resource "aws_vpn_gateway_route_propagation" "sdwan_vgw_route_prop" {
vpn_gateway_id = aws_vpn_gateway.sdwan_vgw.id
route_table_id = aws_route_table.sdwan_private_rt.id
}
output "vpc_id" {
description = "The ID of the created VPC"
value = aws_vpc.sdwan_cloud_vpc.id
}
output "public_subnet_id" {
description = "The ID of the created public subnet"
value = aws_subnet.sdwan_public_subnet.id
}
output "private_subnet_id" {
description = "The ID of the created private subnet"
value = aws_subnet.sdwan_private_subnet.id
}
output "vpn_connection_id" {
description = "The ID of the created VPN connection"
value = aws_vpn_connection.sdwan_vpn.id
}
Verification Commands (AWS CLI):
aws ec2 describe-vpcs --filters "Name=tag:Name,Values=SDWAN-Cloud-VPC"
aws ec2 describe-subnets --filters "Name=tag:Name,Values=SDWAN-Cloud-Public-Subnet"
aws ec2 describe-vpn-connections --filters "Name=tag:Name,Values=SDWAN-Branch-to-Cloud-VPN"
aws ec2 describe-route-tables --filters "Name=tag:Name,Values=SDWAN-Cloud-Private-RT" --query 'RouteTables[*].Routes'
Expected Output (Snippet):
{
"Vpcs": [
{
"CidrBlock": "10.100.0.0/16",
"DhcpOptionsId": "dopt-xxxxxxxxxxxxxxxxx",
"State": "available",
"VpcId": "vpc-xxxxxxxxxxxxxxxxx",
"OwnerId": "xxxxxxxxxxxx",
"InstanceTenancy": "default",
"Ipv6CidrBlockAssociationSet": [],
"IsDefault": false,
"Tags": [
{
"Key": "Name",
"Value": "SDWAN-Cloud-VPC"
}
],
"OwnerCidrBlockAssociationSet": [
{
"CidrBlock": "10.100.0.0/16",
"AssociationId": "vpc-cidr-assoc-xxxxxxxxxxxxxxxxx",
"CidrBlockState": {
"State": "associated"
}
}
]
}
]
}
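The VPN connection check can also be scripted. The sketch below summarizes per-tunnel status from a response shaped like the output of `describe-vpn-connections` (the same structure boto3's `describe_vpn_connections` returns). The function name `summarize_vpn_tunnels` is hypothetical, and the function takes the response as a plain dictionary so it can be exercised without AWS access.

```python
def summarize_vpn_tunnels(response: dict) -> list:
    """Return one row per tunnel with its connection state and UP/DOWN flag."""
    rows = []
    for conn in response.get("VpnConnections", []):
        # Each VPN connection reports telemetry for its tunnels.
        for tunnel in conn.get("VgwTelemetry", []):
            rows.append({
                "vpn_connection_id": conn.get("VpnConnectionId"),
                "state": conn.get("State"),
                "outside_ip": tunnel.get("OutsideIpAddress"),
                "tunnel_up": tunnel.get("Status") == "UP",
            })
    return rows


if __name__ == "__main__":
    # Placeholder IDs and documentation-range addresses, not real resources.
    sample = {"VpnConnections": [{
        "VpnConnectionId": "vpn-xxxxxxxxxxxxxxxxx",
        "State": "available",
        "VgwTelemetry": [
            {"OutsideIpAddress": "198.51.100.1", "Status": "UP"},
            {"OutsideIpAddress": "198.51.100.2", "Status": "DOWN"},
        ],
    }]}
    for row in summarize_vpn_tunnels(sample):
        print(row)
```

In a live check, the dictionary would come from `boto3.client("ec2").describe_vpn_connections(...)`, and a pipeline step could fail when no tunnel reports `tunnel_up`.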
9.4 Automation Examples
9.4.1 Automating Cisco SD-WAN with Ansible
This Ansible playbook demonstrates how to use modules from the cisco.sdwan collection to interact with a vManage controller. The playbook looks up a device and a device template, then attaches the template to a specific vEdge/cEdge device.
Prerequisites:
- Ansible with the cisco.sdwan collection installed.
- vManage API credentials (username/password or token) configured securely (e.g., as Ansible vault variables or environment variables).
- Inventory file with vManage details.
# sdwan_onboard_device.yaml
---
- name: Automate Cisco SD-WAN Device Onboarding and Template Attachment
  hosts: vmanage_controllers
  connection: local
  gather_facts: no
  vars:
    vmanage_host: ""
    vmanage_port: 8443
    vmanage_username: "" # Or from Ansible Vault
    vmanage_password: "" # Or from Ansible Vault
    # Device details for the new branch cEdge
    device_name: "cEdge-Branch-X"
    device_ip: "192.0.2.100" # System IP of the cEdge
    device_template: "Branch_cEdge_Template_v1" # Name of the device template on vManage
  tasks:
    - name: Ensure device is onboarded (if not already)
      # In a real scenario, this might involve ztp_claim or similar.
      # For this example, we assume the device is already provisioned/claimed
      # and we are attaching a template.
      # This task is a placeholder to represent a prior onboarding step.
      # Or you might fetch device UUID if it's already there
      debug:
        msg: "Assuming {{ device_name }} with System IP {{ device_ip }} is already claimed in vManage."

    - name: Fetch device UUID for the target cEdge
      cisco.sdwan.viptela_device_info:
        vmanage_host: "{{ vmanage_host }}"
        vmanage_port: "{{ vmanage_port }}"
        vmanage_username: "{{ vmanage_username }}"
        vmanage_password: "{{ vmanage_password }}"
        device_system_ip: "{{ device_ip }}"
      register: device_info_result

    - name: Set device UUID fact
      set_fact:
        device_uuid: ""
      when: device_info_result.data | length > 0

    - name: Get template ID by name
      cisco.sdwan.viptela_template_info:
        vmanage_host: "{{ vmanage_host }}"
        vmanage_port: "{{ vmanage_port }}"
        vmanage_username: "{{ vmanage_username }}"
        vmanage_password: "{{ vmanage_password }}"
        template_name: "{{ device_template }}"
        template_type: "device" # Important to specify device template type
      register: template_info_result

    - name: Set template ID fact
      set_fact:
        template_id: ""
      when: template_info_result.data | length > 0

    - name: Attach device template to the cEdge
      cisco.sdwan.viptela_template_attach:
        vmanage_host: "{{ vmanage_host }}"
        vmanage_port: "{{ vmanage_port }}"
        vmanage_username: "{{ vmanage_username }}"
        vmanage_password: "{{ vmanage_password }}"
        device_template:
          name: "{{ device_template }}"
          id: "{{ template_id }}"
        devices:
          - uuid: "{{ device_uuid }}"
            system_ip: "{{ device_ip }}"
        apply_activate: yes # Push and activate the configuration
      when: device_uuid is defined and template_id is defined
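The lookup steps in the playbook (find the device by system IP, find the template by name) reduce to filtering the `data` list that vManage returns. The sketch below shows that filtering in Python on already-fetched JSON. The field names `system-ip`, `uuid`, `templateName`, and `templateId` are assumptions about the vManage response and should be verified against your vManage release; the helper names and the sample values are hypothetical.

```python
from typing import Optional


def find_device_uuid(device_listing: dict, system_ip: str) -> Optional[str]:
    """Return the UUID of the device with the given system IP, if present."""
    for device in device_listing.get("data", []):
        if device.get("system-ip") == system_ip:  # assumed field name
            return device.get("uuid")             # assumed field name
    return None


def find_template_id(template_listing: dict, name: str) -> Optional[str]:
    """Return the ID of the device template with the given name, if present."""
    for template in template_listing.get("data", []):
        if template.get("templateName") == name:  # assumed field name
            return template.get("templateId")     # assumed field name
    return None


if __name__ == "__main__":
    # Made-up sample data shaped like the assumed responses.
    devices = {"data": [{"host-name": "cEdge-Branch-X",
                         "system-ip": "192.0.2.100",
                         "uuid": "EXAMPLE-UUID-0001"}]}
    templates = {"data": [{"templateName": "Branch_cEdge_Template_v1",
                           "templateId": "example-template-id"}]}
    print(find_device_uuid(devices, "192.0.2.100"))
    print(find_template_id(templates, "Branch_cEdge_Template_v1"))
```

Returning `None` when nothing matches mirrors the playbook's `when:` guards: the attach step should be skipped rather than run with an empty identifier.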
9.4.2 Automating Cloud Network Provisioning with Python (AWS boto3)
This Python script uses boto3 to create a new VPC, a public subnet, and an Internet Gateway in AWS. This is a common starting point for a cloud network on-ramp.
# aws_provision_vpc.py
import boto3
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


def create_aws_vpc_with_igw_and_subnet(region, vpc_cidr, public_subnet_cidr, tags):
    """
    Creates an AWS VPC, an Internet Gateway, and a public subnet.

    Args:
        region (str): AWS region (e.g., 'us-east-1').
        vpc_cidr (str): CIDR block for the VPC (e.g., '10.200.0.0/16').
        public_subnet_cidr (str): CIDR block for the public subnet (e.g., '10.200.1.0/24').
        tags (dict): Dictionary of tags to apply to created resources.

    Returns:
        dict: A dictionary containing IDs of created resources, or None on failure.
    """
    ec2 = boto3.client('ec2', region_name=region)
    resource_ids = {}
    try:
        logging.info(f"Creating VPC with CIDR: {vpc_cidr} in region: {region}")
        vpc_response = ec2.create_vpc(CidrBlock=vpc_cidr, TagSpecifications=[
            {'ResourceType': 'vpc', 'Tags': [{'Key': k, 'Value': v} for k, v in tags.items()]}
        ])
        vpc_id = vpc_response['Vpc']['VpcId']
        resource_ids['vpc_id'] = vpc_id
        logging.info(f"VPC {vpc_id} created.")

        # Enable DNS hostnames and support for the VPC
        ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsSupport={'Value': True})
        ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={'Value': True})
        logging.info(f"Enabled DNS support and hostnames for VPC {vpc_id}.")

        logging.info("Creating Internet Gateway.")
        igw_response = ec2.create_internet_gateway(TagSpecifications=[
            {'ResourceType': 'internet-gateway', 'Tags': [{'Key': k, 'Value': v} for k, v in tags.items()]}
        ])
        igw_id = igw_response['InternetGateway']['InternetGatewayId']
        resource_ids['igw_id'] = igw_id
        logging.info(f"Internet Gateway {igw_id} created.")

        logging.info(f"Attaching Internet Gateway {igw_id} to VPC {vpc_id}.")
        ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)
        logging.info(f"Internet Gateway {igw_id} attached to VPC {vpc_id}.")

        logging.info(f"Creating public subnet with CIDR: {public_subnet_cidr}.")
        subnet_response = ec2.create_subnet(
            VpcId=vpc_id,
            CidrBlock=public_subnet_cidr,
            AvailabilityZone=f"{region}a",  # Using the first AZ for simplicity
            TagSpecifications=[
                {'ResourceType': 'subnet', 'Tags': [{'Key': k, 'Value': v} for k, v in tags.items()]}
            ]
        )
        public_subnet_id = subnet_response['Subnet']['SubnetId']
        resource_ids['public_subnet_id'] = public_subnet_id
        logging.info(f"Public subnet {public_subnet_id} created.")

        logging.info(f"Creating route table for public subnet {public_subnet_id}.")
        route_table_response = ec2.create_route_table(
            VpcId=vpc_id,
            TagSpecifications=[
                {'ResourceType': 'route-table', 'Tags': [{'Key': k, 'Value': v} for k, v in tags.items()]}
            ]
        )
        public_route_table_id = route_table_response['RouteTable']['RouteTableId']
        resource_ids['public_route_table_id'] = public_route_table_id
        logging.info(f"Public route table {public_route_table_id} created.")

        logging.info(f"Adding default route to Internet Gateway {igw_id} in route table {public_route_table_id}.")
        ec2.create_route(
            RouteTableId=public_route_table_id,
            DestinationCidrBlock='0.0.0.0/0',
            GatewayId=igw_id
        )
        logging.info("Default route added.")

        logging.info(f"Associating public subnet {public_subnet_id} with route table {public_route_table_id}.")
        ec2.associate_route_table(
            SubnetId=public_subnet_id,
            RouteTableId=public_route_table_id
        )
        logging.info("Subnet-route table association complete.")

        ec2.modify_subnet_attribute(SubnetId=public_subnet_id, MapPublicIpOnLaunch={'Value': True})
        logging.info(f"Enabled auto-assign public IP on launch for subnet {public_subnet_id}.")
        return resource_ids
    except Exception as e:
        logging.error(f"Error during AWS resource creation: {e}")
        return None


if __name__ == "__main__":
    aws_region = "us-east-1"  # Or get from environment/config
    my_vpc_cidr = "10.200.0.0/16"
    my_public_subnet_cidr = "10.200.1.0/24"
    resource_tags = {"Project": "NetDevOps-SDWAN", "Environment": "Dev", "ManagedBy": "Python"}
    created_resources = create_aws_vpc_with_igw_and_subnet(
        aws_region, my_vpc_cidr, my_public_subnet_cidr, resource_tags
    )
    if created_resources:
        logging.info("Successfully provisioned AWS resources:")
        for k, v in created_resources.items():
            logging.info(f"  {k}: {v}")
    else:
        logging.error("Failed to provision AWS resources.")
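Unlike Terraform, a script such as the one above keeps no state, so running it twice creates a second VPC. One possible guard, sketched here as an assumption rather than as part of the original script, is to look for an existing VPC with the same tags and CIDR before creating anything. The helper `find_existing_vpc` is hypothetical and works on a dictionary shaped like the `describe_vpcs` response, so it can be tested without AWS access.

```python
from typing import Optional


def find_existing_vpc(describe_vpcs_response: dict, vpc_cidr: str,
                      tags: dict) -> Optional[str]:
    """Return the ID of a VPC matching the CIDR and all given tags, else None."""
    for vpc in describe_vpcs_response.get("Vpcs", []):
        if vpc.get("CidrBlock") != vpc_cidr:
            continue
        # Flatten the [{"Key": ..., "Value": ...}] list into a plain dict.
        existing = {t["Key"]: t["Value"] for t in vpc.get("Tags", [])}
        if all(existing.get(k) == v for k, v in tags.items()):
            return vpc.get("VpcId")
    return None


if __name__ == "__main__":
    # Placeholder response; the VPC ID is not a real resource.
    sample = {"Vpcs": [{
        "VpcId": "vpc-xxxxxxxxxxxxxxxxx",
        "CidrBlock": "10.200.0.0/16",
        "Tags": [{"Key": "Project", "Value": "NetDevOps-SDWAN"},
                 {"Key": "Environment", "Value": "Dev"}],
    }]}
    print(find_existing_vpc(sample, "10.200.0.0/16", {"Project": "NetDevOps-SDWAN"}))
```

The provisioning function could call `ec2.describe_vpcs()` first and return early when this helper finds a match, which makes repeated runs safe.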
9.5 Security Considerations
Automating SD-WAN and cloud networking introduces new attack vectors and necessitates careful security planning.
- API Security:
- Authentication & Authorization: Use strong authentication methods (e.g., OAuth2, API tokens) for API access. Implement Role-Based Access Control (RBAC) to grant minimum necessary privileges to automation accounts.
- Secure Communication: Always use HTTPS/TLS for all API interactions.
- Credential Management: Store API keys, tokens, and passwords securely using tools like Ansible Vault, HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault. Never hardcode credentials.
- SD-WAN Specific Security:
- Controller Hardening: Secure access to vManage, vSmart, and vBond controllers. Apply security patches promptly.
- IPsec Encryption: Ensure strong IPsec policies with robust encryption and authentication algorithms (e.g., AES-256, SHA-256) for overlay tunnels (RFC 4301, RFC 4303).
- Segmentation: Leverage SD-WAN’s ability to create VPNs/VRFs to segment traffic, ensuring sensitive data travels only over authorized paths.
- Zero Trust Network Access (ZTNA): SD-WAN can integrate with ZTNA solutions to apply granular access policies based on user identity and device posture.
- Cloud Networking Security:
- Infrastructure as Code Security Scanning: Use tools like Checkov, Kics, or Terrascan to scan Terraform/CloudFormation templates for security misconfigurations before deployment.
- Security Groups/NSGs: Implement least-privilege principles. Only open ports/protocols necessary for specific applications. Automate their management via IaC.
- Network Segmentation: Use VPCs/VNETs and subnets to logically segment different applications or environments. Use Transit Gateways with routing policies to control inter-VPC traffic.
- VPN/Direct Connect Security: Ensure strong encryption and authentication for hybrid connectivity. Regularly audit VPN configurations.
- Monitoring and Logging: Implement centralized logging and monitoring (e.g., AWS CloudTrail, VPC Flow Logs, Azure Monitor) to detect anomalous activity. Automate alerts for security events.
- Supply Chain Security: Secure your automation pipeline. Ensure that Ansible playbooks, Python scripts, and Terraform configurations are stored in secure version control, scanned for vulnerabilities, and deployed through trusted CI/CD processes.
Security Configuration Example (AWS Security Group via Terraform):
# Security Group for a Web Server in the Cloud VPC
resource "aws_security_group" "web_sg" {
vpc_id = aws_vpc.sdwan_cloud_vpc.id
name = "web-server-sg"
description = "Allow inbound HTTP/S and SSH from specific ranges"
# Inbound rules
ingress {
description = "Allow HTTP from anywhere"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # SECURITY WARNING: For production, narrow this down!
}
ingress {
description = "Allow HTTPS from anywhere"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # SECURITY WARNING: For production, narrow this down!
}
ingress {
description = "Allow SSH from On-Premises Network"
from_port = 22
to_port = 22
protocol = "tcp"
# SECURITY BEST PRACTICE: Replace with your actual on-premises network CIDR.
# Never expose SSH to 0.0.0.0/0 in production.
cidr_blocks = ["10.1.0.0/16"]
}
# Outbound rules
egress {
description = "Allow all outbound traffic"
from_port = 0
to_port = 0
protocol = "-1" # -1 means all protocols
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "Web-Server-SG"
}
}
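The least-privilege rule for Security Groups can be audited after deployment as well as scanned before it. The following sketch flags sensitive ports that are reachable from 0.0.0.0/0, working on a dictionary shaped like one entry of the `describe_security_groups` response. The function name `world_open_sensitive_ports` and the default port list (SSH and RDP) are illustrative choices, not a complete policy.

```python
def world_open_sensitive_ports(security_group: dict,
                               sensitive_ports=(22, 3389)) -> list:
    """Return the sensitive ports this group exposes to 0.0.0.0/0."""
    exposed = set()
    for perm in security_group.get("IpPermissions", []):
        ranges = perm.get("IpRanges", [])
        if not any(r.get("CidrIp") == "0.0.0.0/0" for r in ranges):
            continue  # rule is not open to the world
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and ports.
            exposed.update(sensitive_ports)
            continue
        if protocol not in ("tcp", "udp"):
            continue  # e.g. ICMP, where From/ToPort are not port numbers
        low, high = perm.get("FromPort"), perm.get("ToPort")
        if low is None or high is None:
            continue
        exposed.update(p for p in sensitive_ports if low <= p <= high)
    return sorted(exposed)


if __name__ == "__main__":
    # Mirrors the Terraform example: web ports open, SSH restricted.
    web_sg = {"IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "10.1.0.0/16"}]},
    ]}
    print(world_open_sensitive_ports(web_sg))
```

Run against the groups in a VPC on a schedule, a check like this catches manual changes that drift away from the rules defined in IaC.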
9.6 Verification & Troubleshooting
Automated deployments require automated verification and systematic troubleshooting.
9.6.1 SD-WAN Verification & Troubleshooting
Verification Commands (Cisco cEdge): Once templates are applied via vManage, verify connectivity and policy.
show sdwan control connections ! Verify control plane connections to vSmart/vBond
show sdwan omp peers ! Verify OMP adjacency with vSmart
show sdwan omp routes ! Verify routes learned via OMP
show sdwan ipsec tunnels ! Verify IPsec data plane tunnels are up
show sdwan policy from-vsmart ! Review applied policies
show interface tunnel ! Check tunnel interface status
ping vrf 10 10.10.1.10 source GigabitEthernet0/0/1 ! Ping a cloud resource from service VPN
Common SD-WAN Issues:
| Issue | Potential Root Cause | Resolution Steps | Debug Commands |
|---|---|---|---|
| Device onboarding failure | Incorrect vBond IP, firewall blocking ports, invalid serial/token | Verify vBond reachability from cEdge. Check firewall rules (by default, UDP 12346 and higher for DTLS, TCP 23456 and higher for TLS). Ensure correct serial number and token are used. | show sdwan control connections, show sdwan control connection-history, debug sdwan control-plane |
| Template attachment failure | Template errors, device not reachable by vManage, device not in proper state | Review vManage task log for detailed errors. Ensure device is online and manageable by vManage. Validate template syntax. | vManage GUI: “Monitor -> Tasks”, “Configuration -> Devices -> Validation” |
| IPsec tunnel not coming up | Firewall issues, NAT traversal problems, incorrect IPsec/DTLS parameters, routing issues | Check reachability to peer (vEdge/cEdge or Cloud VPN Gateway). Verify NAT configuration. Ensure consistent IPsec parameters. Check routing to remote endpoint. | show sdwan ipsec tunnels, show crypto isakmp sa, show crypto ipsec sa, debug crypto isakmp |
| OMP routes not advertised/learned | OMP disabled, filter policies, VPN membership issues | Ensure omp no shutdown is configured. Check OMP advertisements (omp advertise networks). Review OMP export/import policies. Verify interface vpn membership. | show sdwan omp peers, show sdwan omp routes, show sdwan policy |
| Application performance issues | Underlay congestion, incorrect application-aware routing policy, QoS misconfiguration | Monitor underlay network health. Verify application-aware routing policies are directing traffic correctly. Check QoS settings on devices and vManage. Look for packet loss/latency. | show sdwan app-route statistics, show policy service-class, monitor traffic |
9.6.2 Cloud Networking Verification & Troubleshooting
Verification Commands (AWS CLI):
aws ec2 describe-vpcs --vpc-ids vpc-xxxxxxxxxxxxxxxxx # Verify VPC details
aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-xxxxxxxxxxxxxxxxx" # Verify subnets
aws ec2 describe-route-tables --filters "Name=vpc-id,Values=vpc-xxxxxxxxxxxxxxxxx" # Verify route tables and associations
aws ec2 describe-network-acls --filters "Name=vpc-id,Values=vpc-xxxxxxxxxxxxxxxxx" # Review NACLs (if used)
aws ec2 describe-security-groups --filters "Name=vpc-id,Values=vpc-xxxxxxxxxxxxxxxxx" # Verify Security Groups
aws ec2 describe-vpn-connections --vpn-connection-ids vpn-xxxxxxxxxxxxxxxxx # Verify VPN connection status
Common Cloud Networking Issues:
| Issue | Potential Root Cause | Resolution Steps | Debug Methods |
|---|---|---|---|
| VPC/Subnet creation failure | Invalid CIDR, region limits, IAM permissions | Review Terraform apply logs or Python script output. Check AWS Service Quotas. Verify IAM policy allows ec2:CreateVpc, ec2:CreateSubnet, etc. | CloudTrail logs, Terraform debug output, Python logging |
| Internet access from public subnet fails | Missing/incorrect Internet Gateway, route table misconfiguration, Security Group/NACL blocking | Verify IGW is attached to VPC. Ensure public route table has a default route (0.0.0.0/0) pointing to IGW. Check Security Group egress rules and NACL rules. | aws ec2 describe-route-tables, aws ec2 describe-security-groups, tcpdump on instance |
| VPN tunnel to on-premises down | Mismatched IPsec parameters, incorrect Customer Gateway IP, routing issues, on-premises firewall | Verify IPsec configuration (encryption, authentication, PFS, DPD) on both sides. Ensure the Customer Gateway IP is the correct public IP of the on-premises device. Check on-premises firewall rules. | aws ec2 describe-vpn-connections, VPC Flow Logs, on-premises device logs (show crypto isakmp sa) |
| Inter-VPC traffic blocked | Missing Transit Gateway attachment/route, Security Group/NACL blocking | Ensure VPCs are attached to Transit Gateway. Verify TGW route tables have correct routes. Check Security Groups and NACLs. | aws ec2 describe-transit-gateway-attachments, aws ec2 search-transit-gateway-routes, VPC Flow Logs |
| Instance cannot reach resource | Security Group/NACL, instance route table, DNS | Verify Security Group/NACL rules. Check instance’s effective route table. Confirm DNS resolution (e.g., if connecting to a private endpoint). | ping, traceroute from instance, VPC Flow Logs, aws ec2 get-console-output |
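The "Internet access from public subnet fails" row can be turned into an automated check. The following is a minimal sketch, assuming the input is the dictionary returned by the boto3 EC2 `describe_route_tables` call; the function names and the sample route tables are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List


def has_default_route_to_igw(route_table: Dict[str, Any]) -> bool:
    """Return True if the table has an active 0.0.0.0/0 route via an Internet Gateway."""
    for route in route_table.get("Routes", []):
        gateway_id = route.get("GatewayId") or ""
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and gateway_id.startswith("igw-")
            and route.get("State") == "active"
        ):
            return True
    return False


def public_route_table_ids(response: Dict[str, Any]) -> List[str]:
    """List the IDs of route tables that provide a default route to an IGW."""
    return [
        table.get("RouteTableId", "")
        for table in response.get("RouteTables", [])
        if has_default_route_to_igw(table)
    ]


# Illustrative response fragment; IDs and CIDRs are placeholders.
SAMPLE_ROUTE_TABLES = {
    "RouteTables": [
        {
            "RouteTableId": "rtb-public",
            "Routes": [
                {"DestinationCidrBlock": "10.100.0.0/16", "GatewayId": "local", "State": "active"},
                {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc", "State": "active"},
            ],
        },
        {
            "RouteTableId": "rtb-private",
            "Routes": [
                {"DestinationCidrBlock": "10.100.0.0/16", "GatewayId": "local", "State": "active"},
            ],
        },
    ]
}

if __name__ == "__main__":
    print(public_route_table_ids(SAMPLE_ROUTE_TABLES))
```

If the public subnet's route table is missing from the result, the default route is the first thing to fix before looking at Security Groups or NACLs.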
9.7 Performance Optimization
Optimizing performance in SD-WAN and cloud networks involves fine-tuning policies, ensuring adequate capacity, and continuous monitoring.
- SD-WAN Performance Optimization:
  - Application-Aware Routing: Configure policies to dynamically steer critical application traffic over the best-performing paths (e.g., low latency, low jitter, high bandwidth links). Use path monitoring to detect degraded links and fail over automatically.
  - QoS (Quality of Service): Implement QoS policies on vEdges/cEdges to prioritize business-critical traffic over less important traffic. Map applications to specific QoS classes.
  - Traffic Shaping/Policing: Control bandwidth usage to prevent any single application or user from monopolizing resources.
  - Link Aggregation/Load Balancing: Utilize multiple underlay links to increase aggregate bandwidth and provide redundancy.
  - Controller Scaling: Ensure vManage, vSmart, and vBond controllers are adequately resourced (CPU, memory, storage) and scaled horizontally to handle the number of managed devices and traffic volume.
- Cloud Networking Performance Optimization:
  - Direct Connect/ExpressRoute: For predictable, high-bandwidth, low-latency connectivity to on-premises, dedicated private connections are superior to IPsec VPN over the internet.
  - Transit Gateway Placement: Optimize TGW placement to minimize inter-region or inter-AZ latency.
  - VPC Peering vs. Transit Gateway: Understand the trade-offs. Peering is point-to-point; TGW provides hub-and-spoke and simplifies routing for many VPCs.
  - Network Acceleration Services: Cloud providers offer services (e.g., AWS Global Accelerator, Azure Front Door) to improve application performance over the internet by routing traffic through their optimized global networks.
  - Instance Network Performance: Choose appropriate instance types with sufficient network bandwidth and optimized networking drivers.
- Monitoring and Alerts: Continuously monitor key performance metrics (latency, packet loss, bandwidth utilization) for both the SD-WAN overlay and cloud network segments. Automate alerts for performance degradation.
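To make the application-aware routing idea concrete, the sketch below models how a path might be chosen against an SLA class. It is a simplified illustration and not Cisco's actual path-selection algorithm: the `PathStats` and `SlaClass` types, the thresholds, and the fallback rule (lowest loss, then lowest latency when no path is compliant) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PathStats:
    name: str
    latency_ms: float
    loss_pct: float
    jitter_ms: float


@dataclass(frozen=True)
class SlaClass:
    max_latency_ms: float
    max_loss_pct: float
    max_jitter_ms: float


def meets_sla(path: PathStats, sla: SlaClass) -> bool:
    """A path is compliant only if every measured metric is within its threshold."""
    return (
        path.latency_ms <= sla.max_latency_ms
        and path.loss_pct <= sla.max_loss_pct
        and path.jitter_ms <= sla.max_jitter_ms
    )


def pick_path(paths: List[PathStats], sla: SlaClass) -> PathStats:
    """Prefer the lowest-latency compliant path; otherwise fall back to the least lossy one."""
    if not paths:
        raise ValueError("at least one path is required")
    compliant = [p for p in paths if meets_sla(p, sla)]
    if compliant:
        return min(compliant, key=lambda p: (p.latency_ms, p.loss_pct))
    # Assumed fallback: no path meets the SLA, so take the least-bad option.
    return min(paths, key=lambda p: (p.loss_pct, p.latency_ms))


if __name__ == "__main__":
    voice_sla = SlaClass(max_latency_ms=150, max_loss_pct=1.0, max_jitter_ms=30)
    measured = [
        PathStats("mpls", latency_ms=40, loss_pct=0.1, jitter_ms=5),
        PathStats("biz-internet", latency_ms=25, loss_pct=2.5, jitter_ms=12),
    ]
    print(pick_path(measured, voice_sla).name)
```

In this example the faster internet link is rejected because its loss exceeds the SLA, which is the behavior the policy is meant to enforce for loss-sensitive traffic.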
9.8 Hands-On Lab: Automated Branch SD-WAN Onboarding and Cloud Integration
Lab Objective: Automate the deployment of a new Cisco cEdge branch router, attach it to the SD-WAN fabric via vManage, and establish an IPsec VPN tunnel from the cEdge (as an overlay service) to an AWS VPC provisioned with Terraform.
Lab Topology:
nwdiag {
  internet [shape = cloud];

  network "On-Premises Underlay" {
    edge_router [label="ISP Router"];
    mgmt_net [address="10.0.10.0/24", label="Management Network"];
    ansible_server [address="10.0.10.50", label="Ansible/Python Host", shape=box];
    vmanage_controller [address="10.0.10.60", label="Cisco vManage", shape=box];
    new_branch_cedge [address="10.0.10.70", label="New Branch cEdge (Underlay Mgmt)", shape=box];
  }

  network "Branch LAN" {
    address="10.1.0.0/24"
    new_branch_cedge;
    branch_server [address="10.1.0.10", label="Branch App Server", shape=box];
  }

  network "AWS Cloud (Provisioned)" {
    aws_vpc [label="SDWAN-Cloud-VPC", shape=cloud];
    aws_vpn_gw [label="AWS VGW", shape=box];
    aws_private_subnet [label="AWS Private Subnet\n10.100.2.0/24"];
    cloud_app_server [address="10.100.2.10", label="Cloud App Server", shape=box];
  }

  ansible_server -- vmanage_controller [label="API calls"];
  ansible_server -- new_branch_cedge [label="SSH (Initial)"];
  new_branch_cedge -- edge_router [label="Underlay (Biz-Internet)"];
  edge_router -- internet;
  vmanage_controller -- internet [label="Management"];
  new_branch_cedge -- aws_vpn_gw [label="Overlay IPsec VPN"];
  aws_vpn_gw -- aws_private_subnet;
  aws_private_subnet -- cloud_app_server;
  internet -- aws_vpc [label="Public Access (via IGW, not shown)"];
  new_branch_cedge -- "Branch LAN";
}
Objectives:
- Provision an AWS VPC and associated VPN Gateway using Terraform.
- Use Ansible to connect to Cisco vManage.
- Automate the attachment of a pre-existing device template to the new_branch_cedge in vManage. (Assumes the cEdge is already “claimed” in vManage.)
- Verify IPsec tunnel establishment from cEdge to AWS VPN Gateway.
- Verify routing and connectivity between the branch LAN and AWS private subnet.
Step-by-Step Configuration:
Pre-Lab Setup:
- AWS Account: Configured with AWS CLI and boto3 credentials.
- Cisco vManage: A running vManage instance reachable from your Ansible host.
- Cisco cEdge: A factory-default cEdge router (physical or virtual) with basic underlay connectivity, capable of reaching vManage, and claimed in vManage with its serial number. Ensure it has a base configuration to allow SSH/NETCONF for initial setup if not fully ZTP’d. A device template named Branch_cEdge_Template_v1 exists in vManage, configured with a service VPN (e.g., VPN 10), an interface for the branch LAN, and an IPsec VPN feature template for cloud connectivity.
- Ansible Host: Python, Ansible, boto3, terraform, and the cisco.sdwan collection installed.
Phase 1: Automate AWS VPC and VPN Gateway with Terraform
- Create main.tf and variables.tf: Use the AWS Terraform configuration from Section 9.3.2. Adjust aws_region, vpc_cidr, public_subnet_cidr, private_subnet_cidr, and onprem_public_ip to match your lab environment (e.g., onprem_public_ip should be the public IP of your cEdge’s underlay interface that terminates the VPN).
- Initialize Terraform:
    terraform init
- Plan and Apply:
    terraform plan -out tfplan.out
    terraform apply tfplan.out
- Verify AWS Resources: Use the AWS CLI commands from Section 9.6.2 to confirm the VPC, subnets, route tables, and VPN Gateway are created. Note down the vpn_connection_id from Terraform output or CLI.
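Rather than copying the VPN connection ID by hand, it can be read programmatically. The sketch below assumes the configuration from Section 9.3.2 declares an output named vpn_connection_id (as the step above implies) and that the terraform binary is on the PATH. It relies on the documented shape of `terraform output -json`, where each output is an object with a `value` key. The helper names and the sample JSON are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict


def parse_terraform_outputs(raw_json: str) -> Dict[str, Any]:
    """Reduce `terraform output -json` to a simple name -> value mapping."""
    data = json.loads(raw_json)
    return {name: entry["value"] for name, entry in data.items()}


def read_terraform_outputs(workdir: str = ".") -> Dict[str, Any]:
    """Run terraform in workdir and return its outputs (requires terraform installed)."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if terraform exits non-zero
    )
    return parse_terraform_outputs(result.stdout)


# Illustrative output; the ID is a placeholder, not a real resource.
SAMPLE_OUTPUT = """
{
  "vpn_connection_id": {
    "sensitive": false,
    "type": "string",
    "value": "vpn-0123456789abcdef0"
  }
}
"""

if __name__ == "__main__":
    outputs = parse_terraform_outputs(SAMPLE_OUTPUT)
    print(outputs["vpn_connection_id"])
```

The returned ID can then be passed straight to later verification steps, which avoids transcription errors between Phase 1 and Phase 3.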
Phase 2: Automate Cisco SD-WAN Template Attachment with Ansible
- Configure Ansible Inventory: Create inventory.ini:
    [vmanage_controllers]
    vmanage.yourdomain.com ansible_host=vmanage.yourdomain.com
  Ensure vmanage.yourdomain.com is reachable.
- Create Ansible Playbook: Use the sdwan_onboard_device.yaml playbook from Section 9.4.1.
  - Adjust device_ip to your new_branch_cedge’s system IP.
  - Adjust device_template to the exact name of your pre-configured vManage device template.
  - Set VMANAGE_USERNAME and VMANAGE_PASSWORD as environment variables or use Ansible Vault.
- Run Ansible Playbook:
    ansible-playbook sdwan_onboard_device.yaml
- Verify on vManage and cEdge:
  - Check vManage GUI tasks to confirm template application.
  - SSH into the new_branch_cedge and run show sdwan control connections and show sdwan ipsec tunnels. The IPsec tunnel to the AWS VGW should come up if your vManage template correctly configured the cloud-onramp VPN.
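The vManage-side verification can also be done from Python against the device inventory returned by the vManage REST API. The sketch below only covers the parsing step and assumes the payload was already fetched from the GET /dataservice/device endpoint with an authenticated session. The field names "system-ip" and "reachability" are given as I recall them from that endpoint and should be confirmed against your vManage version. The system IP and host name in the sample are hypothetical.

```python
from typing import Any, Dict, Optional


def find_device(devices_payload: Dict[str, Any], system_ip: str) -> Optional[Dict[str, Any]]:
    """Return the inventory entry whose system IP matches, or None if absent."""
    for device in devices_payload.get("data", []):
        if device.get("system-ip") == system_ip:
            return device
    return None


def is_device_reachable(devices_payload: Dict[str, Any], system_ip: str) -> bool:
    """True only if the device exists and vManage reports it as reachable."""
    device = find_device(devices_payload, system_ip)
    return device is not None and device.get("reachability") == "reachable"


# Illustrative payload fragment; values are placeholders for the lab cEdge.
SAMPLE_DEVICES = {
    "data": [
        {"host-name": "vmanage", "system-ip": "10.255.255.1", "reachability": "reachable"},
        {"host-name": "new-branch-cedge", "system-ip": "10.255.255.70", "reachability": "unreachable"},
    ]
}

if __name__ == "__main__":
    print(is_device_reachable(SAMPLE_DEVICES, "10.255.255.70"))
```

A check like this is useful both before the template attachment (to avoid pushing to an unreachable device) and afterwards (to confirm the device stayed connected).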
Phase 3: Verify End-to-End Connectivity
- Deploy a Cloud App Server (optional but recommended): In your aws_private_subnet, launch a simple EC2 instance (e.g., Amazon Linux) and ensure its Security Group allows ICMP and SSH from the branch LAN CIDR (10.1.0.0/24).
  - You can extend your Terraform to deploy this:
      resource "aws_instance" "cloud_app_server" {
        ami                         = "ami-0abcdef1234567890" # Replace with valid AMI ID for your region
        instance_type               = "t2.micro"
        subnet_id                   = aws_subnet.sdwan_private_subnet.id
        # Use vpc_security_group_ids (not security_groups) when passing SG IDs in a VPC
        vpc_security_group_ids      = [aws_security_group.cloud_app_sg.id]
        associate_public_ip_address = false # Private instance

        tags = {
          Name = "Cloud-App-Server"
        }
      }

      # Example Security Group for cloud_app_server
      resource "aws_security_group" "cloud_app_sg" {
        vpc_id = aws_vpc.sdwan_cloud_vpc.id
        name   = "cloud-app-server-sg"

        ingress {
          from_port   = 22
          to_port     = 22
          protocol    = "tcp"
          cidr_blocks = ["10.1.0.0/24"] # Allow SSH from Branch LAN
        }

        ingress {
          from_port   = -1
          to_port     = -1
          protocol    = "icmp"
          cidr_blocks = ["10.1.0.0/24"] # Allow ICMP from Branch LAN
        }

        egress {
          from_port   = 0
          to_port     = 0
          protocol    = "-1"
          cidr_blocks = ["0.0.0.0/0"]
        }
      }
- From Branch App Server: Ping the Cloud App Server (e.g., ping 10.100.2.10).
- From Cloud App Server: Ping the Branch App Server (e.g., ping 10.1.0.10).
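The ping checks above can be wrapped in a small script so the result is recorded rather than eyeballed. This is a minimal sketch, assuming a Linux host whose ping prints a summary line containing "N% packet loss" and accepts the -c count flag. The function names and the sample summary line are illustrative.

```python
import re
import subprocess
from typing import Optional

# Matches the loss figure in a ping summary such as "0% packet loss".
_LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_packet_loss(ping_output: str) -> Optional[float]:
    """Extract the packet-loss percentage from ping output, or None if not found."""
    match = _LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_loss(host: str, count: int = 3) -> Optional[float]:
    """Ping host and return the reported loss percentage (Linux-style ping assumed)."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    return parse_packet_loss(result.stdout)


# Illustrative summary line in the style of Linux iputils ping.
SAMPLE_SUMMARY = "3 packets transmitted, 3 received, 0% packet loss, time 2003ms"

if __name__ == "__main__":
    print(parse_packet_loss(SAMPLE_SUMMARY))
```

From the Branch App Server, calling ping_loss("10.100.2.10") and treating anything other than 0.0 as a failure gives a simple pass/fail signal for the end-to-end test.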
Challenge Exercises:
- Modify the Ansible playbook to also deploy a specific BGP feature template to the cEdge for a local data center connection (if applicable).
- Extend the Python script to create an additional private subnet and configure a Network ACL (NACL) for it.
- Implement a pre-check in the Ansible playbook using viptela_device_info to ensure the device is “up” and “reachable” before attempting template attachment.
- Add a post-deployment verification step using Python to check the status of the AWS VPN connection and the SD-WAN tunnel using boto3 and the cisco.sdwan modules, respectively.
9.9 Best Practices Checklist
- Infrastructure as Code (IaC): Manage all SD-WAN and cloud network configurations as code (Terraform, Ansible playbooks).
- Version Control: Store all IaC in a Git repository.
- API Security: Use strong authentication, RBAC, and secure credential management for all API interactions.
- Least Privilege: Grant automation accounts only the minimum necessary permissions.
- Modularity & Reusability: Design Ansible playbooks and Terraform modules to be modular, reusable, and easily adaptable across different environments.
- Idempotency: Ensure automation scripts can be run multiple times without causing unintended side effects.
- Automated Testing: Implement validation checks (e.g., linting, syntax checks, pre-flight checks) before deployment and post-deployment verification.
- Telemetry & Monitoring: Leverage SD-WAN and cloud monitoring capabilities to collect performance metrics and ensure network health. Automate alerts for critical events.
- Change Management: Integrate automation into a structured change management process.
- Documentation: Maintain clear documentation for all automation scripts, templates, and deployment processes.
- Network Segmentation: Implement logical segmentation using VPNs/VRFs in SD-WAN and VPCs/VNETs/Security Groups in the cloud.
- Regular Audits: Periodically audit configurations and automation scripts for security vulnerabilities and compliance.
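The idempotency item in the checklist can be illustrated with a small plan-then-apply sketch: compute the difference between desired and actual state, apply only that difference, and confirm that a second run has nothing left to do. The function names and the example CIDR sets are hypothetical; real tools such as Terraform and Ansible implement the same idea with far more detail.

```python
from typing import Dict, List, Set


def plan_changes(desired: Set[str], actual: Set[str]) -> Dict[str, List[str]]:
    """Compute what must be added or removed to move actual toward desired."""
    return {
        "add": sorted(desired - actual),
        "remove": sorted(actual - desired),
    }


def apply_plan(actual: Set[str], plan: Dict[str, List[str]]) -> Set[str]:
    """Return the new state after applying the plan (no side effects on the input)."""
    return (actual | set(plan["add"])) - set(plan["remove"])


if __name__ == "__main__":
    # Example: allowed ingress CIDRs on a security group.
    desired_cidrs = {"10.1.0.0/24", "10.2.0.0/24"}
    actual_cidrs = {"10.1.0.0/24", "192.168.0.0/24"}

    first_plan = plan_changes(desired_cidrs, actual_cidrs)
    print(first_plan)

    new_state = apply_plan(actual_cidrs, first_plan)
    # A second run finds nothing to change, which is the idempotency property.
    print(plan_changes(desired_cidrs, new_state))
```

Because the plan is derived from the current state on every run, rerunning the automation after a partial failure converges on the desired state instead of duplicating changes.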
9.10 Reference Links
SD-WAN & Automation:
- Cisco DevNet - SD-WAN: https://developer.cisco.com/sdwan/
- Cisco.SDWAN Ansible Collection: https://galaxy.ansible.com/cisco/sdwan
- YANG Model Reference: https://developer.cisco.com/site/standard-network-devices/
- RFC 6241 - NETCONF Protocol: https://www.rfc-editor.org/rfc/rfc6241
- RFC 8040 - RESTCONF Protocol: https://www.rfc-editor.org/rfc/rfc8040
- RFC 7950 - YANG 1.1: https://www.rfc-editor.org/rfc/rfc7950
Cloud Networking & IaC:
- AWS CLI Documentation: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html
- AWS boto3 Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
- Terraform AWS Provider Documentation: https://registry.terraform.io/providers/hashicorp/aws/latest/docs
- Azure CLI Documentation: https://learn.microsoft.com/en-us/cli/azure/
- Azure SDK for Python Documentation: https://learn.microsoft.com/en-us/azure/developer/python/sdk/azure-sdk-for-python
- Terraform Azure Provider Documentation: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
Diagramming Tools:
- nwdiag: http://blockdiag.com/en/nwdiag/
- Graphviz DOT Language: https://graphviz.org/doc/info/lang.html
- PlantUML: https://plantuml.com/
- packetdiag: http://blockdiag.com/en/nwdiag/packetdiag.html
9.11 What’s Next
This chapter equipped you with the skills to automate the complex interplay between SD-WAN and cloud networking. You’ve learned to provision cloud infrastructure with Terraform, manage SD-WAN configurations via vManage APIs using Ansible, and build dynamic cloud automations with Python. The ability to treat these critical network domains as code is fundamental to modern NetDevOps practices.
In the next chapter, we will expand on these concepts by exploring Chapter 10: Advanced Network Telemetry and Analytics for NetDevOps. We will delve into collecting, analyzing, and acting upon network data using tools like streaming telemetry (gRPC, YANG-Push), ELK stack, and custom Python scripts to provide deeper insights and proactive network management.