Chapter 9: SD-WAN and Cloud Networking Automation

9.1 Introduction

The modern enterprise network extends far beyond the traditional data center, encompassing geographically dispersed branches, remote workers, and a rapidly expanding footprint in public and private clouds. Software-Defined Wide Area Networking (SD-WAN) and Cloud Networking have emerged as foundational technologies to manage this complexity, offering agility, performance, and cost optimization. However, realizing their full potential necessitates robust automation.

This chapter delves into NetDevOps strategies for automating the deployment, configuration, and day-2 operations of SD-WAN and cloud network environments. We will explore how to leverage Ansible, Python, and Infrastructure as Code (IaC) principles to streamline workflows across various vendors, with a particular focus on Cisco SD-WAN and major cloud providers like AWS and Azure.

What this chapter covers:

  • Understanding the architecture and control planes of SD-WAN and cloud networks.
  • Leveraging APIs (RESTCONF, gRPC, cloud-specific APIs) for automation.
  • Automating Cisco SD-WAN deployments using Ansible and Python.
  • Provisioning and managing cloud network infrastructure with Terraform and Python.
  • Implementing IaC for both SD-WAN and cloud components.
  • Addressing security, performance, and troubleshooting in automated SD-WAN and cloud environments.

Why it’s important:

Manual configuration of SD-WAN and cloud networks is error-prone, slow, and unable to scale with the demands of modern business. Automation reduces operational overhead, enforces configuration consistency, shortens deployment cycles, and improves network reliability and security. Mastering these skills is essential for any network engineer operating in today’s hybrid cloud landscape.

What you’ll be able to do after reading this chapter:

  • Design and implement automation solutions for SD-WAN device onboarding and policy deployment.
  • Automate the provisioning of cloud Virtual Private Clouds (VPCs) or Virtual Networks (VNETs) and associated services.
  • Integrate SD-WAN with cloud networking environments through automated VPN tunnels and routing.
  • Apply NetDevOps principles to manage network infrastructure as code across hybrid environments.
  • Identify and mitigate common security and performance challenges in automated SD-WAN and cloud deployments.

9.2 Technical Concepts: Architectures and Automation Interfaces

9.2.1 SD-WAN Architecture and Components

SD-WAN decouples the control plane from the data plane, centralizing management and policy enforcement. Key components include:

  • Orchestration Plane: Typically a central management console (e.g., Cisco vManage, FortiManager) for design, provisioning, monitoring, and troubleshooting. It provides a graphical interface and, critically for automation, a set of APIs.
  • Control Plane: Responsible for establishing and maintaining the SD-WAN overlay topology, distributing routing information, and enforcing security policies (e.g., Cisco vSmart Controller).
  • Data Plane: Network devices (e.g., Cisco vEdge/cEdge routers, FortiGate appliances) that forward traffic based on policies received from the control plane and establish secure tunnels over the underlay network.
  • Analytics Plane: Collects telemetry data for performance monitoring, visibility, and optimization.

Underlay vs. Overlay: The underlay is the physical or logical network infrastructure (e.g., MPLS, Internet broadband, 5G) that provides connectivity. The overlay is the virtual network built on top of the underlay, typically using tunnels (e.g., IPsec for encryption, or GRE, which is unencrypted unless paired with IPsec) to connect sites and provide application-aware routing.

Diagram: Cisco SD-WAN Architecture

@startuml
skinparam handwritten true
skinparam shadowing false
skinparam style strictuml

cloud "Internet/MPLS Underlay" as Underlay {
}

package "SD-WAN Control Plane" {
  node "vManage NMS" as vManage {
    interface "REST API" as VMANAGE_API
  }
  node "vSmart Controller" as vSmart
  node "vBond Orchestrator" as vBond
}

package "Branch/Data Center" {
  node "vEdge/cEdge Router 1" as vEdge1
}

package "Cloud On-Ramp" {
  node "vEdge/cEdge Router N" as vEdgeN
}

vManage -[hidden]up- vBond
vManage -[hidden]up- vSmart

VMANAGE_API -down- vManage

vManage --> vSmart : Policy push (NETCONF)
vSmart <--> vBond : Discovery
vEdge1 <--> vBond : Initial Authentication
vEdgeN <--> vBond : Initial Authentication

vEdge1 <--> vSmart : Control Plane (OMP)
vEdgeN <--> vSmart : Control Plane (OMP)

vEdge1 <--> vEdgeN : Data Plane (IPsec)
vEdge1 <--> Underlay : Transport VPN
vEdgeN <--> Underlay : Transport VPN

Underlay <--> vBond : Discovery Reachability
Underlay <--> vSmart : Control Channel
Underlay <--> vManage : Management Channel
@enduml

9.2.2 Cloud Networking Concepts

Cloud networking provides virtualized network services within a public cloud provider’s infrastructure. Key concepts include:

  • Virtual Private Cloud (VPC) / Virtual Network (VNET): An isolated, private section of a cloud provider’s network, where users can launch resources.
  • Subnets: Divisions within a VPC/VNET, typically associated with Availability Zones for high availability.
  • Route Tables: Control traffic routing within and out of subnets.
  • Security Groups / Network Security Groups (NSG): Stateful firewalls that control inbound and outbound traffic for instances or subnets.
  • Transit Gateway / Hub-and-Spoke VNET Peering: Centralized connectivity hubs to interconnect multiple VPCs/VNETs, simplifying routing and reducing the number of point-to-point connections.
  • Direct Connect / ExpressRoute: Dedicated, private network connections from on-premises to cloud.
  • VPN Gateway: Enables encrypted connections over the public internet between on-premises networks and cloud VPCs/VNETs.
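The relationship between a VPC/VNET CIDR, its subnets, and the on-premises address space can be checked before any cloud API is called. The following is a minimal sketch using only Python's standard ipaddress module. The function name plan_subnets and the prefix lengths are illustrative assumptions, not part of any provider SDK.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr, new_prefix, count, onprem_cidrs=()):
    """Carve `count` subnets of size /new_prefix out of a VPC/VNET CIDR.

    Raises ValueError if the VPC range overlaps an on-premises range,
    because overlapping address space breaks routing across a VPN or
    Transit Gateway attachment.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    for cidr in onprem_cidrs:
        if vpc.overlaps(ipaddress.ip_network(cidr)):
            raise ValueError(f"{vpc_cidr} overlaps on-premises range {cidr}")
    # Take only as many subnets as requested instead of expanding the full range.
    subnets = list(islice(vpc.subnets(new_prefix=new_prefix), count))
    if len(subnets) < count:
        raise ValueError(f"{vpc_cidr} cannot hold {count} /{new_prefix} subnets")
    return [str(subnet) for subnet in subnets]


if __name__ == "__main__":
    # Example ranges: a 10.10.0.0/16 VPC and a 10.1.0.0/16 on-premises network.
    print(plan_subnets("10.10.0.0/16", 24, 3, onprem_cidrs=["10.1.0.0/16"]))
```

Running the same check in a CI pipeline before a Terraform plan or a boto3 call catches address conflicts while they are still cheap to fix.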

Diagram: Hybrid Cloud Networking with Transit Gateway

nwdiag {
  internet [shape = cloud];

  network "On-Premises Network" {
    address = "10.1.0.0/16"
    router1 [address = "10.1.0.1"];
    server1 [address = "10.1.1.10"];
  }

  network "AWS Region Network" {
    cloud_router [label = "AWS Transit Gateway"];
    network "VPC A" {
      address = "10.10.0.0/16"
      web_tier_a [address = "10.10.1.10"];
      app_tier_a [address = "10.10.2.10"];
    }
    network "VPC B" {
      address = "10.20.0.0/16"
      web_tier_b [address = "10.20.1.10"];
      db_tier_b [address = "10.20.3.10", shape = database];
    }
  }

  internet -- router1;
  router1 -- "On-Premises Network";

  router1 -- cloud_router [label = "IPsec VPN"];

  cloud_router -- "VPC A";
  cloud_router -- "VPC B";

  web_tier_a -- app_tier_a;
  web_tier_b -- db_tier_b;
}

9.2.3 Control Plane vs. Data Plane Automation

  • Control Plane Automation: Focuses on configuring the orchestrators and controllers. This often involves interacting with REST APIs (e.g., vManage REST API, AWS EC2 API, Azure Resource Manager API) to define network policies, provision virtual devices, manage users, and orchestrate services. Data models like YANG are crucial here for structured configuration.
    • RFC References: RFC 7950 (YANG 1.1), RFC 8641 (YANG-Push), RFC 6241 (NETCONF), RFC 8040 (RESTCONF).
  • Data Plane Automation: In traditional networks, this means configuring individual devices (routers, switches) via CLI, NETCONF, or RESTCONF. In SD-WAN, the data plane devices (vEdges/cEdges) often receive their configuration and policies from the control plane. Automation here might involve pushing specific localized configs, troubleshooting, or collecting telemetry. In cloud, it’s about provisioning virtual appliances (e.g., firewalls) and configuring their interfaces and routing.

9.2.4 API-Driven Management: NETCONF, RESTCONF, gRPC, and Cloud APIs

Modern network devices and controllers increasingly expose programmatic interfaces.

  • NETCONF (Network Configuration Protocol): An XML-based protocol designed for managing network devices. It provides mechanisms to install, manipulate, and delete configuration data. It uses YANG models to define the structure of configuration and state data.
    • RFC: RFC 6241 (NETCONF Protocol), RFC 6242 (Using NETCONF over SSH).
  • RESTCONF: A REST-like API that operates over HTTP(S), providing a simpler, stateless interface to access YANG-modeled data. It’s often preferred for web-based applications and general automation due to its ubiquity and simplicity.
    • RFC: RFC 8040 (RESTCONF Protocol).
  • gRPC (gRPC Remote Procedure Call): A high-performance, open-source RPC framework that can use Protocol Buffers for structured data serialization. It’s gaining traction for high-volume telemetry and low-latency control plane interactions, often with YANG-based data models.
    • Key Benefit: Bidirectional streaming, efficiency.
  • Cloud-Specific APIs: Each major cloud provider (AWS, Azure, GCP) offers its own comprehensive set of RESTful APIs to manage all aspects of their infrastructure. These are typically accessed via SDKs (e.g., boto3 for AWS, Azure SDK for Python) or CLI tools.
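To make the RESTCONF bullet concrete, the sketch below builds a data-resource URL in the form RFC 8040 describes, where list keys follow the list name after an equals sign and are percent-encoded. The helper name restconf_data_url and the /restconf root are assumptions for illustration. A real client should discover the root from the device and send the request with an HTTP library of its choice.

```python
from urllib.parse import quote


def restconf_data_url(host, path_segments, port=443):
    """Build a RESTCONF data-resource URL from (node_name, key) pairs.

    A key of None means a container or keyless node. List keys are
    percent-encoded so that a value such as 'GigabitEthernet0/0/1'
    stays a single path element.
    """
    parts = []
    for name, key in path_segments:
        if key is None:
            parts.append(name)
        else:
            parts.append(f"{name}={quote(str(key), safe='')}")
    # Assumption: the API root is /restconf, the common default.
    return f"https://{host}:{port}/restconf/data/" + "/".join(parts)


if __name__ == "__main__":
    url = restconf_data_url(
        "192.0.2.10",
        [("ietf-interfaces:interfaces", None), ("interface", "GigabitEthernet0/0/1")],
    )
    print(url)
    # A GET on this URL would typically carry the header
    # Accept: application/yang-data+json
```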

Diagram: Automation Interactions with SD-WAN and Cloud APIs

digraph automation_apis {
    rankdir=LR;
    node [shape=box];

    subgraph cluster_automation {
        label = "Automation Platform";
        bgcolor = lightblue;
        Python_Scripts [label="Python Scripts"];
        Ansible_Playbooks [label="Ansible Playbooks"];
        Terraform_Config [label="Terraform Config"];
    }

    subgraph cluster_sdwan {
        label = "SD-WAN Controller (e.g., Cisco vManage)";
        bgcolor = lightgreen;
        vManage_API [label="vManage REST API"];
        vSmart_NETCONF [label="vSmart NETCONF/gRPC"];
        cEdge_NETCONF [label="cEdge NETCONF/RESTCONF"];
    }

    subgraph cluster_cloud {
        label = "Cloud Provider (e.g., AWS)";
        bgcolor = lightcoral;
        AWS_API [label="AWS EC2/VPC API"];
        AWS_SDK [label="AWS SDK (e.g., boto3)"];
    }

    Python_Scripts -> vManage_API [label="HTTPS/REST"];
    Ansible_Playbooks -> vManage_API [label="HTTPS/REST"];
    Ansible_Playbooks -> cEdge_NETCONF [label="SSH/NETCONF"];
    Python_Scripts -> AWS_SDK [label="SDK Calls"];
    Terraform_Config -> AWS_API [label="Provider API"];

    vManage_API -> vSmart_NETCONF [label="Orchestration"];
    vSmart_NETCONF -> cEdge_NETCONF [label="Policy Push"];
    AWS_SDK -> AWS_API [label="Internal API Call"];
}

9.2.5 IPsec Tunnel Header Structure (Simplified)

SD-WAN heavily relies on IPsec for secure data plane connectivity. Understanding its basic structure is helpful for troubleshooting.

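packetdiag {
  colwidth = 32
  0-31: Security Parameters Index (SPI)
  32-63: Sequence Number
  64-127: Payload Data (encrypted inner IP packet, variable length)
  128-143: Padding (0-255 bytes)
  144-151: Pad Length
  152-159: Next Header
  160-191: Integrity Check Value (ICV, variable length, if authentication is used)
}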
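When troubleshooting from a packet capture, the first two fields of the ESP header (RFC 4303) are often all that is needed to match a packet to a security association. The sketch below assumes the byte string starts at the ESP header, with any outer IP or UDP encapsulation already stripped. The function name is illustrative.

```python
import struct


def parse_esp_header(packet: bytes):
    """Return (spi, sequence_number) from the first 8 bytes of an ESP packet."""
    if len(packet) < 8:
        raise ValueError("an ESP header needs at least 8 bytes")
    # Both fields are 32-bit unsigned integers in network (big-endian) byte order.
    spi, sequence_number = struct.unpack("!II", packet[:8])
    return spi, sequence_number


if __name__ == "__main__":
    # Hand-built sample: SPI 0x00000100, sequence number 42, then opaque payload.
    sample = bytes.fromhex("000001000000002a") + b"\x00" * 16
    spi, seq = parse_esp_header(sample)
    print(f"SPI=0x{spi:08x} sequence={seq}")
```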

9.3 Configuration Examples (Multi-vendor)

9.3.1 Cisco SD-WAN (cEdge) - Manual Configuration for Verification

While the goal is automation, understanding the underlying device configuration provides context. In Cisco SD-WAN, vManage pushes configurations to devices through templates. The simplified cEdge configuration below shows an IPsec tunnel interface as vManage would manage it, followed by verification commands. Note: Configuring a cEdge directly from the CLI usually means the device is running in autonomous mode, or that the settings are local-only and not controlled by vManage templates. Treat the following as an illustration of the outcome of a vManage template.

! cEdge Router Configuration (Illustrative - primarily managed by vManage templates)

! Define a VPN interface for transport
interface GigabitEthernet0/0/0.100
 encapsulation dot1Q 100
 ip address dhcp
 negotiation auto
 no shut
 vpn 0 ! Transport VPN

! Interface in Service VPN for LAN segment
interface GigabitEthernet0/0/1
 ip address 10.1.1.1 255.255.255.0
 no shut
 vpn 10 ! Service VPN

! Overlay Management Protocol (OMP)
omp
  no shutdown
  graceful-restart
  advertise networks
  advertise connected
  advertise static
  advertise bgp

! System Global Configuration
system
  system-ip 192.0.2.10
  site-id 100
  hostname cEdge-Branch1
  vpn 0
    interface GigabitEthernet0/0/0.100
      tunnel-interface
        encapsulation ipsec
        color biz-internet
        ! ipsec properties would be configured by vManage based on templates

Verification Commands on cEdge:

show sdwan control connections
show sdwan omp peers
show sdwan omp routes
show sdwan ipsec tunnels
show ip route vrf 10

Expected Output (Snippet):

cEdge-Branch1# show sdwan control connections
                                                                                        PEER      PEER
                                                                                        PRIVATE   PRIVATE
VPN    TYPE         PEER IP        PEER ID  SITE ID  DOMAIN ID  STATE      UPTIME      PORT      PUBLIC IP  PUBLIC PORT  LOCAL COLOR    REMOTE COLOR     VSMART-REDUNDANCY-GROUP
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0      vsmart       192.0.2.1      10.0.0.1 1         1         up         0:00:15:30  12346     192.0.2.1   12346        biz-internet   default          -

cEdge-Branch1# show sdwan omp peers
                                                       PEER    PEER             PEER         REFRESH        TIME
PEER TYPE  PEER ID        SITE ID  DOMAIN ID  OVERLOAD STATE  UPTIME           CONTROL      (SECONDS)
----------------------------------------------------------------------------------------------------------
vsmart     10.0.0.1       1        1          No      up       0:00:15:30      0          530

9.3.2 AWS Cloud Networking - VPC with VPN Gateway

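This example uses Terraform to provision an AWS VPC, public and private subnets, an internet gateway, route tables, and the three pieces of a site-to-site IPsec VPN: a Customer Gateway, a Virtual Private Gateway, and the VPN connection between them.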

# main.tf for AWS VPC and VPN Gateway
# Requires AWS provider configured elsewhere (e.g., providers.tf or environment vars)

variable "aws_region" {
  description = "AWS region for deployment"
  type        = string
  default     = "us-east-1"
}

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string
  default     = "10.100.0.0/16"
}

variable "public_subnet_cidr" {
  description = "CIDR block for the public subnet"
  type        = string
  default     = "10.100.1.0/24"
}

variable "private_subnet_cidr" {
  description = "CIDR block for the private subnet"
  type        = string
  default     = "10.100.2.0/24"
}

variable "onprem_public_ip" {
  description = "Public IP of the on-premises VPN device"
  type        = string
  # SECURITY WARNING: Replace with actual IP, do not expose sensitive IPs in production IaC
  default     = "203.0.113.5" # Example IP
}

# 1. Create VPC
resource "aws_vpc" "sdwan_cloud_vpc" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = {
    Name = "SDWAN-Cloud-VPC"
  }
}

# 2. Create Internet Gateway
resource "aws_internet_gateway" "sdwan_igw" {
  vpc_id = aws_vpc.sdwan_cloud_vpc.id
  tags = {
    Name = "SDWAN-Cloud-IGW"
  }
}

# 3. Create Public Subnet
resource "aws_subnet" "sdwan_public_subnet" {
  vpc_id                  = aws_vpc.sdwan_cloud_vpc.id
  cidr_block              = var.public_subnet_cidr
  availability_zone       = "${var.aws_region}a" # Use first AZ
  map_public_ip_on_launch = true # Instances in this subnet get public IPs
  tags = {
    Name = "SDWAN-Cloud-Public-Subnet"
  }
}

# 4. Create Private Subnet
resource "aws_subnet" "sdwan_private_subnet" {
  vpc_id            = aws_vpc.sdwan_cloud_vpc.id
  cidr_block        = var.private_subnet_cidr
  availability_zone = "${var.aws_region}a" # Use first AZ
  tags = {
    Name = "SDWAN-Cloud-Private-Subnet"
  }
}

# 5. Create Public Route Table
resource "aws_route_table" "sdwan_public_rt" {
  vpc_id = aws_vpc.sdwan_cloud_vpc.id
  tags = {
    Name = "SDWAN-Cloud-Public-RT"
  }
}

# 6. Route for Internet Gateway
resource "aws_route" "sdwan_public_internet_route" {
  route_table_id         = aws_route_table.sdwan_public_rt.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.sdwan_igw.id
}

# 7. Associate Public Subnet with Public Route Table
resource "aws_route_table_association" "sdwan_public_rt_assoc" {
  subnet_id      = aws_subnet.sdwan_public_subnet.id
  route_table_id = aws_route_table.sdwan_public_rt.id
}

# 8. Create Private Route Table (for VPN connectivity)
resource "aws_route_table" "sdwan_private_rt" {
  vpc_id = aws_vpc.sdwan_cloud_vpc.id
  tags = {
    Name = "SDWAN-Cloud-Private-RT"
  }
}

# 9. Associate Private Subnet with Private Route Table
resource "aws_route_table_association" "sdwan_private_rt_assoc" {
  subnet_id      = aws_subnet.sdwan_private_subnet.id
  route_table_id = aws_route_table.sdwan_private_rt.id
}

# 10. Create Customer Gateway (On-Premises VPN device representation)
resource "aws_customer_gateway" "sdwan_cgw" {
  bgp_asn    = "65000" # Example ASN, match on-premises
  ip_address = var.onprem_public_ip
  type       = "ipsec.1"
  tags = {
    Name = "SDWAN-Branch-CGW"
  }
}

# 11. Create Virtual Private Gateway (Cloud VPN endpoint)
resource "aws_vpn_gateway" "sdwan_vgw" {
  vpc_id = aws_vpc.sdwan_cloud_vpc.id
  tags = {
    Name = "SDWAN-Cloud-VGW"
  }
}

# 12. Create VPN Connection (IPsec tunnel)
resource "aws_vpn_connection" "sdwan_vpn" {
  vpn_gateway_id      = aws_vpn_gateway.sdwan_vgw.id
  customer_gateway_id = aws_customer_gateway.sdwan_cgw.id
  type                = "ipsec.1"
  static_routes_only  = false # Use dynamic routing (BGP)
  tags = {
    Name = "SDWAN-Branch-to-Cloud-VPN"
  }
}

# 13. Propagate routes from VPN Gateway to Private Route Table
resource "aws_vpn_gateway_route_propagation" "sdwan_vgw_route_prop" {
  vpn_gateway_id = aws_vpn_gateway.sdwan_vgw.id
  route_table_id = aws_route_table.sdwan_private_rt.id
}

output "vpc_id" {
  description = "The ID of the created VPC"
  value       = aws_vpc.sdwan_cloud_vpc.id
}

output "public_subnet_id" {
  description = "The ID of the created public subnet"
  value       = aws_subnet.sdwan_public_subnet.id
}

output "private_subnet_id" {
  description = "The ID of the created private subnet"
  value       = aws_subnet.sdwan_private_subnet.id
}

output "vpn_connection_id" {
  description = "The ID of the created VPN connection"
  value       = aws_vpn_connection.sdwan_vpn.id
}

Verification Commands (AWS CLI):

aws ec2 describe-vpcs --filters "Name=tag:Name,Values=SDWAN-Cloud-VPC"
aws ec2 describe-subnets --filters "Name=tag:Name,Values=SDWAN-Cloud-Public-Subnet"
aws ec2 describe-vpn-connections --filters "Name=tag:Name,Values=SDWAN-Branch-to-Cloud-VPN"
aws ec2 describe-route-tables --filters "Name=tag:Name,Values=SDWAN-Cloud-Private-RT" --query 'RouteTables[*].Routes'

Expected Output (Snippet):

{
    "Vpcs": [
        {
            "CidrBlock": "10.100.0.0/16",
            "DhcpOptionsId": "dopt-xxxxxxxxxxxxxxxxx",
            "State": "available",
            "VpcId": "vpc-xxxxxxxxxxxxxxxxx",
            "OwnerId": "xxxxxxxxxxxx",
            "InstanceTenancy": "default",
            "Ipv6CidrBlockAssociationSet": [],
            "IsDefault": false,
            "Tags": [
                {
                    "Key": "Name",
                    "Value": "SDWAN-Cloud-VPC"
                }
            ],
            "OwnerCidrBlockAssociationSet": [
                {
                    "CidrBlock": "10.100.0.0/16",
                    "AssociationId": "vpc-cidr-assoc-xxxxxxxxxxxxxxxxx",
                    "CidrBlockState": {
                        "State": "associated"
                    }
                }
            ]
        }
    ]
}
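The same verification can be scripted. The sketch below inspects a response shaped like the output of the EC2 DescribeVpnConnections call and reports tunnels that are not up. It works on a plain dictionary so it runs without AWS access. The function name and the sample identifiers are hypothetical, and the commented boto3 call shows where a live response would come from.

```python
def tunnels_down(response):
    """List (vpn_connection_id, outside_ip) for every tunnel not reporting UP."""
    down = []
    for connection in response.get("VpnConnections", []):
        for tunnel in connection.get("VgwTelemetry", []):
            if tunnel.get("Status") != "UP":
                down.append(
                    (connection["VpnConnectionId"], tunnel.get("OutsideIpAddress"))
                )
    return down


if __name__ == "__main__":
    # With AWS credentials available, a live response could be fetched with:
    #   import boto3
    #   response = boto3.client("ec2", region_name="us-east-1").describe_vpn_connections(
    #       Filters=[{"Name": "tag:Name", "Values": ["SDWAN-Branch-to-Cloud-VPN"]}])
    # The sample below uses made-up identifiers and documentation addresses.
    sample_response = {
        "VpnConnections": [
            {
                "VpnConnectionId": "vpn-0example",
                "VgwTelemetry": [
                    {"OutsideIpAddress": "198.51.100.10", "Status": "UP"},
                    {"OutsideIpAddress": "198.51.100.20", "Status": "DOWN"},
                ],
            }
        ]
    }
    for vpn_id, outside_ip in tunnels_down(sample_response):
        print(f"{vpn_id}: tunnel to {outside_ip} is not UP")
```

Each AWS site-to-site VPN connection carries two tunnels, so a single DOWN entry usually indicates lost redundancy rather than a full outage.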

9.4 Automation Examples

9.4.1 Automating Cisco SD-WAN with Ansible

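This Ansible playbook demonstrates how to use the viptela_* modules from the cisco.sdwan collection to interact with a vManage controller. The playbook looks up a device and a device template by name, then attaches the template to a specific vEdge/cEdge device. Module and parameter names can differ between collection versions, so check them against the version you have installed.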

Prerequisites:

  • Ansible cisco.sdwan collection installed.
  • vManage API credentials (username/password or token) configured securely (e.g., as Ansible vault variables or environment variables).
  • Inventory file with vManage details.

# sdwan_onboard_device.yaml
---
- name: Automate Cisco SD-WAN Device Onboarding and Template Attachment
  hosts: vmanage_controllers
  connection: local
  gather_facts: no
  vars:
    vmanage_host: ""
    vmanage_port: 8443
    vmanage_username: "" # Or from Ansible Vault
    vmanage_password: "" # Or from Ansible Vault
    # Device details for the new branch cEdge
    device_name: "cEdge-Branch-X"
    device_ip: "192.0.2.100" # System IP of the cEdge
    device_template: "Branch_cEdge_Template_v1" # Name of the device template on vManage

  tasks:
    - name: Ensure device is onboarded (if not already)
      # In a real scenario, this might involve ztp_claim or similar.
      # For this example, we assume the device is already provisioned/claimed
      # and we are attaching a template.
      # This task is a placeholder to represent a prior onboarding step.
      # Or you might fetch device UUID if it's already there
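      debug:
        msg: "Assuming {{ device_name }} with System IP {{ device_ip }} is already claimed in vManage."

    - name: Fetch device UUID for the target cEdge
      cisco.sdwan.viptela_device_info:
        vmanage_host: "{{ vmanage_host }}"
        vmanage_port: "{{ vmanage_port }}"
        vmanage_username: "{{ vmanage_username }}"
        vmanage_password: "{{ vmanage_password }}"
        device_system_ip: "{{ device_ip }}"
      register: device_info_result

    - name: Set device UUID fact
      # Assumption: the module returns a list of device records in `data`,
      # each with a `uuid` key. Adjust to the module's actual return structure.
      set_fact:
        device_uuid: "{{ device_info_result.data[0].uuid }}"
      when: device_info_result.data | length > 0

    - name: Get template ID by name
      cisco.sdwan.viptela_template_info:
        vmanage_host: "{{ vmanage_host }}"
        vmanage_port: "{{ vmanage_port }}"
        vmanage_username: "{{ vmanage_username }}"
        vmanage_password: "{{ vmanage_password }}"
        template_name: "{{ device_template }}"
        template_type: "device" # Important to specify device template type
      register: template_info_result

    - name: Set template ID fact
      # Assumption: each template record in `data` carries a `templateId` key.
      set_fact:
        template_id: "{{ template_info_result.data[0].templateId }}"
      when: template_info_result.data | length > 0

    - name: Attach device template to the cEdge
      cisco.sdwan.viptela_template_attach:
        vmanage_host: "{{ vmanage_host }}"
        vmanage_port: "{{ vmanage_port }}"
        vmanage_username: "{{ vmanage_username }}"
        vmanage_password: "{{ vmanage_password }}"
        device_template:
          name: "{{ device_template }}"
          id: "{{ template_id }}"
        devices:
          - uuid: "{{ device_uuid }}"
            system_ip: "{{ device_ip }}"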
        apply_activate: yes # Push and activate the configuration
      when: device_uuid is defined and template_id is defined
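The lookup-then-attach flow can also be expressed against the vManage REST API directly. The sketch below only builds the request body and makes no network calls. The field names (templateName, templateId, deviceTemplateList, and the csv-* keys) reflect the shape commonly used for device template attachment in vManage, but they are assumptions here and should be verified against the API documentation for your vManage release. The helper names and sample values are hypothetical.

```python
import json


def find_template_id(templates, name):
    """Return the ID of the device template whose name matches exactly."""
    for template in templates:
        if template.get("templateName") == name:
            return template["templateId"]
    raise LookupError(f"device template {name!r} not found")


def build_attach_payload(template_id, device_uuid, system_ip, hostname, variables=None):
    """Build the body for a device template attach request."""
    device = {
        "csv-status": "complete",
        "csv-deviceId": device_uuid,
        "csv-deviceIP": system_ip,
        "csv-host-name": hostname,
    }
    # Template variables (per-device values) are merged into the same record.
    device.update(variables or {})
    return {
        "deviceTemplateList": [
            {
                "templateId": template_id,
                "device": [device],
                "isEdited": False,
                "isMasterEdited": False,
            }
        ]
    }


if __name__ == "__main__":
    # Stand-in for the list a template query would return.
    templates = [{"templateName": "Branch_cEdge_Template_v1", "templateId": "tmpl-1234"}]
    template_id = find_template_id(templates, "Branch_cEdge_Template_v1")
    payload = build_attach_payload(
        template_id, "C8K-EXAMPLE-UUID", "192.0.2.100", "cEdge-Branch-X"
    )
    print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the HTTP call makes the logic unit-testable and lets the same function feed either a Python client or an Ansible uri task.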

9.4.2 Automating Cloud Network Provisioning with Python (AWS boto3)

This Python script uses boto3 to create a new VPC, a public subnet, and an Internet Gateway in AWS. This is a common starting point for a cloud network on-ramp.

# aws_provision_vpc.py
import boto3
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def create_aws_vpc_with_igw_and_subnet(region, vpc_cidr, public_subnet_cidr, tags):
    """
    Creates an AWS VPC, an Internet Gateway, and a public subnet.
    Args:
        region (str): AWS region (e.g., 'us-east-1').
        vpc_cidr (str): CIDR block for the VPC (e.g., '10.200.0.0/16').
        public_subnet_cidr (str): CIDR block for the public subnet (e.g., '10.200.1.0/24').
        tags (dict): Dictionary of tags to apply to created resources.
    Returns:
        dict: A dictionary containing IDs of created resources, or None on failure.
    """
    ec2 = boto3.client('ec2', region_name=region)
    resource_ids = {}

    try:
        logging.info(f"Creating VPC with CIDR: {vpc_cidr} in region: {region}")
        vpc_response = ec2.create_vpc(CidrBlock=vpc_cidr, TagSpecifications=[
            {'ResourceType': 'vpc', 'Tags': [{'Key': k, 'Value': v} for k, v in tags.items()]}
        ])
        vpc_id = vpc_response['Vpc']['VpcId']
        resource_ids['vpc_id'] = vpc_id
        logging.info(f"VPC {vpc_id} created.")

        # Enable DNS hostnames and support for the VPC
        ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsSupport={'Value': True})
        ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={'Value': True})
        logging.info(f"Enabled DNS support and hostnames for VPC {vpc_id}.")

        logging.info("Creating Internet Gateway.")
        igw_response = ec2.create_internet_gateway(TagSpecifications=[
            {'ResourceType': 'internet-gateway', 'Tags': [{'Key': k, 'Value': v} for k, v in tags.items()]}
        ])
        igw_id = igw_response['InternetGateway']['InternetGatewayId']
        resource_ids['igw_id'] = igw_id
        logging.info(f"Internet Gateway {igw_id} created.")

        logging.info(f"Attaching Internet Gateway {igw_id} to VPC {vpc_id}.")
        ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)
        logging.info(f"Internet Gateway {igw_id} attached to VPC {vpc_id}.")

        logging.info(f"Creating public subnet with CIDR: {public_subnet_cidr}.")
        subnet_response = ec2.create_subnet(
            VpcId=vpc_id,
            CidrBlock=public_subnet_cidr,
            AvailabilityZone=f"{region}a", # Using the first AZ for simplicity
            TagSpecifications=[
                {'ResourceType': 'subnet', 'Tags': [{'Key': k, 'Value': v} for k, v in tags.items()]}
            ]
        )
        public_subnet_id = subnet_response['Subnet']['SubnetId']
        resource_ids['public_subnet_id'] = public_subnet_id
        logging.info(f"Public subnet {public_subnet_id} created.")

        logging.info(f"Creating route table for public subnet {public_subnet_id}.")
        route_table_response = ec2.create_route_table(
            VpcId=vpc_id,
            TagSpecifications=[
                {'ResourceType': 'route-table', 'Tags': [{'Key': k, 'Value': v} for k, v in tags.items()]}
            ]
        )
        public_route_table_id = route_table_response['RouteTable']['RouteTableId']
        resource_ids['public_route_table_id'] = public_route_table_id
        logging.info(f"Public route table {public_route_table_id} created.")

        logging.info(f"Adding default route to Internet Gateway {igw_id} in route table {public_route_table_id}.")
        ec2.create_route(
            RouteTableId=public_route_table_id,
            DestinationCidrBlock='0.0.0.0/0',
            GatewayId=igw_id
        )
        logging.info("Default route added.")

        logging.info(f"Associating public subnet {public_subnet_id} with route table {public_route_table_id}.")
        ec2.associate_route_table(
            SubnetId=public_subnet_id,
            RouteTableId=public_route_table_id
        )
        logging.info("Subnet-route table association complete.")

        ec2.modify_subnet_attribute(SubnetId=public_subnet_id, MapPublicIpOnLaunch={'Value': True})
        logging.info(f"Enabled auto-assign public IP on launch for subnet {public_subnet_id}.")

        return resource_ids

    except Exception as e:
        logging.error(f"Error during AWS resource creation: {e}")
        return None

if __name__ == "__main__":
    aws_region = "us-east-1" # Or get from environment/config
    my_vpc_cidr = "10.200.0.0/16"
    my_public_subnet_cidr = "10.200.1.0/24"
    resource_tags = {"Project": "NetDevOps-SDWAN", "Environment": "Dev", "ManagedBy": "Python"}

    created_resources = create_aws_vpc_with_igw_and_subnet(
        aws_region, my_vpc_cidr, my_public_subnet_cidr, resource_tags
    )

    if created_resources:
        logging.info("Successfully provisioned AWS resources:")
        for k, v in created_resources.items():
            logging.info(f"  {k}: {v}")
    else:
        logging.error("Failed to provision AWS resources.")

9.5 Security Considerations

Automating SD-WAN and cloud networking introduces new attack vectors and necessitates careful security planning.

  • API Security:
    • Authentication & Authorization: Use strong authentication methods (e.g., OAuth2, API tokens) for API access. Implement Role-Based Access Control (RBAC) to grant minimum necessary privileges to automation accounts.
    • Secure Communication: Always use HTTPS/TLS for all API interactions.
    • Credential Management: Store API keys, tokens, and passwords securely using tools like Ansible Vault, HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault. Never hardcode credentials.
  • SD-WAN Specific Security:
    • Controller Hardening: Secure access to vManage, vSmart, and vBond controllers. Apply security patches promptly.
    • IPsec Encryption: Ensure strong IPsec policies with robust encryption and authentication algorithms (e.g., AES-256, SHA-256) for overlay tunnels (RFC 4301, RFC 4303).
    • Segmentation: Leverage SD-WAN’s ability to create VPNs/VRFs to segment traffic, ensuring sensitive data travels only over authorized paths.
    • Zero Trust Network Access (ZTNA): SD-WAN can integrate with ZTNA solutions to apply granular access policies based on user identity and device posture.
  • Cloud Networking Security:
    • Infrastructure as Code Security Scanning: Use tools like Checkov, KICS, or Terrascan to scan Terraform/CloudFormation templates for security misconfigurations before deployment.
    • Security Groups/NSGs: Implement least-privilege principles. Only open ports/protocols necessary for specific applications. Automate their management via IaC.
    • Network Segmentation: Use VPCs/VNETs and subnets to logically segment different applications or environments. Use Transit Gateways with routing policies to control inter-VPC traffic.
    • VPN/Direct Connect Security: Ensure strong encryption and authentication for hybrid connectivity. Regularly audit VPN configurations.
    • Monitoring and Logging: Implement centralized logging and monitoring (e.g., AWS CloudTrail, VPC Flow Logs, Azure Monitor) to detect anomalous activity. Automate alerts for security events.
  • Supply Chain Security: Secure your automation pipeline. Ensure that Ansible playbooks, Python scripts, and Terraform configurations are stored in secure version control, scanned for vulnerabilities, and deployed through trusted CI/CD processes.
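As one way to follow the credential management guidance above, automation scripts can read secrets from the environment (populated by a vault integration or a CI/CD secret store) and fail fast when one is missing. This is a minimal sketch. The variable names VMANAGE_HOST, VMANAGE_USERNAME, and VMANAGE_PASSWORD are hypothetical conventions, not names required by any tool.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is absent from the environment."""


def load_vmanage_credentials(env=None):
    """Read vManage connection details from environment variables.

    Nothing is hardcoded, and the error message names only the missing
    variables, never their values.
    """
    env = os.environ if env is None else env
    required = ("VMANAGE_HOST", "VMANAGE_USERNAME", "VMANAGE_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise MissingCredentialError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["VMANAGE_HOST"],
        "username": env["VMANAGE_USERNAME"],
        "password": env["VMANAGE_PASSWORD"],
    }


if __name__ == "__main__":
    try:
        credentials = load_vmanage_credentials()
        # Log the target host only; never log the password.
        print(f"Loaded credentials for {credentials['host']}")
    except MissingCredentialError as error:
        print(f"Refusing to continue: {error}")
```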

Security Configuration Example (AWS Security Group via Terraform):

# Security Group for a Web Server in the Cloud VPC
resource "aws_security_group" "web_sg" {
  vpc_id = aws_vpc.sdwan_cloud_vpc.id
  name   = "web-server-sg"
  description = "Allow inbound HTTP/S and SSH from specific ranges"

  # Inbound rules
  ingress {
    description = "Allow HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # SECURITY WARNING: For production, narrow this down!
  }

  ingress {
    description = "Allow HTTPS from anywhere"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # SECURITY WARNING: For production, narrow this down!
  }

  ingress {
    description = "Allow SSH from On-Premises Network"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    # SECURITY BEST PRACTICE: Replace with your actual on-premises network CIDR.
    # Never expose SSH to 0.0.0.0/0 in production.
    cidr_blocks = ["10.1.0.0/16"]
  }

  # Outbound rules
  egress {
    description = "Allow all outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1" # -1 means all protocols
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "Web-Server-SG"
  }
}
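Least-privilege rules like these can be audited after deployment. The sketch below checks one security group record, shaped like an entry from the EC2 DescribeSecurityGroups response, for sensitive ports opened to 0.0.0.0/0. The function name and the default list of sensitive ports (SSH and RDP) are assumptions chosen for illustration.

```python
def find_open_sensitive_ports(security_group, sensitive_ports=(22, 3389)):
    """Return the sensitive ports that this group exposes to 0.0.0.0/0."""
    exposed = set()
    for permission in security_group.get("IpPermissions", []):
        open_to_world = any(
            ip_range.get("CidrIp") == "0.0.0.0/0"
            for ip_range in permission.get("IpRanges", [])
        )
        if not open_to_world:
            continue
        if permission.get("IpProtocol") == "-1":
            # "-1" means all protocols and ports, so every sensitive port is exposed.
            exposed.update(sensitive_ports)
            continue
        low, high = permission.get("FromPort"), permission.get("ToPort")
        if low is None or high is None:
            continue
        exposed.update(port for port in sensitive_ports if low <= port <= high)
    return sorted(exposed)


if __name__ == "__main__":
    # Mirrors the web server group above: HTTP/HTTPS open, SSH limited to on-premises.
    web_sg = {
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "10.1.0.0/16"}]},
        ]
    }
    print(find_open_sensitive_ports(web_sg))
```

A check like this fits naturally into a scheduled job that alerts when someone widens a rule outside the IaC workflow.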

9.6 Verification & Troubleshooting

Automated deployments require automated verification and systematic troubleshooting.

9.6.1 SD-WAN Verification & Troubleshooting

Verification Commands (Cisco cEdge): Once templates are applied via vManage, verify connectivity and policy.

show sdwan control connections      ! Verify control plane connections to vSmart/vBond
show sdwan omp peers               ! Verify OMP adjacency with vSmart
show sdwan omp routes              ! Verify routes learned via OMP
show sdwan bfd sessions            ! Verify IPsec data plane tunnels are up (BFD state per tunnel)
show sdwan policy from-vsmart      ! Review policies received from vSmart
show ip interface brief | include Tunnel ! Check tunnel interface status
ping vrf 10 10.10.1.10 source GigabitEthernet0/0/1 ! Ping a cloud resource from service VPN 10 (a VRF on IOS XE)
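
The same checks can be approximated from the controller side instead of the device CLI. The sketch below assumes that vManage's real-time monitoring endpoints (for example /dataservice/device/control/connections and /dataservice/device/bfd/sessions, each taking a deviceId query parameter) return JSON with a "data" list whose entries carry a "state" field. Treat the endpoint paths and field names as assumptions to confirm against your vManage version. The functions only evaluate payloads that have already been fetched.

```python
"""Evaluate vManage-style JSON payloads for sessions that are not up."""


def sessions_not_up(payload, state_field="state", up_value="up"):
    """Return the entries in payload["data"] whose state differs from `up_value`."""
    return [
        entry
        for entry in payload.get("data", [])
        if str(entry.get(state_field, "")).lower() != up_value
    ]


def verification_summary(control_payload, bfd_payload):
    """Summarize control and BFD session health for one device.

    The check passes only if at least one session of each kind exists
    and none of them is down.
    """
    control = control_payload.get("data", [])
    bfd = bfd_payload.get("data", [])
    control_down = sessions_not_up(control_payload)
    bfd_down = sessions_not_up(bfd_payload)
    return {
        "control_total": len(control),
        "control_down": len(control_down),
        "bfd_total": len(bfd),
        "bfd_down": len(bfd_down),
        "passed": bool(control) and bool(bfd) and not control_down and not bfd_down,
    }
```

Requiring at least one session of each kind avoids reporting success for a device that returned an empty list, which usually means it is not connected at all.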

Common SD-WAN Issues:

| Issue | Potential Root Cause | Resolution Steps | Debug Commands |
| --- | --- | --- | --- |
| Device onboarding failure | Incorrect vBond IP, firewall blocking ports, invalid serial/token | Verify vBond reachability from cEdge. Check firewall rules: DTLS control connections use UDP 12346 by default (plus higher ports when port hopping is in use), and TLS, if configured, uses TCP 23456 and above. BFD runs inside the IPsec data plane tunnels, not on a separate port. Ensure the correct serial number and token are used. | show sdwan control connections, show sdwan control connection-history, debug sdwan control-plane |
| Template attachment failure | Template errors, device not reachable by vManage, device not in proper state | Review vManage task log for detailed errors. Ensure device is online and manageable by vManage. Validate template syntax. | vManage GUI: “Monitor -> Tasks”, “Configuration -> Devices -> Validation” |
| IPsec tunnel not coming up | Firewall issues, NAT traversal problems, incorrect IPsec/DTLS parameters, routing issues | Check reachability to peer (vEdge/cEdge or Cloud VPN Gateway). Verify NAT configuration. Ensure consistent IPsec parameters. Check routing to remote endpoint. | show sdwan bfd sessions, show crypto isakmp sa, show crypto ipsec sa, debug crypto isakmp |
| OMP routes not advertised/learned | OMP disabled, filter policies, VPN membership issues | Ensure OMP is enabled (no shutdown under omp). Check which route types OMP advertises (advertise statements under omp). Review OMP export/import policies. Verify interface VPN membership. | show sdwan omp peers, show sdwan omp routes, show sdwan policy from-vsmart |
| Application performance issues | Underlay congestion, incorrect application-aware routing policy, QoS misconfiguration | Monitor underlay network health. Verify application-aware routing policies are directing traffic correctly. Check QoS settings on devices and vManage. Look for packet loss/latency. | show sdwan app-route statistics, show sdwan app-route sla-class, show policy-map interface |

9.6.2 Cloud Networking Verification & Troubleshooting

Verification Commands (AWS CLI):

aws ec2 describe-vpcs --vpc-ids vpc-xxxxxxxxxxxxxxxxx # Verify VPC details
aws ec2 describe-subnets --filters "Name=vpc-id,Values=vpc-xxxxxxxxxxxxxxxxx" # Verify subnets
aws ec2 describe-route-tables --filters "Name=vpc-id,Values=vpc-xxxxxxxxxxxxxxxxx" # Verify route tables and associations
aws ec2 describe-network-acls --filters "Name=vpc-id,Values=vpc-xxxxxxxxxxxxxxxxx" # Review NACLs (if used)
aws ec2 describe-security-groups --filters "Name=vpc-id,Values=vpc-xxxxxxxxxxxxxxxxx" # Verify Security Groups
aws ec2 describe-vpn-connections --vpn-connection-ids vpn-xxxxxxxxxxxxxxxxx # Verify VPN connection status
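
The VPN status check lends itself to scripting because the describe-vpn-connections output reports per-tunnel telemetry. The sketch below assumes one element of the "VpnConnections" list from that call (CLI JSON or boto3's describe_vpn_connections) and reads its State and VgwTelemetry fields. The function name and the rule "healthy means available with at least one tunnel UP" are illustrative choices, not an AWS definition.

```python
"""Summarize the health of one AWS Site-to-Site VPN connection."""


def vpn_tunnel_report(vpn_connection, min_tunnels_up=1):
    """Return a small health report for one entry of "VpnConnections".

    Each VgwTelemetry entry describes one tunnel, with Status "UP" or "DOWN".
    """
    telemetry = vpn_connection.get("VgwTelemetry", [])
    tunnels = {t["OutsideIpAddress"]: t["Status"] for t in telemetry}
    tunnels_up = sum(1 for status in tunnels.values() if status == "UP")
    return {
        "id": vpn_connection["VpnConnectionId"],
        "state": vpn_connection["State"],
        "tunnels": tunnels,
        "tunnels_up": tunnels_up,
        "healthy": vpn_connection["State"] == "available"
        and tunnels_up >= min_tunnels_up,
    }


# Placeholder data using documentation address space (203.0.113.0/24).
SAMPLE_VPN = {
    "VpnConnectionId": "vpn-xxxxxxxxxxxxxxxxx",
    "State": "available",
    "VgwTelemetry": [
        {"OutsideIpAddress": "203.0.113.10", "Status": "UP"},
        {"OutsideIpAddress": "203.0.113.20", "Status": "DOWN"},
    ],
}
```

For the sample, the report shows one of two tunnels up and is considered healthy. Passing min_tunnels_up=2 makes the check stricter when both tunnels are expected to be established.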

Common Cloud Networking Issues:

| Issue | Potential Root Cause | Resolution Steps | Debug Methods |
| --- | --- | --- | --- |
| VPC/Subnet creation failure | Invalid CIDR, region limits, IAM permissions | Review Terraform apply logs or Python script output. Check AWS Service Quotas. Verify IAM policy allows ec2:CreateVpc, ec2:CreateSubnet, etc. | CloudTrail logs, Terraform debug output, Python logging |
| Internet access from public subnet fails | Missing/incorrect Internet Gateway, route table misconfiguration, Security Group/NACL blocking | Verify IGW is attached to VPC. Ensure public route table has a default route (0.0.0.0/0) pointing to IGW. Check Security Group egress rules and NACL rules. | aws ec2 describe-route-tables, aws ec2 describe-security-groups, tcpdump on instance |
| VPN tunnel to on-premises down | Mismatched IPsec parameters, incorrect Customer Gateway IP, routing issues, on-premises firewall | Verify IPsec configuration (encryption, authentication, PFS, DPD) on both sides. Ensure Customer Gateway IP is the correct public IP of the on-premises device. Check on-premises firewall rules. | aws ec2 describe-vpn-connections, VPC Flow Logs, on-premises device logs (show crypto isakmp sa) |
| Inter-VPC traffic blocked | Missing Transit Gateway attachment/route, Security Group/NACL blocking | Ensure VPCs are attached to Transit Gateway. Verify TGW route tables have correct routes. Check Security Groups and NACLs. | aws ec2 describe-transit-gateway-attachments, aws ec2 search-transit-gateway-routes, VPC Flow Logs |
| Instance cannot reach resource | Security Group/NACL, instance route table, DNS | Verify Security Group/NACL rules. Check instance’s effective route table. Confirm DNS resolution (e.g., if connecting to a private endpoint). | ping, traceroute from instance, VPC Flow Logs, aws ec2 get-console-output |
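
The "Internet access from public subnet fails" row is a good candidate for an automated check. The following sketch assumes a route table dictionary shaped like one entry of "RouteTables" in the describe-route-tables output, and tests for an active default route whose target is an Internet Gateway. The function name and sample data are illustrative.

```python
"""Check whether a route table has a usable default route to an Internet Gateway."""


def has_default_route_to_igw(route_table):
    """Return True if 0.0.0.0/0 points to an igw-* target and is active."""
    for route in route_table.get("Routes", []):
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("GatewayId", "").startswith("igw-")
            and route.get("State") == "active"
        ):
            return True
    return False


# Placeholder route tables: one public, one private (local route only).
PUBLIC_RT = {
    "RouteTableId": "rtb-public",
    "Routes": [
        {"DestinationCidrBlock": "10.100.0.0/16", "GatewayId": "local", "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0123456789abcdef0", "State": "active"},
    ],
}
PRIVATE_RT = {
    "RouteTableId": "rtb-private",
    "Routes": [
        {"DestinationCidrBlock": "10.100.0.0/16", "GatewayId": "local", "State": "active"},
    ],
}
```

A route in the "blackhole" state (for example, after the gateway was detached) fails the check, which matches the symptom described in the table.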

9.7 Performance Optimization

Optimizing performance in SD-WAN and cloud networks involves fine-tuning policies, ensuring adequate capacity, and continuous monitoring.

  • SD-WAN Performance Optimization:
    • Application-Aware Routing: Configure policies to dynamically steer critical application traffic over the best-performing paths (e.g., low-latency, low-jitter, high-bandwidth links). Use path monitoring to detect degraded links and fail over automatically.
    • QoS (Quality of Service): Implement QoS policies on vEdges/cEdges to prioritize business-critical traffic over less important traffic. Map applications to specific QoS classes.
    • Traffic Shaping/Policing: Control bandwidth usage to prevent any single application or user from monopolizing resources.
    • Link Aggregation/Load Balancing: Utilize multiple underlay links to increase aggregate bandwidth and provide redundancy.
    • Controller Scaling: Ensure vManage, vSmart, and vBond controllers are adequately resourced (CPU, memory, storage) and scaled horizontally to handle the number of managed devices and traffic volume.
  • Cloud Networking Performance Optimization:
    • Direct Connect/ExpressRoute: For predictable, high-bandwidth, low-latency connectivity to on-premises, dedicated private connections are superior to IPsec VPN over the internet.
    • Transit Gateway Placement: Optimize TGW placement to minimize inter-region or inter-AZ latency.
    • VPC Peering vs. Transit Gateway: Understand the trade-offs. Peering is point-to-point; TGW provides hub-and-spoke and simplifies routing for many VPCs.
    • Network Acceleration Services: Cloud providers offer services (e.g., AWS Global Accelerator, Azure Front Door) to improve application performance over the internet by routing traffic through their optimized global networks.
    • Instance Network Performance: Choose appropriate instance types with sufficient network bandwidth and optimized networking drivers.
    • Monitoring and Alerts: Continuously monitor key performance metrics (latency, packet loss, bandwidth utilization) for both the SD-WAN overlay and cloud network segments. Automate alerts for performance degradation.
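
To make the idea behind application-aware routing and threshold-based alerting concrete, here is a simplified sketch. It is not the vendor's algorithm: it assumes each path has measured latency, loss, and jitter, compares them with an SLA class, and falls back to the path with the lowest loss when no path complies. All class names, field names, and numbers are hypothetical.

```python
"""Simplified SLA-based path selection, in the spirit of application-aware routing."""
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStats:
    name: str
    latency_ms: float
    loss_pct: float
    jitter_ms: float


@dataclass(frozen=True)
class SlaClass:
    max_latency_ms: float
    max_loss_pct: float
    max_jitter_ms: float


def meets_sla(path, sla):
    """Return True if every measured metric is within the SLA thresholds."""
    return (
        path.latency_ms <= sla.max_latency_ms
        and path.loss_pct <= sla.max_loss_pct
        and path.jitter_ms <= sla.max_jitter_ms
    )


def select_paths(paths, sla):
    """Return compliant path names, lowest latency first.

    If no path meets the SLA, fall back to the single path with the lowest
    loss (ties broken by latency), so traffic still has somewhere to go.
    """
    compliant = sorted((p for p in paths if meets_sla(p, sla)), key=lambda p: p.latency_ms)
    if compliant:
        return [p.name for p in compliant]
    fallback = min(paths, key=lambda p: (p.loss_pct, p.latency_ms))
    return [fallback.name]


def sla_violations(paths, sla):
    """Return names of paths that should raise an alert."""
    return [p.name for p in paths if not meets_sla(p, sla)]


SAMPLE_PATHS = [
    PathStats("mpls", latency_ms=40, loss_pct=0.1, jitter_ms=3),
    PathStats("biz-internet", latency_ms=25, loss_pct=0.5, jitter_ms=5),
    PathStats("lte", latency_ms=90, loss_pct=2.0, jitter_ms=20),
]
VOICE_SLA = SlaClass(max_latency_ms=50, max_loss_pct=1.0, max_jitter_ms=10)
```

The same meets_sla test drives both decisions: select_paths uses it to steer traffic, and sla_violations uses it to decide which paths deserve an alert.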

9.8 Hands-On Lab: Automated Branch SD-WAN Onboarding and Cloud Integration

Lab Objective: Automate the deployment of a new Cisco cEdge branch router, attach it to the SD-WAN fabric via vManage, and establish an IPsec VPN tunnel from the cEdge (as an overlay service) to an AWS VPC provisioned with Terraform.

Lab Topology:

nwdiag {
  internet [shape = cloud];

  network "On-Premises Underlay" {
    edge_router [label="ISP Router"];
    mgmt_net [address="10.0.10.0/24", label="Management Network"];
    ansible_server [address="10.0.10.50", label="Ansible/Python Host", shape=box];
    vmanage_controller [address="10.0.10.60", label="Cisco vManage", shape=box];
    new_branch_cedge [address="10.0.10.70", label="New Branch cEdge (Underlay Mgmt)", shape=box];
  }

  network "Branch LAN" {
    address="10.1.0.0/24"
    new_branch_cedge;
    branch_server [address="10.1.0.10", label="Branch App Server", shape=box];
  }

  network "AWS Cloud (Provisioned)" {
    aws_vpc [label="SDWAN-Cloud-VPC", shape=cloud];
    aws_vpn_gw [label="AWS VGW", shape=box];
    aws_private_subnet [label="AWS Private Subnet\n10.100.2.0/24"];
    cloud_app_server [address="10.100.2.10", label="Cloud App Server", shape=box];
  }

  ansible_server -- vmanage_controller [label="API calls"];
  ansible_server -- new_branch_cedge [label="SSH (Initial)"];

  new_branch_cedge -- edge_router [label="Underlay (Biz-Internet)"];
  edge_router -- internet;

  vmanage_controller -- internet [label="Management"];

  new_branch_cedge -- aws_vpn_gw [label="Overlay IPsec VPN"];
  aws_vpn_gw -- aws_private_subnet;
  aws_private_subnet -- cloud_app_server;

  internet -- aws_vpc [label="Public Access (via IGW, not shown)"];
  new_branch_cedge -- "Branch LAN";
}

Objectives:

  1. Provision an AWS VPC and associated VPN Gateway using Terraform.
  2. Use Ansible to connect to Cisco vManage.
  3. Automate the attachment of a pre-existing device template to the new_branch_cedge in vManage. (Assumes cEdge is already “claimed” in vManage).
  4. Verify IPsec tunnel establishment from cEdge to AWS VPN Gateway.
  5. Verify routing and connectivity between the branch LAN and AWS private subnet.

Step-by-Step Configuration:

Pre-Lab Setup:

  • AWS Account: Configured with AWS CLI and boto3 credentials.
  • Cisco vManage: A running vManage instance reachable from your Ansible host.
  • Cisco cEdge: A factory-default cEdge router (physical or virtual) with basic underlay connectivity that can reach vManage and has been claimed in vManage by serial number. If the device is not fully onboarded through ZTP, give it a base configuration that allows SSH/NETCONF for initial setup.
  • vManage Device Template: A device template named Branch_cEdge_Template_v1 exists in vManage, configured with a service VPN (e.g., VPN 10), an interface for the branch LAN, and an IPsec VPN feature template for cloud connectivity.
  • Ansible Host: Python, Ansible, boto3, terraform, and cisco.sdwan collection installed.

Phase 1: Automate AWS VPC and VPN Gateway with Terraform

  1. Create main.tf and variables.tf: Use the AWS Terraform configuration from Section 9.3.2. Adjust aws_region, vpc_cidr, public_subnet_cidr, private_subnet_cidr, and onprem_public_ip to match your lab environment (e.g., onprem_public_ip should be the public IP of your cEdge’s underlay interface that terminates the VPN).
  2. Initialize Terraform:
    terraform init
    
  3. Plan and Apply:
    terraform plan -out tfplan.out
    terraform apply tfplan.out
    
  4. Verify AWS Resources: Use the AWS CLI commands from Section 9.6.2 to confirm the VPC, subnets, route tables, and VPN Gateway are created. Note down the vpn_connection_id from Terraform output or CLI.
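
Before running terraform apply, it is worth confirming that the address ranges chosen in step 1 do not collide with the lab's on-premises networks, since overlapping prefixes break routing across the VPN. The sketch below uses Python's standard ipaddress module. The management and branch LAN prefixes come from the lab topology; the VPC CIDR of 10.100.0.0/16 is an assumption chosen to contain the 10.100.2.0/24 private subnet.

```python
"""Pre-flight check: detect overlapping CIDR blocks before provisioning."""
import ipaddress
from itertools import combinations


def find_overlaps(named_cidrs):
    """Return pairs of names whose networks overlap.

    `named_cidrs` maps a label to a CIDR string, e.g. {"branch_lan": "10.1.0.0/24"}.
    """
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in named_cidrs.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


LAB_NETWORKS = {
    "management": "10.0.10.0/24",
    "branch_lan": "10.1.0.0/24",
    "aws_vpc": "10.100.0.0/16",  # assumption: VPC CIDR containing 10.100.2.0/24
}
```

With the lab values, find_overlaps returns an empty list. Compare only top-level ranges: a subnet always overlaps its own VPC, so listing both would produce a false positive.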

Phase 2: Automate Cisco SD-WAN Template Attachment with Ansible

  1. Configure Ansible Inventory: Create inventory.ini:
    [vmanage_controllers]
    vmanage.yourdomain.com ansible_host=vmanage.yourdomain.com
    
    Ensure vmanage.yourdomain.com is reachable.
  2. Create Ansible Playbook: Use the sdwan_onboard_device.yaml playbook from Section 9.4.1.
    • Adjust device_ip to your new_branch_cedge’s system IP.
    • Adjust device_template to the exact name of your pre-configured vManage device template.
    • Set VMANAGE_USERNAME and VMANAGE_PASSWORD as environment variables or use Ansible Vault.
  3. Run Ansible Playbook:
    ansible-playbook sdwan_onboard_device.yaml
    
  4. Verify on vManage and cEdge:
    • Check vManage GUI tasks to confirm template application.
    • SSH into the new_branch_cedge and run show sdwan control connections to confirm the control plane, then show crypto ipsec sa to check the tunnel toward the AWS VGW. The IPsec tunnel should come up if the IPsec VPN feature template in your device template is configured correctly for the AWS VPN connection.
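
Template attachment in vManage runs as an asynchronous task, which is why the task view is the first place to look. A script can poll for the same result. The sketch below assumes the task status endpoint (/dataservice/device/action/status/<task-id>) returns JSON with a "summary" object whose "status" becomes "done", and a "data" list with a per-device "status" of "Success" on completion. These field names and values are assumptions to verify against your vManage version. The HTTP call is injected as fetch_status, so the logic can be exercised without a controller.

```python
"""Poll a vManage-style asynchronous task until it finishes."""
import time


def evaluate_task(payload):
    """Return (result, failed_devices) for one status payload.

    result is "pending", "success", or "failed".
    """
    if payload.get("summary", {}).get("status") != "done":
        return "pending", []
    failed = [
        entry.get("system-ip", "unknown")
        for entry in payload.get("data", [])
        if entry.get("status") != "Success"
    ]
    return ("failed" if failed else "success"), failed


def wait_for_task(fetch_status, task_id, attempts=30, delay_s=10.0, sleep=time.sleep):
    """Call fetch_status(task_id) until the task completes or attempts run out.

    `fetch_status` is any callable returning the decoded JSON payload, for
    example a thin wrapper around an authenticated HTTP session.
    """
    for _ in range(attempts):
        result, failed = evaluate_task(fetch_status(task_id))
        if result != "pending":
            return result, failed
        sleep(delay_s)
    return "timeout", []
```

Bounding the loop with attempts keeps a stuck task from hanging a pipeline, and returning the failed device list gives the caller something concrete to report.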

Phase 3: Verify End-to-End Connectivity

  1. Deploy a Cloud App Server (optional but recommended): In your aws_private_subnet, launch a simple EC2 instance (e.g., Amazon Linux) and ensure its Security Group allows ICMP and SSH from the branch LAN CIDR (10.1.0.0/24).
    • You can extend your Terraform to deploy this:
      resource "aws_instance" "cloud_app_server" {
        ami           = "ami-0abcdef1234567890" # Replace with valid AMI ID for your region
        instance_type = "t2.micro"
        subnet_id     = aws_subnet.sdwan_private_subnet.id
        vpc_security_group_ids = [aws_security_group.cloud_app_sg.id] # Use vpc_security_group_ids (not security_groups) for instances in a VPC
        associate_public_ip_address = false # Private instance
        tags = {
          Name = "Cloud-App-Server"
        }
      }
      
      # Example Security Group for cloud_app_server
      resource "aws_security_group" "cloud_app_sg" {
        vpc_id = aws_vpc.sdwan_cloud_vpc.id
        name   = "cloud-app-server-sg"
        ingress {
          from_port   = 22
          to_port     = 22
          protocol    = "tcp"
          cidr_blocks = ["10.1.0.0/24"] # Allow SSH from Branch LAN
        }
        ingress {
          from_port   = -1
          to_port     = -1
          protocol    = "icmp"
          cidr_blocks = ["10.1.0.0/24"] # Allow ICMP from Branch LAN
        }
        egress {
          from_port   = 0
          to_port     = 0
          protocol    = "-1"
          cidr_blocks = ["0.0.0.0/0"]
        }
      }
      
  2. From Branch App Server: Ping the Cloud App Server (e.g., ping 10.100.2.10).
  3. From Cloud App Server: Ping the Branch App Server (e.g., ping 10.1.0.10).
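
The manual pings above can be complemented by a scripted check run from either server. The sketch below tests TCP reachability rather than ICMP, because a TCP connect needs no elevated privileges. It assumes SSH (port 22) is listening on the targets, which the example Security Group permits from the branch LAN. The probe is injectable so the reporting logic can be tested without network access.

```python
"""Scripted reachability check between the branch and cloud app servers."""
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_reachability(targets, probe=tcp_probe):
    """Probe each (host, port) pair and return a {"host:port": bool} report."""
    return {f"{host}:{port}": probe(host, port) for host, port in targets}


def unreachable(report):
    """Return the sorted list of targets that failed the probe."""
    return sorted(target for target, ok in report.items() if not ok)


# Lab addresses from the topology; port 22 is an assumption about what is listening.
LAB_TARGETS = [("10.100.2.10", 22), ("10.1.0.10", 22)]
```

Running check_reachability(LAB_TARGETS) from the branch server exercises the path through the overlay tunnel, and a non-empty unreachable() result can fail a CI job.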

Challenge Exercises:

  1. Modify the Ansible playbook to also deploy a specific BGP feature template to the cEdge for a local data center connection (if applicable).
  2. Extend the Python script to create an additional private subnet and configure a Network ACL (NACL) for it.
  3. Implement a pre-check in the Ansible playbook using viptela_device_info to ensure the device is indeed “up” and “reachable” before attempting template attachment.
  4. Add a post-deployment verification step using Python to check the status of the AWS VPN connection (with boto3) and the SD-WAN tunnel (through the vManage REST API or the cisco.sdwan Ansible collection).

9.9 Best Practices Checklist

  • Infrastructure as Code (IaC): Manage all SD-WAN and cloud network configurations as code (Terraform, Ansible playbooks).
  • Version Control: Store all IaC in a Git repository.
  • API Security: Use strong authentication, RBAC, and secure credential management for all API interactions.
  • Least Privilege: Grant automation accounts only the minimum necessary permissions.
  • Modularity & Reusability: Design Ansible playbooks and Terraform modules to be modular, reusable, and easily adaptable across different environments.
  • Idempotency: Ensure automation scripts can be run multiple times without causing unintended side effects.
  • Automated Testing: Implement validation checks (e.g., linting, syntax checks, pre-flight checks) before deployment and post-deployment verification.
  • Telemetry & Monitoring: Leverage SD-WAN and cloud monitoring capabilities to collect performance metrics and ensure network health. Automate alerts for critical events.
  • Change Management: Integrate automation into a structured change management process.
  • Documentation: Maintain clear documentation for all automation scripts, templates, and deployment processes.
  • Network Segmentation: Implement logical segmentation using VPNs/VRFs in SD-WAN and VPCs/VNETs/Security Groups in the cloud.
  • Regular Audits: Periodically audit configurations and automation scripts for security vulnerabilities and compliance.
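
The idempotency item in the checklist is easiest to reason about as "compute the difference, then apply only the difference." The following generic sketch is not tied to any SD-WAN or cloud API: the dictionaries stand in for desired and actual configuration, and the function names are illustrative.

```python
"""Idempotent reconciliation: apply only the difference between desired and current state."""


def plan_changes(desired, current):
    """Return what must be added, updated, or removed to reach `desired`."""
    return {
        "add": {k: v for k, v in desired.items() if k not in current},
        "update": {k: v for k, v in desired.items() if k in current and current[k] != v},
        "remove": sorted(k for k in current if k not in desired),
    }


def apply_changes(current, plan):
    """Return a new state with the plan applied (the input is not modified)."""
    new_state = dict(current)
    new_state.update(plan["add"])
    new_state.update(plan["update"])
    for key in plan["remove"]:
        new_state.pop(key, None)
    return new_state


def is_noop(plan):
    """Return True when the plan contains no changes."""
    return not (plan["add"] or plan["update"] or plan["remove"])


# Hypothetical tag sets standing in for real configuration.
DESIRED = {"Name": "Web-Server-SG", "Environment": "lab"}
CURRENT = {"Name": "web-sg-old", "Owner": "netops"}
```

After one apply_changes pass, a second plan_changes call yields an empty plan. That is the property to aim for in playbooks and scripts: a rerun detects nothing to do and changes nothing.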

9.11 What’s Next

This chapter equipped you with the skills to automate the complex interplay between SD-WAN and cloud networking. You’ve learned to provision cloud infrastructure with Terraform, manage SD-WAN configurations via vManage APIs using Ansible, and build dynamic cloud automations with Python. The ability to treat these critical network domains as code is fundamental to modern NetDevOps practices.

The next chapter, Chapter 10: Advanced Network Telemetry and Analytics for NetDevOps, builds on these concepts. We will delve into collecting, analyzing, and acting upon network data using streaming telemetry (gRPC, YANG-Push), the ELK stack, and custom Python scripts to provide deeper insights and proactive network management.