Introduction
Virtual Local Area Networks (VLANs) are fundamental to modern network design, enabling logical segmentation, enhanced security, and efficient resource utilization. However, their very nature – adding a layer of abstraction – can introduce complexity, making troubleshooting a critical skill for any network engineer. Misconfigured or malfunctioning VLANs can lead to a myriad of issues, from complete network outages to intermittent connectivity, performance degradation, and security vulnerabilities.
This chapter is designed to equip you with a structured approach to VLAN troubleshooting. We will delve into common pitfalls, explore diagnostic tools and commands across multi-vendor environments, and highlight advanced techniques. By the end of this chapter, you will be able to:
- Understand the common causes of VLAN-related network problems.
- Apply systematic troubleshooting methodologies to isolate and resolve issues efficiently.
- Utilize command-line tools and network monitoring solutions for effective diagnosis.
- Identify and mitigate security risks associated with VLAN misconfigurations.
- Leverage network automation for proactive verification and faster remediation.
Technical Concepts: The Foundations of VLAN Troubleshooting
Effective troubleshooting begins with a solid understanding of the underlying technical concepts. Many VLAN issues stem from a misunderstanding or misconfiguration of how VLANs are defined, propagated, and how traffic is forwarded across them.
13.1 VLAN Tagging and Frame Forwarding (802.1Q)
At the heart of VLANs is the IEEE 802.1Q standard, which defines how VLAN identification is inserted into Ethernet frames. When a frame traverses a trunk link, an 802.1Q tag (a 4-byte header) is added, containing the VLAN ID (VID). This tag allows switches to identify which VLAN the frame belongs to.
Key Points for Troubleshooting:
- Tagging/Untagging: Access ports carry untagged frames. The switch associates frames arriving on an access port with the port's assigned VLAN and strips the VLAN tag before sending frames out of that port. Trunk ports explicitly tag traffic for all allowed VLANs, except for the native VLAN.
- Native VLAN: Frames belonging to the native VLAN are untagged on a trunk link. A mismatch in native VLAN configuration between two connected trunk ports is a classic cause of connectivity issues.
- Allowed VLANs: Trunk ports must explicitly allow specific VLANs to traverse them. If a VLAN is permitted on one side but not the other, traffic for that VLAN will be dropped.
Let’s visualize the 802.1Q tag structure:
packetdiag {
colwidth = 32
0-47: Destination MAC
48-95: Source MAC
96-111: EtherType (0x8100 for 802.1Q)
112-114: Priority Code Point (PCP) (3 bits)
115-115: Drop Eligible Indicator (DEI) (1 bit)
116-127: VLAN ID (VID) (12 bits)
128-143: Length/Type (Original EtherType)
144-X: Payload
X-Y: Frame Check Sequence (FCS)
}
Figure 13.1: IEEE 802.1Q Tag Structure within an Ethernet Frame
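When a capture is available, the tag fields in Figure 13.1 can be read directly from the raw frame bytes. The following is a minimal sketch using only the Python standard library; the function name `parse_vlan_tag` and the sample frame (including its MAC addresses) are illustrative assumptions, and the frame is assumed to start at the destination MAC with no preamble.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def parse_vlan_tag(frame: bytes):
    """Return (pcp, dei, vid, inner_ethertype) for a tagged frame, or None if untagged."""
    if len(frame) < 18:
        raise ValueError("frame too short to hold an 802.1Q header")
    # Bytes 12-13: TPID, bytes 14-15: TCI, bytes 16-17: original EtherType.
    tpid, tci, inner_type = struct.unpack("!HHH", frame[12:18])
    if tpid != TPID_8021Q:
        return None
    pcp = tci >> 13           # top 3 bits: Priority Code Point
    dei = (tci >> 12) & 0x1   # next bit: Drop Eligible Indicator
    vid = tci & 0x0FFF        # low 12 bits: VLAN ID
    return pcp, dei, vid, inner_type


# Hypothetical frame: broadcast destination, made-up source MAC,
# priority 5, VLAN 10, IPv4 payload replaced by zero padding.
sample = (
    bytes.fromhex("ffffffffffff")
    + bytes.fromhex("020000000001")
    + struct.pack("!HHH", TPID_8021Q, (5 << 13) | 10, 0x0800)
    + b"\x00" * 46
)

print(parse_vlan_tag(sample))
```

A result of `None` on a trunk capture means the frame was sent untagged, which on a correctly configured trunk should only happen for native VLAN traffic.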
13.2 Control Plane Protocols (VTP, DTP, STP)
VLANs rely on several control plane protocols that, when misconfigured, can lead to significant troubleshooting challenges.
13.2.1 VLAN Trunking Protocol (VTP) / GARP VLAN Registration Protocol (GVRP)
VTP (Cisco proprietary) and GVRP (standardized) are protocols used to manage VLAN definitions across a switched network. They can automatically propagate VLAN information, reducing manual configuration.
Troubleshooting Concerns:
- VTP Domain/Password Mismatch: Switches in different VTP domains or with incorrect passwords will not share VLAN information.
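- VTP Mode: Switches in “client” or “server” mode accept advertisements that carry a higher configuration revision number, so a switch added to the domain with a stale database but a higher revision can overwrite the VLAN database across the domain. “Transparent” mode is generally recommended for stability and to prevent unwanted VLAN changes.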
- VTP Pruning: If misconfigured, VTP pruning can prevent VLAN traffic from traversing trunks where it’s actually needed.
Let’s illustrate a VTP domain mismatch scenario:
nwdiag {
network "VLANs" {
address = "192.168.10.0/24"
color = "#CCFFCC"
router "Router" { address = "192.168.10.1"; }
}
network "Trunk Link" {
address = "Layer 2 Trunk"
color = "#CCCCFF"
switch "Switch A" {
address = "VTP Domain: DOMAIN_A"
label = "Switch A (VTP Server)"
color = "#FFDDDD"
}
switch "Switch B" {
address = "VTP Domain: DOMAIN_B"
label = "Switch B (VTP Client)"
color = "#DDDDFF"
}
}
"Switch A" -- "Switch B" [label = "Trunk (802.1Q)"];
"Switch A" -- "VLANs";
"Switch B" -- "VLANs";
}
Figure 13.2: VTP Domain Mismatch Leading to VLAN Synchronization Issues
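The checks a switch applies before accepting a VTP advertisement can be sketched as a small decision function. This is a simplified model of VTP version 1/2 behavior, not an implementation of the protocol: real switches compare an MD5 digest rather than a plaintext password, and the class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VtpState:
    domain: str
    password: str
    mode: str       # "server", "client", or "transparent"
    revision: int   # configuration revision number


def accepts_update(receiver: VtpState, sender: VtpState) -> Tuple[bool, str]:
    """Decide whether receiver would apply sender's VLAN database (simplified)."""
    if receiver.mode == "transparent":
        return False, "transparent mode does not apply advertisements"
    if receiver.domain != sender.domain:
        return False, "domain mismatch"
    if receiver.password != sender.password:
        return False, "password mismatch"
    if sender.revision <= receiver.revision:
        return False, "sender revision is not higher"
    return True, "update accepted"


# The scenario from Figure 13.2: same password, different domains.
switch_a = VtpState("DOMAIN_A", "s3cret", "server", 12)
switch_b = VtpState("DOMAIN_B", "s3cret", "client", 3)
print(accepts_update(switch_b, switch_a))
```

Walking through the checks in this order mirrors a practical triage sequence: confirm mode, then domain, then password, and only then compare revision numbers.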
13.2.2 Dynamic Trunking Protocol (DTP)
DTP (Cisco proprietary) automates the negotiation of trunk links. While convenient, it can introduce security risks and unexpected behavior.
Troubleshooting Concerns:
- Unintended Trunks: DTP can establish trunks on ports where they are not desired, potentially exposing VLANs or creating security holes (VLAN hopping).
- Trunk Negotiation Failure: Mismatched DTP modes (e.g., one side desirable and the other access) can prevent a trunk from forming.
A diagram showing DTP negotiation issues:
digraph DTP_Trouble {
rankdir=LR;
node [shape=box, style="rounded,filled", fillcolor="#F0F4FF", fontname="Arial"];
edge [color="#555555", arrowsize=0.8];
subgraph cluster_switchA {
label = "Switch A"
color = blue;
style=dashed;
portA [label="Gi1/1\n(dynamic desirable)"];
}
subgraph cluster_switchB {
label = "Switch B"
color = red;
style=dashed;
portB [label="Gi1/1\n(access)"];
}
portA -> portB [label="DTP Hello"];
portB -> portA [label="DTP Reject/Ignore"];
// Add a result node
result [label="Trunk Not Formed", shape=oval, fillcolor="#FFDDDD"];
portA -> result [style=dotted];
portB -> result [style=dotted];
}
Figure 13.3: DTP Negotiation Failure Due to Mismatched Port Modes
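The outcome of DTP negotiation for any pair of port modes follows a well-known matrix, which can be encoded as a lookup for quick checks during troubleshooting. The sketch below is a simplified model: it does not account for `switchport nonegotiate` or encapsulation mismatches, and the function name `dtp_result` is an illustrative assumption.

```python
MODES = ("access", "trunk", "dynamic auto", "dynamic desirable")


def dtp_result(local: str, remote: str) -> str:
    """Return 'trunk', 'access', or 'mismatch' for two connected port modes."""
    for mode in (local, remote):
        if mode not in MODES:
            raise ValueError(f"unknown switchport mode: {mode!r}")
    pair = {local, remote}
    if pair == {"access", "trunk"}:
        # One side tags unconditionally while the other never does.
        return "mismatch"
    if "access" in pair:
        return "access"
    if pair == {"dynamic auto"}:
        # Both sides wait passively, so nobody initiates a trunk.
        return "access"
    return "trunk"


# The case shown in Figure 13.3.
print(dtp_result("dynamic desirable", "access"))
```

The `mismatch` result is the one worth alerting on, since the link stays up but only some traffic passes.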
13.2.3 Spanning Tree Protocol (STP) and VLANs (PVST+, MST)
STP (or its variants like PVST+, Rapid PVST+, MST) is crucial for preventing Layer 2 loops. VLANs introduce per-VLAN STP instances (PVST+) or multiple instances (MST).
Troubleshooting Concerns:
- VLAN Mismatch Issues: STP depends on VLAN information. If VLANs are not consistent across trunks, STP can behave unpredictably, leading to loops or blocked legitimate paths.
- Native VLAN Mismatch: As discussed, this can cause STP BPDUs (which are usually sent untagged) to be misinterpreted, potentially creating loops.
- Root Bridge Placement: Incorrect root bridge placement for specific VLANs can lead to suboptimal traffic paths or unexpected port blocking.
13.3 Layer 3 Inter-VLAN Routing
VLANs provide Layer 2 segmentation. For devices in different VLANs to communicate, Layer 3 routing is required. This is typically achieved using a router-on-a-stick (RoaS) or a Layer 3 switch (SVI - Switched Virtual Interface).
Troubleshooting Concerns:
- Incorrect Subnet/Gateway: Devices in a VLAN must have the correct IP address, subnet mask, and default gateway pointing to the SVI or router sub-interface for that VLAN.
- SVI/Sub-interface Status: The SVI or sub-interface must be up/up and correctly configured with the IP address for its respective VLAN.
- ACLs/Firewall Rules: Inter-VLAN traffic may be blocked by Access Control Lists (ACLs) applied to SVIs or router interfaces.
- Routing Table: The Layer 3 device must have appropriate routes (connected, static, dynamic) to forward traffic between VLANs and out of the network.
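The subnet and gateway check from the first item in this list can be automated with Python's standard `ipaddress` module. This is a minimal sketch; the function name `check_host_addressing` and the sample addresses are assumptions chosen to match the 10.0.10.0/24 addressing used elsewhere in this chapter.

```python
import ipaddress


def check_host_addressing(host_ip, prefix_len, gateway, svi_cidr):
    """Compare a host's IP settings with the SVI (or sub-interface) for its VLAN."""
    problems = []
    svi = ipaddress.ip_interface(svi_cidr)
    host = ipaddress.ip_interface(f"{host_ip}/{prefix_len}")
    if host.network != svi.network:
        problems.append(
            f"host network {host.network} does not match SVI network {svi.network}"
        )
    if ipaddress.ip_address(gateway) != svi.ip:
        problems.append(f"default gateway {gateway} is not the SVI address {svi.ip}")
    return problems


# A host with the right address but a gateway that belongs to another VLAN.
for problem in check_host_addressing("10.0.10.25", 24, "10.0.20.1", "10.0.10.1/24"):
    print(problem)
```

An empty list means the host settings are consistent with the SVI; it does not prove that the SVI itself is up or that no ACL is blocking traffic.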
A conceptual diagram of Layer 3 inter-VLAN routing:
@startuml
!theme mars
' Define all elements first
cloud "Internet" as INET
rectangle "Core Switch / L3 Device" as L3_SW
rectangle "Access Switch A" as ASW_A
rectangle "Access Switch B" as ASW_B
node "Server VLAN 10" as SERVER_VLAN
node "Client VLAN 20" as CLIENT_VLAN
node "Management VLAN 99" as MGMT_VLAN
' Define subnets for clarity
L3_SW -- "VLAN 10 SVI (10.0.10.1/24)" as SVI10
L3_SW -- "VLAN 20 SVI (10.0.20.1/24)" as SVI20
L3_SW -- "VLAN 99 SVI (10.0.99.1/24)" as SVI99
' Then connect them
INET -up-> L3_SW
L3_SW -left-> ASW_A : Trunk
L3_SW -right-> ASW_B : Trunk
ASW_A -up-> SERVER_VLAN : Access Port VLAN 10
ASW_A -up-> MGMT_VLAN : Access Port VLAN 99
ASW_B -up-> CLIENT_VLAN : Access Port VLAN 20
SVI10 -up- L3_SW
SVI20 -up- L3_SW
SVI99 -up- L3_SW
SERVER_VLAN .right.> SVI10
CLIENT_VLAN .left.> SVI20
MGMT_VLAN .left.> SVI99
@enduml
Figure 13.4: Inter-VLAN Routing Architecture
Configuration Examples: Common Troubleshooting Points
When troubleshooting, examining configurations is paramount. Here, we’ll look at critical VLAN configurations on common platforms and how to verify them.
13.4 Cisco IOS-XE/NX-OS Configuration Verification
13.4.1 VLAN Definitions
Verify that VLANs are correctly defined and have appropriate names.
! On a Cisco Catalyst switch
show vlan brief
! Expected Output (Example):
VLAN Name Status Ports
---- -------------------------------- --------- -------------------------------
1 default active Gi0/1, Gi0/2, Gi0/3, Gi0/4
10 ENGINEERING active
20 SALES active
99 MANAGEMENT active
1002 fddi-default act/unsup
1003 token-ring-default act/unsup
1004 fddinet-default act/unsup
1005 trnet-default act/unsup
Verification: VLANs 10, 20, 99 exist and are active. If a VLAN is missing or inactive, devices in that VLAN will not be able to communicate.
13.4.2 Access Port Configuration
Verify that end-device ports are assigned to the correct VLAN.
! On a Cisco Catalyst switch
show running-config interface GigabitEthernet0/5
! Expected Output (Example - Port for VLAN 10):
interface GigabitEthernet0/5
switchport mode access
switchport access vlan 10
spanning-tree portfast
end
Verification: Port Gi0/5 is in access mode and assigned to VLAN 10. If switchport access vlan is incorrect, the device will be in the wrong VLAN.
13.4.3 Trunk Port Configuration
Verify trunk ports are configured correctly, especially mode, native vlan, and allowed vlan lists.
! On a Cisco Catalyst switch
show running-config interface GigabitEthernet0/24
! Expected Output (Example - Trunk port, native VLAN 99, allowing 10,20,99):
interface GigabitEthernet0/24
switchport trunk encapsulation dot1q
switchport mode trunk
switchport trunk native vlan 99
switchport trunk allowed vlan 10,20,99
end
! Verification using show interfaces trunk:
show interfaces GigabitEthernet0/24 trunk
! Expected Output (Example):
Port Mode Encapsulation Status Native VLAN
Gi0/24 on 802.1q trunking 99
Port Vlans allowed on trunk
Gi0/24 10,20,99
Port Vlans in spanning tree forwarding state and not pruned
Gi0/24 10,20,99
Verification: This confirms the trunk mode, native VLAN, and allowed VLANs. Crucial troubleshooting steps involve ensuring these match on both ends of the trunk link.
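Comparing both ends by eye becomes error-prone once allowed lists contain ranges. The sketch below expands Cisco-style VLAN lists such as `10,20,99` or `1-100,200` and reports differences between the two sides. The dictionary keys `native_vlan` and `allowed` are assumptions for this example; adapt them to however the values are collected.

```python
def expand_vlans(spec: str) -> set:
    """Expand a VLAN list such as '10,20-22,99' into a set of integers."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def compare_trunk_ends(local: dict, remote: dict) -> list:
    """Return human-readable findings for one trunk link."""
    findings = []
    if local["native_vlan"] != remote["native_vlan"]:
        findings.append(
            f"native VLAN mismatch: {local['native_vlan']} vs {remote['native_vlan']}"
        )
    local_allowed = expand_vlans(local["allowed"])
    remote_allowed = expand_vlans(remote["allowed"])
    only_local = sorted(local_allowed - remote_allowed)
    only_remote = sorted(remote_allowed - local_allowed)
    if only_local:
        findings.append(f"allowed only on local side: {only_local}")
    if only_remote:
        findings.append(f"allowed only on remote side: {only_remote}")
    return findings


side_a = {"native_vlan": 99, "allowed": "10,20,99"}
side_b = {"native_vlan": 1, "allowed": "10,99"}
for finding in compare_trunk_ends(side_a, side_b):
    print(finding)
```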
13.4.4 Layer 3 SVI (Inter-VLAN Routing)
Verify the Switched Virtual Interface (SVI) for the VLAN exists and is correctly configured.
! On a Cisco Catalyst Layer 3 switch
show running-config interface Vlan10
! Expected Output (Example - SVI for VLAN 10):
interface Vlan10
ip address 10.0.10.1 255.255.255.0
no shutdown
end
! Verification:
show ip interface brief Vlan10
! Expected Output (Example):
Interface IP-Address OK? Method Status Protocol
Vlan10 10.0.10.1 YES manual up up
Verification: The SVI should be up/up and have the correct IP address. If the status is down/down, it means there are no active physical ports assigned to VLAN 10, or the VLAN itself is not active.
13.5 Juniper JunOS Configuration Verification
13.5.1 VLAN Definitions
Verify VLANs are defined under the bridge-domains or vlans stanza.
# On a Juniper EX/QFX switch
show configuration vlans
# Expected Output (Example):
vlans {
ENGINEERING {
vlan-id 10;
l3-interface irb.10;
}
SALES {
vlan-id 20;
l3-interface irb.20;
}
MANAGEMENT {
vlan-id 99;
l3-interface irb.99;
}
}
Verification: VLANs 10, 20, 99 exist. Missing VLANs will prevent connectivity.
13.5.2 Access Port Configuration
Verify interface is in access mode and assigned to the correct VLAN.
# On a Juniper EX/QFX switch
show configuration interfaces ge-0/0/5
# Expected Output (Example - Port for VLAN 10):
ge-0/0/5 {
unit 0 {
family ethernet-switching {
port-mode access;
vlan {
members ENGINEERING; # Or vlan-id 10
}
}
}
}
Verification: Port ge-0/0/5 is access mode and a member of VLAN ENGINEERING. Incorrect membership leads to devices being in the wrong VLAN.
13.5.3 Trunk Port Configuration
Verify trunk ports are configured correctly, including mode, native-vlan-id, and vlan members.
# On a Juniper EX/QFX switch
show configuration interfaces xe-0/0/0
# Expected Output (Example - Trunk port, native VLAN 99, allowing 10,20,99):
xe-0/0/0 {
unit 0 {
family ethernet-switching {
port-mode trunk;
native-vlan-id 99;
vlan {
members [ ENGINEERING SALES MANAGEMENT ];
}
}
}
}
# Verification:
show ethernet-switching interfaces xe-0/0/0 detail
# Expected Output (Example):
Interface: xe-0/0/0, Enabled, Physical link is Up
Link-type: Trunk, Tagging: Enabled
Native VLAN ID: 99
VLAN members:
VLAN name Tag Tagging Blocking
ENGINEERING 10 tagged No
SALES 20 tagged No
MANAGEMENT 99 untagged No
Verification: Confirms trunk mode, native VLAN, and allowed VLANs. Native VLAN ID being untagged and other members tagged is key.
13.5.4 IRB (Integrated Routing and Bridging) Interface (Inter-VLAN Routing)
Verify IRB interfaces for inter-VLAN routing are configured and active.
# On a Juniper EX/QFX Layer 3 switch
show configuration interfaces irb
# Expected Output (Example - IRB for VLAN 10):
irb {
unit 10 {
family inet {
address 10.0.10.1/24;
}
}
}
# Verification:
show interfaces irb.10
# Expected Output (Example):
Physical interface: irb, Enabled, Physical link is Up
Logical interface irb.10 (Index 69) (SNMP ifIndex 525)
Flags: Up SNMP-Traps 0x40004000 Encapsulation: ENET2
IPv4-Header: 0x40000000
inet ...
Local: 10.0.10.1/24
Verification: The IRB interface should be Up with the correct IP address. Its up state is dependent on the associated VLAN (e.g., VLAN 10) having at least one active port member.
Automation Examples: Proactive Troubleshooting and Verification
Network automation tools can significantly reduce the time spent on troubleshooting by enabling rapid, consistent verification and configuration collection. Identifying misconfigurations quickly is key to minimizing downtime.
13.6 Python (Netmiko/NAPALM) for VLAN Verification
A Python script can connect to multiple devices, retrieve their VLAN and trunk configurations, and compare them against a baseline or expected state.
# python_vlan_check.py
import json
from netmiko import ConnectHandler
from concurrent.futures import ThreadPoolExecutor

# Device inventory (replace with your actual devices)
devices = [
    {
        "device_type": "cisco_ios",
        "host": "192.168.1.10",
        "username": "admin",
        "password": "cisco",
        "secret": "cisco",  # Enable password if needed
    },
    {
        "device_type": "juniper_junos",
        "host": "192.168.1.11",
        "username": "admin",
        "password": "juniper",
    },
    # Add more devices as needed
]


def get_vlan_info(device):
    """Connects to a device and retrieves VLAN and trunk information."""
    host = device["host"]
    print(f"Connecting to {host}...")
    try:
        with ConnectHandler(**device) as net_connect:
            if "cisco" in device["device_type"]:
                vlan_brief = net_connect.send_command("show vlan brief", use_textfsm=True)
                trunk_info = net_connect.send_command("show interfaces trunk", use_textfsm=True)
                # Optionally get interface configs for access ports
                # net_connect.send_command("show running-config | section interface GigabitEthernet", use_textfsm=True)
            elif "juniper" in device["device_type"]:
                vlan_brief = net_connect.send_command("show vlans | display json", use_textfsm=False)
                trunk_info = net_connect.send_command("show ethernet-switching interfaces detail | match 'Link-type: Trunk'", use_textfsm=False)
                # Further parsing would be needed for Juniper, e.g., to get members from show vlans
            else:
                vlan_brief = None
                trunk_info = None
            return {
                "host": host,
                "vlan_brief": vlan_brief,
                "trunk_info": trunk_info,
            }
    except Exception as e:
        print(f"Error connecting to {host}: {e}")
        return {"host": host, "error": str(e)}


if __name__ == "__main__":
    vlan_data = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        results = executor.map(get_vlan_info, devices)
        for result in results:
            vlan_data.append(result)

    print("\n--- Collected VLAN Data ---")
    for data in vlan_data:
        print(f"\nHost: {data['host']}")
        if "error" in data:
            print(f"  Error: {data['error']}")
        else:
            print("  VLAN Brief:")
            # For Cisco, vlan_brief is a list of dicts from TextFSM
            if isinstance(data['vlan_brief'], list):
                for vlan in data['vlan_brief']:
                    print(f"    VLAN ID: {vlan.get('vlan_id')}, Name: {vlan.get('name')}, Status: {vlan.get('status')}")
            else:  # For Juniper, it's raw JSON or text
                print(data['vlan_brief'])
            print("  Trunk Info:")
            # For Cisco, trunk_info is a list of dicts from TextFSM
            if isinstance(data['trunk_info'], list):
                for trunk in data['trunk_info']:
                    print(f"    Port: {trunk.get('port')}, Native VLAN: {trunk.get('native_vlan')}, Allowed: {trunk.get('vlans_allowed')}")
            else:  # For Juniper, it's raw text
                print(data['trunk_info'])

    # Further logic to compare configurations, check for mismatches, etc.
    # For example, iterate through vlan_data and look for inconsistencies in native VLANs or allowed VLANs.
Automation: Python script using Netmiko to gather VLAN and trunk information from multi-vendor devices for verification.
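The comparison step that the script leaves open could look roughly like the following. It is a sketch that runs without any network access: it assumes the parsed VLAN rows are dictionaries with `vlan_id` and `name` keys, mirroring the keys the script reads, which may differ depending on the TextFSM template version in use. The function name and sample data are hypothetical.

```python
def find_vlan_discrepancies(expected: dict, parsed_vlans: list) -> list:
    """Compare parsed VLAN rows against a baseline of {vlan_id: name}."""
    actual = {int(row["vlan_id"]): row["name"] for row in parsed_vlans}
    issues = []
    for vlan_id, name in sorted(expected.items()):
        if vlan_id not in actual:
            issues.append(f"VLAN {vlan_id} ({name}) is missing")
        elif actual[vlan_id] != name:
            issues.append(
                f"VLAN {vlan_id} is named {actual[vlan_id]!r}, expected {name!r}"
            )
    return issues


baseline = {10: "ENGINEERING", 20: "SALES", 99: "MANAGEMENT"}

# Hypothetical parsed output from one switch.
collected = [
    {"vlan_id": "1", "name": "default", "status": "active"},
    {"vlan_id": "10", "name": "ENGINEERING", "status": "active"},
    {"vlan_id": "20", "name": "MARKETING", "status": "active"},
]

for issue in find_vlan_discrepancies(baseline, collected):
    print(issue)
```

Extra VLANs on the device, such as VLAN 1 here, are deliberately ignored; flagging them is a policy choice that depends on the environment.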
13.7 Ansible for Desired State Verification
Ansible playbooks can be used to ensure that VLAN configurations across multiple switches adhere to a desired state. This is a powerful proactive troubleshooting method.
# ansible_vlan_check.yml
---
- name: Verify VLAN and Trunk Configurations
  hosts: network_devices # Define this group in your inventory.ini
  gather_facts: no
  connection: network_cli
  vars:
    expected_vlans:
      - id: 10
        name: ENGINEERING
      - id: 20
        name: SALES
      - id: 99
        name: MANAGEMENT
    expected_trunk_config: # Example for a specific interface, e.g., GigabitEthernet0/24
      Gi0/24:
        native_vlan: 99
        allowed_vlans: "10,20,99"
  tasks:
    - name: Get VLANs from Cisco devices
      cisco.ios.ios_command:
        commands: "show vlan brief"
      register: cisco_vlan_output
      when: ansible_network_os == 'ios' or ansible_network_os == 'iosxr' or ansible_network_os == 'nxos'

    - name: Get Trunk interfaces from Cisco devices
      cisco.ios.ios_command:
        commands: "show interfaces trunk"
      register: cisco_trunk_output
      when: ansible_network_os == 'ios' or ansible_network_os == 'iosxr' or ansible_network_os == 'nxos'

    - name: Get VLANs from Juniper devices
      junipernetworks.junos.junos_rpc:
        rpc: get-vlan-information
      register: juniper_vlan_output
      when: ansible_network_os == 'junos'

    - name: Get Ethernet Switching interfaces from Juniper devices
      junipernetworks.junos.junos_rpc:
        rpc: get-ethernet-switching-interface-information
      register: juniper_trunk_output
      when: ansible_network_os == 'junos'

    - name: Report Cisco VLAN discrepancies
      debug:
        msg: "Cisco VLAN discrepancy detected on : "
      loop: ""
      when:
        - ansible_network_os == 'ios'
        - item.VLAN_ID not in expected_vlans | map(attribute='id') | map('string')
      # This parsing and comparison is simplistic; real-world requires textfsm or more robust parsing
      # For full parsing, use 'ansible.utils.cli_parse' or specific module like 'cisco.ios.ios_vlans'

    - name: Report Cisco Trunk discrepancies (example for Gi0/24)
      debug:
        msg: "Cisco Trunk discrepancy on for Gi0/24: Native VLAN mismatch"
      when:
        - ansible_network_os == 'ios'
        - cisco_trunk_output.stdout[0] is search('Gi0/24.*Native VLAN\\s+(?!' + expected_trunk_config['Gi0/24'].native_vlan | string + ')')

    - name: Report Juniper VLAN discrepancies
      debug:
        msg: "Juniper VLAN discrepancy detected on : VLAN with name "
      loop: ""
      when:
        - ansible_network_os == 'junos'
        - item.vlan_id[0]['data'] not in expected_vlans | map(attribute='id')
      # Similar logic for name or other attributes
Automation: Ansible playbook to collect VLAN and trunk information from Cisco and Juniper devices, and provide a basic comparison against expected states. More sophisticated parsing and comparison logic would be needed for production use.
Security Considerations: Preventing VLAN-Based Attacks
VLANs are not inherently secure; misconfigurations can create significant vulnerabilities. Troubleshooting often involves identifying and patching these security gaps.
@startuml
!theme mars
' Define elements
cloud "Attacker" as ATTACKER
node "Rogue Device" as ROGUE_DEV
node "Switch (Untrusted Port)" as SW_UNTRUSTED
node "Switch (Trunk Port)" as SW_TRUNK
node "Sensitive Server VLAN (VLAN 10)" as SERVER_VLAN
node "Management VLAN (VLAN 99)" as MGMT_VLAN
' Connect elements
ATTACKER --> ROGUE_DEV : Connect
ROGUE_DEV -[bold]-> SW_UNTRUSTED : VLAN Hopping Attack
SW_UNTRUSTED -[bold]-> SW_TRUNK : Trunk Link
SW_TRUNK --> SERVER_VLAN : Access (VLAN 10)
SW_TRUNK --> MGMT_VLAN : Access (VLAN 99)
' Indicate attack path
ROGUE_DEV ..> SERVER_VLAN : Unauthorized Access (VLAN Hopping)
ROGUE_DEV ..> MGMT_VLAN : Unauthorized Access (DTP Spoofing)
note right of ROGUE_DEV
Performs DTP Spoofing or Double-Tagging
to gain access to other VLANs.
end note
@enduml
Figure 13.5: Illustrating VLAN Hopping and DTP Spoofing Attack Vectors
13.8 Common Attack Vectors and Mitigation
13.8.1 VLAN Hopping (Switch Spoofing / DTP Spoofing)
- Attack: An attacker’s device spoofs DTP messages to trick an access port into becoming a trunk port, thereby gaining access to all VLANs traversing that trunk.
- Mitigation:
- Disable DTP: Manually configure all non-trunking ports as switchport mode access and all trunk ports as switchport mode trunk with switchport nonegotiate.
- Hardcode Trunks: Do not use switchport mode dynamic auto or dynamic desirable on production ports.
- Disable Unused Ports: Shut down unused switch ports.
13.8.2 VLAN Hopping (Double-Tagging / 802.1Q in 802.1Q - 802.1ad)
- Attack: An attacker sends a frame with two 802.1Q tags, the outer one matching the trunk's native VLAN. The first switch removes the outer tag and forwards the frame across the trunk; the second switch sees the inner, malicious tag and delivers the frame to an unintended VLAN.
- Mitigation:
- Native VLAN to Unused VLAN: Configure the native VLAN on all trunk links to an unused VLAN (e.g., VLAN 999). Never use VLAN 1 as the native VLAN. This prevents untagged traffic from being implicitly associated with a sensitive VLAN.
- Ingress Filtering: Some advanced switches can perform ingress filtering on trunk ports to detect and drop double-tagged frames.
13.8.3 Private VLANs (PVLANs)
- Mitigation: PVLANs further segment a VLAN at Layer 2, preventing communication between devices within the same “primary” VLAN unless explicitly allowed. This is effective for server farms or public access areas where clients should not talk to each other but need access to a common gateway.
13.8.4 Other Best Practices
- Port Security: Limit the number of MAC addresses on access ports.
- Control Plane Policing (CoPP): Protect the switch’s CPU from malicious control plane traffic, including DTP.
- ACLs: Implement strict ACLs on SVIs to control inter-VLAN communication.
- Authentication (802.1X): Authenticate devices connecting to switch ports to ensure only authorized devices can access the network.
Security Configuration Example (Cisco IOS-XE):
! Hardcode trunk mode and disable DTP
interface GigabitEthernet0/24
switchport trunk encapsulation dot1q
switchport mode trunk
switchport nonegotiate
switchport trunk native vlan 999 ! Use an unused VLAN for native
switchport trunk allowed vlan 10,20,99
!
! Secure an access port
interface GigabitEthernet0/5
switchport mode access
switchport access vlan 10
switchport port-security ! Enable port security
switchport port-security maximum 1 ! Allow only one MAC address
switchport port-security violation restrict ! Drop packets and generate syslog
spanning-tree portfast
!
! Shut down unused ports
interface GigabitEthernet0/20
shutdown
Security: Hardening VLAN configurations to prevent common attacks.
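A saved configuration can be screened for the hardening points above before it is deployed. The sketch below is a deliberately simple text audit under stated assumptions: interface stanzas are separated by `!` lines as in the example above, only the listed commands are checked, and the function names and sample configuration are hypothetical.

```python
import re


def split_interfaces(config: str) -> dict:
    """Map each interface name to the list of commands in its stanza."""
    stanzas = {}
    current = None
    for raw in config.splitlines():
        line = raw.split("!")[0].strip()  # drop '!' comments
        if not line:
            current = None                # a bare '!' or blank line ends the stanza
            continue
        match = re.match(r"interface\s+(\S+)", line)
        if match:
            current = match.group(1)
            stanzas[current] = []
        elif current is not None:
            stanzas[current].append(line)
    return stanzas


def audit(config: str) -> list:
    """Flag ports that miss the hardening steps described in this section."""
    findings = []
    for name, cmds in split_interfaces(config).items():
        if "shutdown" in cmds:
            continue  # unused port, already disabled
        if "switchport mode trunk" in cmds:
            if "switchport nonegotiate" not in cmds:
                findings.append(f"{name}: trunk without 'switchport nonegotiate'")
            native = next(
                (c.split()[-1] for c in cmds
                 if c.startswith("switchport trunk native vlan")),
                "1",  # VLAN 1 is the default when no native VLAN is configured
            )
            if native == "1":
                findings.append(f"{name}: native VLAN is 1")
        elif "switchport mode access" in cmds:
            if "switchport port-security" not in cmds:
                findings.append(f"{name}: access port without port security")
        else:
            findings.append(f"{name}: mode not hardcoded, DTP may negotiate")
    return findings


SAMPLE = """
interface GigabitEthernet0/24
 switchport mode trunk
 switchport trunk allowed vlan 10,20,99
!
interface GigabitEthernet0/5
 switchport mode access
 switchport access vlan 10
 switchport port-security
!
interface GigabitEthernet0/6
 switchport access vlan 20
!
interface GigabitEthernet0/20
 shutdown
"""

for finding in audit(SAMPLE):
    print(finding)
```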
Verification & Troubleshooting: A Systematic Approach
Troubleshooting VLANs requires a systematic approach, often starting at the physical layer and moving up the OSI model.
13.9 Troubleshooting Methodologies
- Define the Problem: What exactly is not working? (e.g., “Users in VLAN 20 cannot reach the server in VLAN 10,” or “New laptop can’t get an IP address”).
- Gather Information: Collect error messages, show command outputs, logs, and user reports.
- Top-Down/Bottom-Up/Divide-and-Conquer:
- Bottom-Up (Layer 1 -> Layer 7): Start with physical connectivity (cables, lights), then Layer 2 (VLANs, trunks, MAC addresses), then Layer 3 (IP addresses, routing).
- Top-Down (Layer 7 -> Layer 1): Start with the application, then network services (DNS, DHCP), Layer 3, Layer 2, Layer 1.
- Divide-and-Conquer: Isolate the problem domain (e.g., “Is it just this VLAN?” “Is it just this switch?” “Is it between VLANs or within a VLAN?”).
- Formulate Hypothesis: Based on information, hypothesize the cause (e.g., “I suspect a native VLAN mismatch”).
- Test Hypothesis: Use specific commands, tools, or tests to confirm or deny the hypothesis.
- Resolve and Verify: Implement the fix and confirm that the problem is resolved.
- Document: Record the problem, resolution, and any lessons learned.
13.10 Common VLAN Issues and Resolutions
| Issue Category | Common Symptoms | Verification Commands (Cisco/Juniper) | Resolution Steps |
|---|---|---|---|
| Physical/Link Layer | No connectivity, link down | show interface status/show interfaces description | Check cabling, SFP, port status, speed/duplex. |
| Access Port Misconfig | Device no IP, cannot reach gateway in expected VLAN | show interface <intf> switchport/show ethernet-switching interfaces <intf> | Ensure switchport mode access and switchport access vlan <VLAN_ID> (Cisco) or port-mode access and vlan members <VLAN_NAME> (Juniper). |
| Trunk Port Misconfig | Inter-VLAN traffic fails, specific VLANs blocked | show interfaces trunk/show ethernet-switching interfaces <intf> detail | Verify switchport mode trunk, native vlan, allowed vlan lists (Cisco) or port-mode trunk, native-vlan-id, vlan members (Juniper) match on both ends. |
| Native VLAN Mismatch | Untagged traffic (including BPDUs) misrouted, STP loops | show interfaces trunk (Cisco) / show ethernet-switching interfaces <intf> detail (Juniper) | Ensure native vlan (Cisco) or native-vlan-id (Juniper) is identical on both sides of the trunk. Use an unused VLAN. |
| VTP/VLAN Database Sync | VLANs missing, devices cannot communicate | show vtp status/show vlan brief (Cisco) / show vlans (Juniper) | Verify VTP domain, password, mode. In transparent mode, configure VLANs manually. |
| STP Issues (VLAN context) | Network loops, blocked valid paths | show spanning-tree vlan <VLAN_ID> | Check root bridge placement, port roles/states for each VLAN. Investigate native VLAN mismatch if STP issues are widespread. |
| Layer 3 SVI/IRB Issues | Inter-VLAN communication fails | show ip interface brief Vlan<VLAN_ID> (Cisco) / show interfaces irb.<VLAN_ID> (Juniper) | Ensure SVI/IRB is up/up, correct IP, no ACL blocking. Verify device default gateway. |
| MAC Address Table Issues | Unicast flooding, intermittent connectivity | show mac address-table/show ethernet-switching table | Clear MAC address table (clear mac address-table dynamic interface <intf>). Investigate source of excessive MAC learning. |
| DHCP Issues | Devices fail to get IP addresses | show ip dhcp snooping binding/debug ip dhcp server (Cisco) | Verify DHCP server reachability, ip helper-address (Cisco) or DHCP relay on SVI/IRB. Check DHCP snooping. |
13.11 Diagnostic Tools and Commands
- Ping/Traceroute: Fundamental for verifying Layer 3 connectivity and identifying where traffic stops.
- show interface <interface>: Checks physical status, errors, duplex, speed.
- show vlan brief (Cisco) / show vlans (Juniper): Verifies VLAN existence and status.
- show interfaces <interface> switchport (Cisco) / show ethernet-switching interfaces <interface> detail (Juniper): Shows port mode (access/trunk), assigned/native VLAN, allowed VLANs.
- show interfaces trunk (Cisco): Specific to Cisco, shows all trunk ports and their configuration.
- show mac address-table (Cisco) / show ethernet-switching table (Juniper): Verifies if MAC addresses are learned on the correct ports/VLANs.
- show ip interface brief (Cisco) / show interfaces terse | match irb (Juniper): Checks SVI/IRB status and IP addresses.
- show ip route (Cisco) / show route (Juniper): Verifies Layer 3 routing table.
- debug commands (Cisco): (Use with caution in production) e.g., debug vlan packet, debug dtp events, debug ip dhcp server packet.
- Packet Sniffers (Wireshark, tcpdump): Invaluable for capturing traffic on a mirror port to analyze 802.1Q tags, IP headers, and application-layer issues.
Let’s look at a common troubleshooting scenario with packetdiag:
packetdiag {
colwidth = 32
0-47: Destination MAC (Server)
48-95: Source MAC (Client)
96-111: EtherType (0x8100 - 802.1Q Tagged)
112-114: PCP (0)
115-115: DEI (0)
116-127: VLAN ID (10 - Client VLAN)
128-143: Length/Type (0x0800 - IPv4)
144-175: Source IP (Client)
176-207: Destination IP (Server)
208-X: TCP/UDP Header
X-Y: Application Data
}
Figure 13.6: Expected Packet Structure for Tagged Traffic on a Trunk Link (Client in VLAN 10)
If this packet arrives at a switch whose trunk port configuration for the peer device does not include VLAN 10 in its allowed vlans list, the packet will be dropped by the switch after the tag is inspected. If the native VLAN on the other side of the trunk is also 10, but the frame is tagged, this can also lead to misinterpretation depending on the implementation.
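The ingress decision described here can be modeled in a few lines. This is a simplified sketch of typical behavior, not a statement about any particular platform: it treats untagged frames as belonging to the native VLAN and ignores implementation-specific handling of frames that arrive tagged with the native VLAN ID. The function name `trunk_ingress` is an assumption.

```python
from typing import Optional, Set


def trunk_ingress(vid: Optional[int], native_vlan: int, allowed: Set[int]) -> str:
    """Classify a frame arriving on a trunk port; vid=None means untagged."""
    effective_vlan = native_vlan if vid is None else vid
    if effective_vlan not in allowed:
        return f"drop (VLAN {effective_vlan} not allowed on this trunk)"
    return f"forward in VLAN {effective_vlan}"


# The frame from Figure 13.6 arriving on a trunk that only allows VLANs 20 and 99.
print(trunk_ingress(10, native_vlan=99, allowed={20, 99}))
# An untagged frame on the same trunk is placed in the native VLAN.
print(trunk_ingress(None, native_vlan=99, allowed={20, 99}))
```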
13.12 Root Cause Analysis
After fixing an issue, always perform a root cause analysis:
- Why did this configuration error occur? (Manual error, automation bug, lack of process?)
- How can we prevent it from happening again? (Better review, automation, documentation, training?)
- Are there other systems or configurations that might be similarly affected?
Performance Optimization for VLANs
While VLANs improve network efficiency, improper design or configuration can hinder performance.
13.13 Tuning Parameters and Best Practices
- VLAN Pruning: Enable VLAN pruning (e.g., Cisco VTP pruning) to prevent unnecessary VLAN traffic from traversing trunk links where those VLANs have no active ports. This keeps broadcast and unknown-unicast flooding for those VLANs off links that do not need it, saving bandwidth and switch CPU cycles.
- VTP Transparent Mode: For large, stable networks, consider running VTP in transparent mode on all switches. This prevents accidental VLAN database updates and offers greater control, though it requires manual VLAN creation on each switch.
- Layer 3 Switching vs. Router-on-a-Stick: For high inter-VLAN traffic, Layer 3 switches (using SVIs) offer better performance than traditional router-on-a-stick configurations due to hardware-based routing.
- Optimized SVI/IRB Placement: Place inter-VLAN routing interfaces (SVIs/IRBs) as close to the traffic source as possible to reduce latency and utilize local routing capacity.
- Jumbo Frames: If applications require it, enable jumbo frames end-to-end across all VLANs and trunks to reduce CPU overhead and increase throughput for large data transfers. Ensure all devices in the path support and are configured for the larger MTU.
- EtherChannel/LAG: Bundle multiple physical links into a single logical trunk (EtherChannel/LAG) for increased bandwidth and redundancy between switches.
13.14 Monitoring Recommendations
- Interface Statistics: Monitor errors, discards, and utilization on all access and trunk ports. High discards on trunk ports often indicate an allowed VLAN mismatch or congestion.
- VLAN Utilization: Track broadcast traffic levels per VLAN to identify potential runaway broadcasts (e.g., from a faulty NIC).
- STP State: Monitor STP port states and root bridge stability to quickly detect topology changes or loops.
- SVI/IRB Status: Ensure inter-VLAN routing interfaces remain up/up.
Hands-On Lab: Resolving a Native VLAN Mismatch
This lab will guide you through diagnosing and fixing a common VLAN issue: a native VLAN mismatch on a trunk link.
nwdiag {
network "Uplink to Core" {
address = "Layer 2 Trunk"
color = "#CCCCFF"
switch "Core_SW1" {
label = "Core_SW1 (Gi0/1)"
address = "Native VLAN: 1" // Mismatch for lab
color = "#FFDDDD"
}
switch "Access_SW2" {
label = "Access_SW2 (Gi0/1)"
address = "Native VLAN: 99" // Correct native VLAN
color = "#DDDDFF"
}
}
network "VLAN 10 Users" {
address = "192.168.10.0/24"
color = "#CCFFCC"
host "PC1" { address = "192.168.10.10"; }
}
network "VLAN 20 Servers" {
address = "192.168.20.0/24"
color = "#FFEEDD"
host "Server1" { address = "192.168.20.10"; }
}
"Core_SW1" -- "Uplink to Core" [label = "Gi0/1"];
"Access_SW2" -- "Uplink to Core" [label = "Gi0/1"];
"Access_SW2" -- "VLAN 10 Users" [label = "Access Port Gi0/2 (VLAN 10)"];
"Core_SW1" -- "VLAN 20 Servers" [label = "Access Port Gi0/2 (VLAN 20)"];
"PC1" -- "VLAN 10 Users";
"Server1" -- "VLAN 20 Servers";
}
Figure 13.7: Lab Topology - Native VLAN Mismatch Scenario
Lab Objectives:
- Identify the native VLAN mismatch between Core_SW1 and Access_SW2.
- Understand the impact of the mismatch (e.g., STP issues, management access issues).
- Correct the native VLAN configuration on Core_SW1.
- Verify full connectivity.
Step-by-Step Configuration (Initial State - Cisco IOS-XE):
Core_SW1 (Initial, problematic config for Gi0/1):
interface GigabitEthernet0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 10,20,99
 ! No native VLAN is configured, so the default (VLAN 1) applies
 exit
vlan 10
 name VLAN10_Users
vlan 20
 name VLAN20_Servers
vlan 99
 name Management
end
Access_SW2 (Correct config for Gi0/1):
interface GigabitEthernet0/1
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk native vlan 99
 switchport trunk allowed vlan 10,20,99
 exit
vlan 10
 name VLAN10_Users
vlan 20
 name VLAN20_Servers
vlan 99
 name Management
end
Verification Steps (Troubleshooting):
- From PC1 (connected to Access_SW2 on VLAN 10), try to ping Server1 (connected to Core_SW1 on VLAN 20) or the Management SVI (VLAN 99) on Core_SW1. Expect failure or intermittent issues, most likely on VLAN 99: only untagged (native VLAN) frames are mis-assigned, so tagged VLAN 10 and VLAN 20 traffic may still pass.
- On Access_SW2, check the trunk status for GigabitEthernet0/1:
    show interfaces GigabitEthernet0/1 trunk
  The output should show native VLAN 99. The mismatch itself is typically reported separately, as a CDP log message (%CDP-4-NATIVE_VLAN_MISMATCH) on the console.
- On Core_SW1, check the trunk status for GigabitEthernet0/1:
    show interfaces GigabitEthernet0/1 trunk
  The output should show native VLAN 1, again with a CDP mismatch message that names the peer.
- Check STP on both switches for VLAN 1 and VLAN 99:
    show spanning-tree vlan 1
    show spanning-tree vlan 99
  You might observe unexpected port states or a change in the root bridge for VLAN 1 or VLAN 99, because untagged BPDUs sent in one switch's native VLAN are received in a different native VLAN on the other.
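On Cisco switches with CDP enabled, a native VLAN mismatch is commonly logged as a `%CDP-4-NATIVE_VLAN_MISMATCH` message, which makes it a convenient thing to search for in collected logs. The sketch below extracts the details from such lines. The sample line is illustrative and its exact wording may vary by platform and software release, so treat the regular expression as an assumption to validate against your own logs. The function name is hypothetical.

```python
import re

# Assumed message layout; verify against logs from your own devices.
MISMATCH_RE = re.compile(
    r"%CDP-4-NATIVE_VLAN_MISMATCH: Native VLAN mismatch discovered on "
    r"(?P<local_if>\S+) \((?P<local_vlan>\d+)\), with "
    r"(?P<peer>\S+) (?P<peer_if>\S+) \((?P<peer_vlan>\d+)\)"
)

# Illustrative log line as it might appear on Core_SW1 in this lab.
SAMPLE_LOG = (
    "%CDP-4-NATIVE_VLAN_MISMATCH: Native VLAN mismatch discovered on "
    "GigabitEthernet0/1 (1), with Access_SW2 GigabitEthernet0/1 (99)."
)


def find_native_vlan_mismatches(log_text):
    """Return one dict per mismatch message found in the log text."""
    results = []
    for match in MISMATCH_RE.finditer(log_text):
        entry = match.groupdict()
        entry["local_vlan"] = int(entry["local_vlan"])
        entry["peer_vlan"] = int(entry["peer_vlan"])
        results.append(entry)
    return results


if __name__ == "__main__":
    for hit in find_native_vlan_mismatches(SAMPLE_LOG):
        print(
            f"{hit['local_if']} (native {hit['local_vlan']}) vs "
            f"{hit['peer']} {hit['peer_if']} (native {hit['peer_vlan']})"
        )
```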
Resolution Steps:
- On Core_SW1, configure the native VLAN to match Access_SW2 (VLAN 99):
    configure terminal
    interface GigabitEthernet0/1
    switchport trunk native vlan 99
    end
  You might see a console message indicating a native VLAN mismatch resolution.
Verification Steps (Post-Resolution):
- On both Core_SW1 and Access_SW2, re-check trunk status:
    show interfaces GigabitEthernet0/1 trunk
  Both should now show Native VLAN: 99 without mismatch warnings.
- On both switches, re-check STP for VLAN 1 and VLAN 99. STP should converge normally.
- From PC1, try to ping Server1 and the Management SVI on Core_SW1 again. Expect successful pings.
Challenge Exercises:
- Introduce an “allowed VLAN mismatch” (e.g., remove VLAN 20 from Core_SW1’s allowed list) and diagnose why PC1 cannot reach Server1.
- Disable DTP on both switches manually and verify the trunk remains up.
- Configure port security on the access port for PC1 and test its functionality.
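For the first challenge, the key check is whether both ends of the trunk permit the same VLANs. The sketch below expands two allowed VLAN lists, including ranges such as `20-22`, and reports the VLANs permitted on only one side. It is a minimal illustration: the function names are hypothetical, and the inputs are assumed to be the list portion of each switch's `switchport trunk allowed vlan` line.

```python
def parse_vlan_list(text):
    """Expand a VLAN list such as '10,20-22,99' into a set of VLAN IDs."""
    vlans = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            vlans.update(range(int(start), int(end) + 1))
        else:
            vlans.add(int(part))
    return vlans


def allowed_vlan_diff(side_a, side_b):
    """Compare the allowed lists configured on the two ends of a trunk."""
    a, b = parse_vlan_list(side_a), parse_vlan_list(side_b)
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "common": sorted(a & b),
    }


if __name__ == "__main__":
    # Core_SW1 with VLAN 20 removed, compared with Access_SW2.
    print(allowed_vlan_diff("10,99", "10,20,99"))
```

A VLAN that appears in `only_a` or `only_b` is dropped at the end where it is not allowed, which is why PC1 and Server1 lose connectivity in this exercise.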
Best Practices Checklist
[ ] Standardize VLAN IDs: Use a consistent numbering scheme across the network.
[ ] Document All VLANs: Maintain up-to-date documentation of VLAN IDs, names, and their purpose.
[ ] Disable DTP: Manually configure all trunk and access ports to prevent unintended trunks.
[ ] Change Native VLAN: Configure the native VLAN on trunks to an unused VLAN ID (not VLAN 1).
[ ] VTP Transparent Mode: For stability, consider using VTP Transparent mode or a centralized VLAN management solution instead of VTP Server/Client modes.
[ ] Implement Port Security: Limit MAC addresses per access port.
[ ] Filter Unused VLANs: Use switchport trunk allowed vlan or vlan members to only permit necessary VLANs on trunks.
[ ] Shut Down Unused Ports: Disable ports that are not in use to reduce attack surface.
[ ] Implement ACLs on SVIs: Control inter-VLAN traffic with access control lists.
[ ] Monitor Trunk Status: Regularly check trunk health for mismatches or errors.
[ ] Automate Verification: Use tools like Ansible or Python to regularly audit VLAN configurations.
[ ] Plan Inter-VLAN Routing: Optimize SVI/IRB placement for performance and security.
[ ] Enable VLAN Pruning: Reduce broadcast traffic on trunk links where VLANs are not needed.
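Several checklist items can be verified mechanically from a trunk interface's configuration text. The following is a minimal sketch of such an audit, covering only three items: DTP disabled, a non-default native VLAN, and an explicit allowed VLAN list. The function name, the finding strings, and the sample configurations (including VLAN 999 as an unused native VLAN) are hypothetical, and a real audit would need to handle more platforms and syntax variations.

```python
def audit_trunk(interface_config):
    """Return a list of checklist findings for one trunk interface."""
    lines = [line.strip() for line in interface_config.splitlines()]
    findings = []

    # Disable DTP: look for an explicit 'switchport nonegotiate'.
    if "switchport nonegotiate" not in lines:
        findings.append("DTP not disabled (no 'switchport nonegotiate')")

    # Change Native VLAN: a missing line means the default, VLAN 1.
    prefix = "switchport trunk native vlan "
    native = next((l for l in lines if l.startswith(prefix)), None)
    if native is None or native.split()[-1] == "1":
        findings.append("native VLAN is 1 (default)")

    # Filter Unused VLANs: expect an explicit allowed list.
    if not any(l.startswith("switchport trunk allowed vlan ") for l in lines):
        findings.append("no explicit allowed VLAN list")

    return findings


LAB_CORE_SW1 = """\
interface GigabitEthernet0/1
 switchport mode trunk
 switchport trunk allowed vlan 10,20,99
"""

HARDENED_EXAMPLE = """\
interface GigabitEthernet0/1
 switchport mode trunk
 switchport nonegotiate
 switchport trunk native vlan 999
 switchport trunk allowed vlan 10,20,99
"""

if __name__ == "__main__":
    print(audit_trunk(LAB_CORE_SW1))
    print(audit_trunk(HARDENED_EXAMPLE))
```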
Reference Links
- IEEE 802.1Q (VLAN Tagging): The foundational standard for VLANs. The latest revision is IEEE 802.1Q-2022.
- IEEE 802.1ad (QinQ / Provider Bridges): Extends 802.1Q to allow multiple VLAN tags, often used in service provider networks.
- Cisco VLAN Best Practices: https://www.cisco.com/c/en/us/support/docs/smb/routers/cisco-rv-series-small-business-routers/1778-tz-VLAN-Best-Practices-and-Security-Tips-for-Cisco-Business-Routers.html
- Juniper VLAN Documentation: Refer to specific JunOS documentation for your device family (EX, QFX).
- VLAN Hopping Attacks & Mitigation: https://www.imperva.com/learn/availability/vlan-hopping/
- Network Automation with Ansible (VLANs): https://medium.com/@mickaelsoares/network-automation-with-ansible-lab4-configure-vlans-and-trunks-f97775cd2d61
- Netmiko Documentation: https://github.com/ktbyers/netmiko
- Wireshark: https://www.wireshark.org/
- PlantUML Documentation (Network Diagrams): https://plantuml.com/nwdiag
- PacketDiag Documentation: http://blockdiag.com/en/nwdiag/packetdiag-examples.html
What’s Next
This chapter provided a robust framework for VLAN troubleshooting, covering methodologies, multi-vendor commands, automation, and security considerations. You’ve learned how to systematically approach complex VLAN issues and prevent them through best practices.
The next chapter, Chapter 14: Advanced VLAN Design for Cloud and Hybrid Environments, expands our focus. We will explore concepts like VXLAN and EVPN, and how VLANs integrate with public cloud providers like AWS and Azure, preparing you for modern, scalable network architectures.