Chapter 13: Internal Linking, Navigation, and Table of Contents Generation

Welcome to Chapter 13! In this chapter, we’ll make our static sites easier to navigate by implementing internal link resolution, global navigation generation, and automatic Table of Contents (ToC) creation. These features matter for any content-rich website: they let readers discover related content, understand the site’s structure, and jump straight to the relevant section of a page.

By the end of this chapter, our SSG will be capable of:

Resolving internal Markdown links to their correct, generated output URLs.
Building a hierarchical navigation structure from page metadata, which can be rendered in templates (e.g., a sidebar or header menu).
Automatically generating a Table of Contents for individual pages based on their heading structure, complete with anchor links.

This functionality builds heavily upon the content parsing, routing, and templating systems we’ve established in previous chapters. We’ll leverage the Page and Site data structures, extending them to store and process the necessary metadata for these features. The goal is to make content interconnected and easily navigable, which is a hallmark of a well-designed static site.

Planning & Design

Implementing internal linking, navigation, and ToC requires careful coordination across our content processing and rendering pipeline. We need to:

Identify and rewrite internal links during Markdown processing. This means having access to the global URL map of all pages.
Collect navigation metadata (title, URL, weight, menu assignment) from all pages during the content loading phase.
Extract heading information (level, text) during Markdown-to-HTML conversion for ToC generation.

Component Architecture and Data Flow

The following Mermaid diagram illustrates the flow for these new features:

flowchart TD
    Build_Start[Build Process Start] --> Scan_Content[Scan Content Directory]
    Scan_Content --> Collect_Raw_Pages[Collect Raw Page Data Frontmatter and Markdown]
    Collect_Raw_Pages --> Parallel_Page_Processing[Parallel Page Processing]

    subgraph Page_Processing_Flow["Single Page Processing Flow"]
        Parallel_Page_Processing --> Parse_Frontmatter[Parse Frontmatter]
        Parse_Frontmatter --> Parse_Markdown_AST[Parse Markdown to AST pulldown cmark]
        Parse_Markdown_AST --> Extract_Headings{Extract Headings for ToC}
        Extract_Headings --> Transform_AST_To_HTML[Transform AST to HTML]
        Transform_AST_To_HTML --> Identify_Internal_Links{Identify Internal Links}
        Identify_Internal_Links --> Store_Page_Data[Store Page Data HTML Frontmatter ToC and Raw]
    end

    Store_Page_Data --> All_Pages_Processed[All Pages Processed]
    All_Pages_Processed --> Resolve_All_Links[Resolve All Internal Links]
    Resolve_All_Links --> Build_Navigation_Tree[Build Global Navigation Tree]
    Build_Navigation_Tree --> Render_Pages_With_Context[Render Pages with Tera ToC and Nav Data]
    Render_Pages_With_Context --> Write_Output[Write Static HTML Files]
    Write_Output --> Build_End[Build Process End]

    style Parallel_Page_Processing fill:#f9f,stroke:#333,stroke-width:2px
    style Store_Page_Data fill:#f9f,stroke:#333,stroke-width:2px
    style Resolve_All_Links fill:#f9f,stroke:#333,stroke-width:2px
    style Build_Navigation_Tree fill:#f9f,stroke:#333,stroke-width:2px
    style Render_Pages_With_Context fill:#f9f,stroke:#333,stroke-width:2px

Key Data Structures:

Page struct: Will be augmented to include:
- toc: Vec<TocEntry> (for Table of Contents)
- rendered_html: String (HTML straight from Markdown conversion, before link resolution)
- final_html: String (HTML with all internal links rewritten)
Site struct (or BuildContext): Will hold:
- page_url_map: HashMap<PathBuf, String> (maps source content path to final output URL, critical for link resolution).
- navigation_menus: HashMap<String, Vec<NavEntry>> (the global navigation structure, keyed by menu name).
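
To make the role of the URL map concrete, here is a minimal, std-only sketch of building such a map from content-relative source paths. The helper names url_for and build_url_map are hypothetical, and the URL scheme (path without extension, wrapped in slashes) is a simplification of the collection-plus-slug scheme used later in this chapter.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Derives a directory-style URL from a content-relative source path,
/// e.g. "posts/my-post.md" -> "/posts/my-post/". (Simplified assumption:
/// no slug overrides and no base URL.)
fn url_for(relative_path: &Path) -> String {
    let without_ext = relative_path.with_extension("");
    let parts: Vec<String> = without_ext
        .components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect();
    format!("/{}/", parts.join("/"))
}

/// First pass: record the final URL of every page, keyed by source path.
/// Link resolution can only run once this map is complete.
fn build_url_map(sources: &[PathBuf]) -> HashMap<PathBuf, String> {
    sources.iter().map(|p| (p.clone(), url_for(p))).collect()
}
```

Because a link in one page may point at a page that has not been processed yet, the map has to be filled for all pages before any link is rewritten. That is why link resolution is a separate pass in the diagram above.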

File Structure Changes

We’ll primarily be modifying existing files:

src/content.rs: Enhance Page struct.
src/processor.rs: Modify Markdown parsing and HTML rendering logic.
src/site.rs: Add logic to manage page_url_map and build the navigation tree.
src/template_engine.rs: Pass new data to Tera context.
src/main.rs: Orchestrate the new build steps.

Step-by-Step Implementation

1. Enhance `Page` and `Frontmatter` Structures

First, let’s update our Page and Frontmatter structs to accommodate ToC data and navigation-related fields.

src/content.rs

use std::collections::HashMap;
use std::path::PathBuf;
use serde::{Deserialize, Serialize};
use chrono::{DateTime, Utc};

// New struct for Table of Contents entries
#[derive(Debug, Serialize, Clone)]
pub struct TocEntry {
    pub level: u32,
    pub text: String,
    pub id: String, // Anchor ID for the heading
}

// New struct for Navigation entries
#[derive(Debug, Serialize, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct NavEntry {
    pub title: String,
    pub url: String,
    pub weight: i32,
    pub children: Vec<NavEntry>,
    #[serde(skip)] // Don't serialize menu name itself
    pub menu_name: Option<String>,
}

impl NavEntry {
    pub fn new(title: String, url: String, weight: i32, menu_name: Option<String>) -> Self {
        NavEntry {
            title,
            url,
            weight,
            children: Vec::new(),
            menu_name,
        }
    }
}

// Existing Frontmatter struct, add new fields for navigation
#[derive(Debug, Deserialize, Serialize, Clone)]
#[serde(default)] // Allows missing fields to use their default values
pub struct Frontmatter {
    pub title: String,
    pub date: Option<DateTime<Utc>>,
    pub draft: bool,
    pub description: Option<String>,
    pub slug: Option<String>,
    pub weight: i32, // For ordering in navigation
    pub keywords: Vec<String>,
    pub tags: Vec<String>,
    pub categories: Vec<String>,
    pub author: String,
    pub show_reading_time: bool,
    pub show_table_of_contents: bool,
    pub show_comments: bool,
    pub toc: bool, // Legacy or explicit TOC control
    pub menu: Vec<String>, // New: Specifies which menus this page belongs to
    pub template: Option<String>, // Optional template override
}

impl Default for Frontmatter {
    fn default() -> Self {
        Frontmatter {
            title: "Untitled".to_string(),
            date: Some(Utc::now()),
            draft: false,
            description: None,
            slug: None,
            weight: 0,
            keywords: Vec::new(),
            tags: Vec::new(),
            categories: Vec::new(),
            author: "AI Expert".to_string(),
            show_reading_time: true,
            show_table_of_contents: true,
            show_comments: false,
            toc: true,
            menu: Vec::new(), // Default to no menus
            template: None,
        }
    }
}

// Existing Page struct, add toc and final_html fields.
// Serialize is derived so the page can be inserted into the Tera context.
#[derive(Debug, Serialize, Clone)]
pub struct Page {
    pub file_path: PathBuf, // Original path relative to content dir
    pub relative_path: PathBuf, // Relative path from content root, used for slug generation
    pub frontmatter: Frontmatter,
    pub markdown_content: String, // Original markdown
    pub rendered_html: String, // HTML after initial markdown conversion (before link resolution)
    pub final_html: String, // HTML after link resolution and component processing
    pub url_path: String, // Final public URL path (e.g., /posts/my-post/)
    pub toc: Vec<TocEntry>, // Table of Contents for this page
    pub collection: String, // e.g., "posts", "pages", "topics"
}

impl Page {
    pub fn new(
        file_path: PathBuf,
        relative_path: PathBuf,
        frontmatter: Frontmatter,
        markdown_content: String,
        rendered_html: String,
        url_path: String,
        collection: String,
    ) -> Self {
        Page {
            file_path,
            relative_path,
            frontmatter,
            markdown_content,
            rendered_html,
            final_html: String::new(), // Will be populated later
            url_path,
            toc: Vec::new(), // Will be populated during processing
            collection,
        }
    }

    /// Returns the canonical slug for the page, prioritizing frontmatter slug.
    pub fn get_slug(&self) -> String {
        if let Some(slug) = &self.frontmatter.slug {
            slug.clone()
        } else {
            // Derive from file_path, e.g., "my-post.md" -> "my-post"
            self.relative_path
                .file_stem()
                .and_then(|s| s.to_str())
                .unwrap_or("untitled")
                .to_string()
        }
    }
}

Explanation:

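We introduced two new structs: TocEntry describes one heading (level, text, and anchor ID), and NavEntry describes one menu item (title, URL, weight, and child entries).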
Frontmatter now includes weight (for sorting navigation items) and menu: Vec<String> (to specify which navigation menus a page belongs to).
Page now has toc: Vec<TocEntry> to store the extracted headings and final_html: String which will hold the HTML after link resolution, distinct from rendered_html which is just the raw Markdown-to-HTML output. rendered_html will be used as input for link resolution.
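
Since TocEntry stores a flat list with a level on each entry, any nesting has to be recovered when the ToC is rendered. The following is a minimal sketch of one way to do that in plain Rust. The render_toc function is hypothetical, the local TocEntry mirrors the struct above without the serde derive, and heading text is assumed to be already HTML-safe (no escaping is done).

```rust
/// Mirrors the chapter's TocEntry (serde derive omitted to stay std-only).
struct TocEntry {
    level: u32,
    text: String,
    id: String,
}

/// Renders a flat list of headings as nested <ul> lists.
fn render_toc(entries: &[TocEntry]) -> String {
    let mut html = String::new();
    // Stack of heading levels that currently have an open <ul>.
    let mut open_levels: Vec<u32> = Vec::new();

    for entry in entries {
        // Close every list that is deeper than the current heading.
        while let Some(&top) = open_levels.last() {
            if top <= entry.level {
                break;
            }
            html.push_str("</li></ul>");
            open_levels.pop();
        }

        let same_level = open_levels.last() == Some(&entry.level);
        if same_level {
            // Sibling heading: close the previous item.
            html.push_str("</li>");
        } else {
            // Deeper heading (or the very first one): open a new list.
            html.push_str("<ul>");
            open_levels.push(entry.level);
        }

        html.push_str(&format!("<li><a href=\"#{}\">{}</a>", entry.id, entry.text));
    }

    // Close whatever is still open.
    for _ in &open_levels {
        html.push_str("</li></ul>");
    }
    html
}
```

In practice the same nesting could be done in the template instead; the point of the sketch is only that level plus document order is enough information to rebuild the hierarchy.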

2. Implement Table of Contents (ToC) Generation

ToC generation involves iterating through the Markdown AST, identifying headings, generating unique IDs, and collecting them. We’ll modify our markdown_to_html function.
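
Before wiring this into the parser, it may help to see the ID logic in isolation. The sketch below is std-only and uses two hypothetical helpers: simple_slugify stands in for whatever slugify function the project imports (it only keeps ASCII letters and digits), and unique_heading_id shows the counting scheme for repeated headings.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a slugify helper: ASCII letters and digits are
/// lowercased, and every run of other characters becomes a single hyphen.
fn simple_slugify(text: &str) -> String {
    let mut slug = String::new();
    let mut pending_hyphen = false;
    for ch in text.chars() {
        if ch.is_ascii_alphanumeric() {
            // Emit a separator only between two kept characters.
            if pending_hyphen && !slug.is_empty() {
                slug.push('-');
            }
            pending_hyphen = false;
            slug.push(ch.to_ascii_lowercase());
        } else {
            pending_hyphen = true;
        }
    }
    slug
}

/// Returns `slug` the first time it is seen, then `slug-1`, `slug-2`, ...
fn unique_heading_id(slug: &str, seen: &mut HashMap<String, u32>) -> String {
    let count = seen.entry(slug.to_string()).or_insert(0);
    *count += 1;
    if *count > 1 {
        format!("{}-{}", slug, *count - 1)
    } else {
        slug.to_string()
    }
}
```

With these helpers, two headings titled "Setup" on the same page would get the IDs setup and setup-1. One limitation of this scheme: a heading whose own text slugifies to setup-1 could still collide with a generated ID.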

src/processor.rs

First, add a slug-generation crate to your Cargo.toml for generating clean IDs. We use the slug crate here, which provides a slugify function:

# Cargo.toml
[dependencies]
# ... other dependencies
pulldown-cmark = "0.10"
slug = "0.1" # Add this for slugify
regex = "1.10" # For link rewriting later
lazy_static = "1.4" # For regex global
log = "0.4"

Now, modify src/processor.rs:

use pulldown_cmark::{Parser, Event, Tag, Options, CowStr};
use slug::slugify; // Import slugify
use std::collections::HashMap;
use std::path::PathBuf;
use log::{warn, info};
use regex::Regex;
use lazy_static::lazy_static;

use crate::content::{Frontmatter, Page, TocEntry}; // Import TocEntry

/// Parses frontmatter from a string.
pub fn parse_frontmatter(content: &str) -> Result<(Frontmatter, &str), String> {
    // ... (existing parse_frontmatter code) ...
    // (Assuming this function is already implemented from previous chapters)
    // Example placeholder:
    if content.starts_with("+++") {
        if let Some(end_idx) = content[3..].find("+++") {
            let frontmatter_str = &content[3..3 + end_idx];
            let remaining_content = &content[3 + end_idx + 3..];
            match toml::from_str::<Frontmatter>(frontmatter_str) {
                Ok(fm) => return Ok((fm, remaining_content)),
                Err(e) => return Err(format!("Failed to parse frontmatter: {}", e)),
            }
        }
    } else if content.starts_with("---") {
        if let Some(end_idx) = content[3..].find("---") {
            let frontmatter_str = &content[3..3 + end_idx];
            let remaining_content = &content[3 + end_idx + 3..];
            match serde_yaml::from_str::<Frontmatter>(frontmatter_str) {
                Ok(fm) => return Ok((fm, remaining_content)),
                Err(e) => return Err(format!("Failed to parse frontmatter: {}", e)),
            }
        }
    }
    // If no frontmatter, return default and full content
    Ok((Frontmatter::default(), content))
}


/// Converts Markdown content to HTML and extracts ToC entries.
pub fn markdown_to_html(markdown: &str) -> (String, Vec<TocEntry>) {
    let mut options = Options::empty();
    options.insert(Options::ENABLE_TABLES);
    options.insert(Options::ENABLE_FOOTNOTES);
    options.insert(Options::ENABLE_STRIKETHROUGH);
    options.insert(Options::ENABLE_TASKLISTS);
    options.insert(Options::ENABLE_SMART_PUNCTUATION);

    let parser = Parser::new_ext(markdown, options);

    let mut html_output = String::new();
    let mut toc_entries: Vec<TocEntry> = Vec::new();
    let mut current_heading_text = String::new();
    let mut heading_levels: HashMap<String, u32> = HashMap::new(); // To track duplicate slugs

    // Buffer the events so each heading's start tag can be patched with its ID
    // once the full heading text is known.
    let mut events: Vec<Event> = Vec::new();
    let mut heading_start: Option<usize> = None; // Index of the open heading's start event

    for event in parser {
        match event {
            // pulldown-cmark 0.10: Tag::Heading is a struct variant and
            // end tags use the separate TagEnd enum.
            Event::Start(Tag::Heading { .. }) => {
                current_heading_text.clear();
                heading_start = Some(events.len());
                events.push(event); // The ID is filled in when the heading ends
            }
            Event::Text(text) => {
                current_heading_text.push_str(&text);
                events.push(Event::Text(text));
            }
            Event::End(pulldown_cmark::TagEnd::Heading(level)) => {
                let mut id = slugify(&current_heading_text);
                // Handle duplicate IDs
                let count = heading_levels.entry(id.clone()).or_insert(0);
                *count += 1;
                if *count > 1 {
                    id = format!("{}-{}", id, *count - 1); // Append -1, -2 etc.
                }

                toc_entries.push(TocEntry {
                    level: level as u32,
                    text: current_heading_text.clone(),
                    id: id.clone(),
                });

                // Patch the buffered start tag with the generated ID
                match heading_start.take().and_then(|idx| events.get_mut(idx)) {
                    Some(Event::Start(Tag::Heading { id: heading_id, .. })) => {
                        *heading_id = Some(CowStr::from(id));
                    }
                    _ => {
                        // This should not happen if the event stream is well-formed
                        warn!("Failed to find matching start tag for heading: {}", current_heading_text);
                    }
                }
                events.push(Event::End(pulldown_cmark::TagEnd::Heading(level)));
            }
            _ => events.push(event), // Push all other events as is
        }
    }

    pulldown_cmark::html::push_html(&mut html_output, events.into_iter());

    (html_output, toc_entries)
}

/// Rewrites internal links in the HTML content.
/// It takes a map of source content paths to their final public URLs.
pub fn rewrite_internal_links(html: &str, url_map: &HashMap<PathBuf, String>) -> String {
    // Group 1 captures everything from `<a` up to and including `href=`,
    // group 2 captures the link target. Group 1 is re-emitted unchanged so
    // the tag name and any attributes before href are preserved.
    lazy_static! {
        static ref LINK_RE: Regex =
            Regex::new(r#"(<a\s+(?:[^>]*?\s+)?href=)["']([^"']+)["']"#).unwrap();
    }

    LINK_RE.replace_all(html, |caps: &regex::Captures| {
        let tag_prefix = &caps[1];
        let original_href = &caps[2];
        let mut resolved_href = original_href.to_string();

        if original_href.ends_with(".md") {
            // Heuristic: a link ending in `.md` refers to a content file.
            // We assume the path is relative to the content root
            // (e.g. "posts/my-article.md"), optionally with a leading '/'.
            // Page-relative links such as "../other-post.md" would need the
            // current page's source path, which this function does not receive.
            // A more advanced system would use `ref` or `relref` shortcodes.
            let content_path = PathBuf::from(original_href.trim_start_matches('/'));

            if let Some(url) = url_map.get(&content_path) {
                resolved_href = url.clone();
                info!("Rewrote internal link '{}' to '{}'", original_href, resolved_href);
            } else {
                warn!("Could not resolve internal link: '{}'. File not found in content map.", original_href);
            }
        } else if original_href.starts_with('/') && !original_href.starts_with("//") {
            // Root-relative link: either a content page or a static asset.
            // Extensionless paths such as /posts/my-post are looked up as
            // posts/my-post/index.md so they map to /posts/my-post/.
            let mut potential_path = PathBuf::from(original_href.trim_start_matches('/'));
            if potential_path.extension().is_none() {
                potential_path = potential_path.join("index.md");
            }

            if let Some(url) = url_map.get(&potential_path) {
                resolved_href = url.clone();
                info!("Rewrote internal link '{}' to '{}'", original_href, resolved_href);
            }
            // Otherwise it is a static asset or a link we don't manage; leave it as is.
        }

        format!(r#"{tag_prefix}"{resolved_href}""#)
    }).to_string()
}

Explanation:

ToC Generation:
- markdown_to_html now returns a Vec<TocEntry> along with the HTML.
- We use pulldown_cmark’s event stream. When Event::Start(Tag::Heading) is encountered, we clear current_heading_text. When Event::Text follows a heading start, we append to current_heading_text.
- When Event::End(Tag::Heading) is hit, we have the full heading text. We pass it to the imported slugify function to create a URL-friendly ID.
- A heading_levels HashMap is used to handle duplicate heading texts by appending -1, -2, etc., to their IDs.
- We then modify the Event::Start(Tag::Heading) event to include the generated ID as an anchor.
Internal Link Rewriting:
- rewrite_internal_links function uses regex to find href attributes in <a> tags.
- Crucially: This function requires a url_map: &HashMap<PathBuf, String> which maps source content file paths (e.g., posts/my-article.md) to their final output URLs (e.g., /posts/my-article/).
- The if original_href.ends_with(".md") block is a simple heuristic. A more robust solution for internal links would involve:
  - A custom Markdown extension or shortcode syntax (e.g., {{< ref "posts/my-article.md" >}}) which is explicitly parsed.
  - Passing the current page’s source path to rewrite_internal_links to handle relative paths like ../other-page.md correctly.
- For now, we assume original_href is a path relative to the content root (e.g., posts/my-article.md) or an absolute content path (e.g., /posts/my-article).
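
As a rough illustration of the second improvement mentioned above, the following std-only sketch resolves a link against the source path of the page that contains it. The function name resolve_content_link is hypothetical and is not part of the chapter's code; its result is a content-root-relative path that could then be looked up in url_map.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolves `href` against the directory of `current_page` (both relative to
/// the content root) and returns a normalized content-root-relative path.
/// Returns None if the link would climb above the content root.
fn resolve_content_link(current_page: &Path, href: &str) -> Option<PathBuf> {
    // A leading '/' means "relative to the content root", not to the page.
    let joined = if let Some(rooted) = href.strip_prefix('/') {
        PathBuf::from(rooted)
    } else {
        current_page.parent().unwrap_or(Path::new("")).join(href)
    };

    // Collapse "." and ".." without touching the file system.
    let mut normalized = PathBuf::new();
    for component in joined.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                if !normalized.pop() {
                    return None; // Escaped the content root
                }
            }
            Component::Normal(part) => normalized.push(part),
            // Root or prefix components are not expected in content-relative paths.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(normalized)
}
```

For a page at posts/a.md, the link ../about.md would resolve to about.md and the link b.md to posts/b.md. Using this in rewrite_internal_links would require passing each page's relative_path alongside its HTML.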

3. Update `Site` and Build Pipeline for Link Resolution and Navigation

We need to collect all page URLs and then perform a second pass to resolve links and build navigation.

src/site.rs

use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::fs;
use std::error::Error;
use log::{info, warn, error};
use crate::content::{Page, Frontmatter, NavEntry};
use crate::processor;
use crate::template_engine::TemplateEngine;
use rayon::prelude::*; // For parallel processing

// Configuration structure (from previous chapters).
// Serialize is derived so the config can be inserted into the Tera context.
#[derive(Debug, Clone, serde::Serialize)]
pub struct SiteConfig {
    pub content_dir: PathBuf,
    pub output_dir: PathBuf,
    pub templates_dir: PathBuf,
    pub static_dir: Option<PathBuf>,
    pub base_url: String,
}

impl Default for SiteConfig {
    fn default() -> Self {
        SiteConfig {
            content_dir: PathBuf::from("content"),
            output_dir: PathBuf::from("public"),
            templates_dir: PathBuf::from("templates"),
            static_dir: Some(PathBuf::from("static")),
            base_url: "/".to_string(),
        }
    }
}

pub struct Site {
    pub config: SiteConfig,
    pub pages: Vec<Page>,
    pub template_engine: TemplateEngine,
    pub page_url_map: HashMap<PathBuf, String>, // Maps source content path to final URL
    pub navigation_menus: HashMap<String, Vec<NavEntry>>, // Stores grouped navigation entries
}

impl Site {
    pub fn new(config: SiteConfig) -> Result<Self, Box<dyn Error>> {
        let template_engine = TemplateEngine::new(&config.templates_dir)?;
        Ok(Site {
            config,
            pages: Vec::new(),
            template_engine,
            page_url_map: HashMap::new(),
            navigation_menus: HashMap::new(),
        })
    }

    /// Scans content directory, parses frontmatter and markdown, and populates `self.pages`.
    pub fn load_content(&mut self) -> Result<(), Box<dyn Error>> {
        info!("Loading content from: {:?}", self.config.content_dir);
        let content_dir = &self.config.content_dir;

        let mut raw_pages = Vec::new();
        for entry in fs::read_dir(content_dir)? {
            let entry = entry?;
            let path = entry.path();
            if path.is_file() && path.extension().map_or(false, |ext| ext == "md") {
                raw_pages.push(path);
            } else if path.is_dir() {
                // Recursively find markdown files in subdirectories
                self.walk_dir_for_markdown(&path, content_dir, &mut raw_pages)?;
            }
        }

        // Process pages in parallel
        let processed_pages: Vec<Page> = raw_pages.into_par_iter().filter_map(|path| {
            match self.process_single_content_file(&path, content_dir) {
                Ok(page) => Some(page),
                Err(e) => {
                    error!("Failed to process content file {:?}: {}", path, e);
                    None
                }
            }
        }).collect();

        for page in processed_pages {
            // Populate the URL map during initial load
            self.page_url_map.insert(page.relative_path.clone(), page.url_path.clone());
            self.pages.push(page);
        }

        info!("Loaded {} content pages.", self.pages.len());
        Ok(())
    }

    fn walk_dir_for_markdown(&self, dir: &Path, base_dir: &Path, files: &mut Vec<PathBuf>) -> Result<(), Box<dyn Error>> {
        for entry in fs::read_dir(dir)? {
            let entry = entry?;
            let path = entry.path();
            if path.is_file() && path.extension().map_or(false, |ext| ext == "md") {
                files.push(path);
            } else if path.is_dir() {
                self.walk_dir_for_markdown(&path, base_dir, files)?;
            }
        }
        Ok(())
    }

    fn process_single_content_file(&self, path: &Path, content_dir: &Path) -> Result<Page, Box<dyn Error>> {
        let content = fs::read_to_string(path)?;
        let (frontmatter, markdown_body) = processor::parse_frontmatter(&content)?;

        // Determine collection (e.g., "posts" for content/posts/my-post.md)
        let relative_path = path.strip_prefix(content_dir)?.to_path_buf();
        let collection = relative_path.parent()
                                      .and_then(|p| p.file_name())
                                      .and_then(|s| s.to_str())
                                      .unwrap_or("").to_string();

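        // Generate URL path (e.g., /posts/my-post/ or /about/).
        // Prefer the frontmatter slug; otherwise fall back to the file stem.
        let slug = frontmatter.slug.clone().unwrap_or_else(|| {
            relative_path
                .file_stem()
                .and_then(|s| s.to_str())
                .unwrap_or("untitled")
                .to_string()
        });
        let mut url_path = self.config.base_url.clone();
        if !collection.is_empty() {
            url_path.push_str(&collection);
            url_path.push('/');
        }
        url_path.push_str(&slug);
        url_path.push('/'); // Ensure trailing slash for directory-like URLs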

        let (rendered_html, toc_entries) = processor::markdown_to_html(markdown_body);

        let mut page = Page::new(
            path.to_path_buf(),
            relative_path,
            frontmatter,
            markdown_body.to_string(),
            rendered_html,
            url_path,
            collection,
        );
        page.toc = toc_entries; // Assign the extracted ToC entries

        Ok(page)
    }

    /// Performs a second pass to resolve internal links in all pages.
    pub fn resolve_internal_links(&mut self) {
        info!("Resolving internal links for all pages...");
        // This needs to happen *after* all pages are loaded and `page_url_map` is complete.
        for page in &mut self.pages {
            page.final_html = processor::rewrite_internal_links(&page.rendered_html, &self.page_url_map);
        }
        info!("Internal links resolved.");
    }

    /// Builds the hierarchical navigation menus based on page frontmatter.
    pub fn build_navigation_menus(&mut self) {
        info!("Building navigation menus...");
        let mut menu_map: HashMap<String, Vec<NavEntry>> = HashMap::new();

        // First, collect all pages that belong to any menu
        for page in &self.pages {
            if page.frontmatter.draft {
                continue; // Skip draft pages from navigation
            }
            for menu_name in &page.frontmatter.menu {
                let entry = NavEntry::new(
                    page.frontmatter.title.clone(),
                    page.url_path.clone(),
                    page.frontmatter.weight,
                    Some(menu_name.clone()),
                );
                menu_map.entry(menu_name.clone()).or_default().push(entry);
            }
        }

        // Sort each menu by weight and title
        for (_menu_name, entries) in menu_map.iter_mut() {
            entries.sort_by(|a, b| a.weight.cmp(&b.weight).then_with(|| a.title.cmp(&b.title)));
            // For now, we're not building a deep hierarchy based on URL paths.
            // A more advanced system would parse URL paths to create nested `NavEntry` children.
            // For example, /docs/chapter1/ and /docs/chapter1/section1/
            // would result in chapter1 having section1 as a child.
            // This is a complex task and usually requires explicit parent/child metadata in frontmatter
            // or a very specific URL structure. We'll keep it flat for now.
        }

        self.navigation_menus = menu_map;
        info!("Navigation menus built: {:?}", self.navigation_menus.keys().collect::<Vec<&String>>());
    }


    /// Renders all pages using Tera templates.
    pub fn render_pages(&mut self) -> Result<(), Box<dyn Error>> {
        info!("Rendering {} pages...", self.pages.len());
        fs::create_dir_all(&self.config.output_dir)?;

        for page in &self.pages {
            if page.frontmatter.draft {
                info!("Skipping draft page: {:?}", page.file_path);
                continue;
            }

            let output_path = self.config.output_dir.join(
                page.url_path.trim_start_matches(&self.config.base_url)
                              .trim_end_matches('/')
            ).join("index.html"); // Ensure output is always index.html inside a directory

            fs::create_dir_all(output_path.parent().unwrap())?;

            let mut context = tera::Context::new();
            context.insert("page", &page);
            context.insert("site_config", &self.config);
            context.insert("current_url_path", &page.url_path);

            // Add navigation menus to the context
            context.insert("navigation", &self.navigation_menus);

            // Conditionally add ToC to context
            if page.frontmatter.show_table_of_contents && !page.toc.is_empty() {
                context.insert("toc", &page.toc);
            } else {
                context.insert("toc", &Vec::<crate::content::TocEntry>::new()); // Empty ToC
            }

            let template_name = page.frontmatter.template.as_ref().unwrap_or(&"default.html".to_string()).clone();
            
            match self.template_engine.render(&template_name, &context) {
                Ok(rendered_content) => {
                    fs::write(&output_path, rendered_content)?;
                    info!("Rendered: {:?} -> {:?}", page.file_path, output_path);
                }
                Err(e) => {
                    error!("Failed to render page {:?} with template {}: {}", page.file_path, template_name, e);
                }
            }
        }
        Ok(())
    }

    /// Copies static assets.
    pub fn copy_static_assets(&self) -> Result<(), Box<dyn Error>> {
        if let Some(static_dir) = &self.config.static_dir {
            if static_dir.exists() && static_dir.is_dir() {
                info!("Copying static assets from: {:?}", static_dir);
                let output_static_dir = &self.config.output_dir;

                for entry in walkdir::WalkDir::new(static_dir) {
                    let entry = entry?;
                    let path = entry.path();
                    if path.is_file() {
                        let relative_path = path.strip_prefix(static_dir)?;
                        let dest_path = output_static_dir.join(relative_path);
                        fs::create_dir_all(dest_path.parent().unwrap())?;
                        fs::copy(path, &dest_path)?;
                        info!("Copied static asset: {:?} -> {:?}", path, dest_path);
                    }
                }
                info!("Static assets copied.");
            } else {
                warn!("Static directory {:?} does not exist or is not a directory. Skipping.", static_dir);
            }
        }
        Ok(())
    }
}

Explanation:

Site struct:
- page_url_map: HashMap<PathBuf, String> is added to store the mapping from original content file paths (e.g., content/posts/my-post.md stripped to posts/my-post.md) to their final public URLs (e.g., /posts/my-post/). This map is built during load_content.
- navigation_menus: HashMap<String, Vec<NavEntry>> stores the structured navigation data, keyed by menu name (e.g., “main”, “sidebar”).
load_content:
- Now, after processing each page, we populate self.page_url_map with page.relative_path and page.url_path. This map is essential for rewrite_internal_links.
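- The collection is taken from the name of the file’s immediate parent directory (e.g., posts for content/posts/my-post.md); files at the top level of the content directory get an empty collection.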
resolve_internal_links: This new method iterates through all loaded Page objects and calls processor::rewrite_internal_links on their rendered_html. The result is stored in page.final_html. This must be called after load_content has completed for all pages.
build_navigation_menus:
- This method iterates through all Page objects.
- For each page, if it’s not a draft and has menu entries in its frontmatter, a NavEntry is created.
- These entries are grouped by their menu_name (e.g., “main”, “sidebar”) into self.navigation_menus.
- Each menu’s entries are then sorted by weight and title.
- Note: This implementation creates a flat list of navigation items for each menu. Building a truly hierarchical navigation (e.g., Docs > Chapter 1 > Section A) would require more complex logic, potentially involving parent fields in frontmatter or a stricter URL structure parsing. We’ll keep it flat for simplicity in this chapter.
render_pages:
- The whole Page is inserted into the Tera context as page, so templates should now output page.final_html (the HTML with resolved links) instead of page.rendered_html.
- The navigation_menus HashMap is passed to Tera as navigation.
- The page.toc Vec<TocEntry> is passed as toc, conditionally based on show_table_of_contents frontmatter.
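
For readers who want to go beyond the flat menus, here is a minimal std-only sketch of the URL-structure approach mentioned in the note above. It is not part of the chapter's code: NavNode is a simplified stand-in for NavEntry, build_tree and insert_node are hypothetical names, and the sketch assumes every URL ends with a trailing slash and that the home page ("/") is not in the list, since it would be a prefix of everything.

```rust
/// Simplified stand-in for the chapter's NavEntry (no weight or menu name).
#[derive(Debug, Clone, PartialEq)]
struct NavNode {
    title: String,
    url: String,
    children: Vec<NavNode>,
}

impl NavNode {
    fn new(title: &str, url: &str) -> Self {
        NavNode { title: title.to_string(), url: url.to_string(), children: Vec::new() }
    }
}

/// Places `node` under the deepest existing entry whose URL is a proper
/// prefix of the node's URL, or at this level if there is none.
fn insert_node(nodes: &mut Vec<NavNode>, node: NavNode) {
    for parent in nodes.iter_mut() {
        if node.url != parent.url && node.url.starts_with(&parent.url) {
            insert_node(&mut parent.children, node);
            return;
        }
    }
    nodes.push(node);
}

/// Builds a tree from a flat list of entries.
fn build_tree(mut flat: Vec<NavNode>) -> Vec<NavNode> {
    // A parent URL is always shorter than its descendants' URLs, so sorting
    // by length guarantees parents are inserted before their children.
    flat.sort_by_key(|n| n.url.len());
    let mut roots = Vec::new();
    for node in flat {
        insert_node(&mut roots, node);
    }
    roots
}
```

Note that sorting by URL length discards the weight ordering, so a fuller version would re-sort each children list by weight and title after the tree is built.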

4. Update `main.rs` to Orchestrate the New Steps

src/main.rs

use env_logger::Env;
use log::info; // Needed for the info! macro used below
use std::error::Error;
use std::path::PathBuf;
use crate::site::{Site, SiteConfig}; // Import SiteConfig

mod content;
mod processor;
mod site;
mod template_engine;
mod utils;

fn main() -> Result<(), Box<dyn Error>> {
    // Initialize logging
    env_logger::Builder::from_env(Env::default().default_filter_or("info")).init();

    info!("Starting SSG build process...");

    let config = SiteConfig {
        content_dir: PathBuf::from("content"),
        output_dir: PathBuf::from("public"),
        templates_dir: PathBuf::from("templates"),
        static_dir: Some(PathBuf::from("static")),
        base_url: "/".to_string(),
    };

    let mut site = Site::new(config)?;

    // 1. Load all content and populate the page_url_map
    site.load_content()?;

    // 2. After all content is loaded and URLs are known, resolve internal links
    site.resolve_internal_links();

    // 3. Build navigation menus from all page frontmatter
    site.build_navigation_menus();

    // 4. Render pages with updated HTML (resolved links), ToC, and navigation data
    site.render_pages()?;

    // 5. Copy static assets
    site.copy_static_assets()?;

    info!("SSG build process completed successfully!");

    Ok(())
}

Explanation: The main function now explicitly calls site.resolve_internal_links() and site.build_navigation_menus() between load_content() and render_pages(), ensuring the necessary data is prepared before rendering.

5. Update Tera Templates

Now, let’s modify a base template (e.g., templates/default.html) to make use of the toc and navigation data.

templates/default.html

Create or update your templates/default.html (or base.html if you’re using template inheritance):

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{{ page.frontmatter.title }} - My Rust SSG Site</title>
    <meta name="description" content="{{ page.frontmatter.description | default(value='') }}">
    <link rel="stylesheet" href="/styles.css"> {# Example static asset #}
</head>
<body>
    <header>
        <h1><a href="{{ site_config.base_url }}">My Awesome Rust SSG</a></h1>
        <nav class="main-nav">
            <ul>
                {% if navigation.main %}
                    {% for nav_item in navigation.main %}
                        <li><a href="{{ nav_item.url }}">{{ nav_item.title }}</a></li>
                    {% endfor %}
                {% endif %}
            </ul>
        </nav>
        {% if navigation.sidebar %}
            <nav class="sidebar-nav">
                <h3>Docs</h3>
                <ul>
                    {% for nav_item in navigation.sidebar %}
                        <li><a href="{{ nav_item.url }}">{{ nav_item.title }}</a></li>
                    {% endfor %}
                </ul>
            </nav>
        {% endif %}
    </header>

    <main>
        <article>
            <h2>{{ page.frontmatter.title }}</h2>
            {% if page.frontmatter.author %}
                <p>By: {{ page.frontmatter.author }}</p>
            {% endif %}
            {% if page.frontmatter.date %}
                <p>Published: {{ page.frontmatter.date | date(format="%Y-%m-%d") }}</p>
            {% endif %}

            {% if page.frontmatter.show_table_of_contents and toc %}
                <aside class="table-of-contents">
                    <h3>Table of Contents</h3>
                    <ul>
                        {% for entry in toc %}
                            <li class="toc-level-{{ entry.level }}">
                                <a href="#{{ entry.id }}">{{ entry.text }}</a>
                            </li>
                        {% endfor %}
                    </ul>
                </aside>
            {% endif %}

            <div class="content-body">
                {{ page.final_html | safe }} {# Render the HTML with resolved links #}
            </div>
        </article>
    </main>

    <footer>
        <p>&copy; 2026 My Rust SSG. All rights reserved.</p>
    </footer>
</body>
</html>

Explanation:

Navigation: We iterate over navigation.main and navigation.sidebar (or any other menu names you define in frontmatter) to create menu links.
Table of Contents: If page.frontmatter.show_table_of_contents is true and toc data exists, we render an unordered list where each TocEntry becomes a list item with an anchor link to its id. The toc-level-X class can be used for styling (indentation).
Page Content: We now render {{ page.final_html | safe }} which contains the HTML with all internal links correctly resolved. The | safe filter is crucial to prevent Tera from escaping the HTML.

6. Example Content Files

To test this, create some content files:

content/posts/first-post.md

+++
title = "My First Blog Post"
date = 2026-03-01T10:00:00Z
description = "This is my very first post on the Rust SSG."
slug = "my-first-post"
menu = ["main"]
weight = 10
show_table_of_contents = true
+++

# Welcome to My Blog

This is the introductory section of my first blog post. It's great to be here.

## Getting Started

To get started, you might want to read about [Another Post](/posts/second-post.md).
You can also check out our [About Us](/about/) page.

### Installation

Follow these steps:
1.  Install Rust.
2.  Clone the repository.

## Advanced Topics

We'll cover more advanced topics later.

### Debugging Tips

Some useful debugging tips.

content/posts/second-post.md

+++
title = "My Second Blog Post"
date = 2026-03-02T11:30:00Z
description = "A follow-up post."
slug = "second-post"
menu = ["main"]
weight = 20
+++

# Second Post

This is the second post. You can go back to the [First Post](/posts/first-post.md).

content/about.md

+++
title = "About Us"
date = 2026-01-15T09:00:00Z
description = "Learn about our mission."
slug = "about"
menu = ["main", "sidebar"]
weight = 5
+++

# About Our Project

This static site generator is built with Rust.

## Our Mission

To provide developers with a powerful and flexible tool.

## The Team

We are a small, dedicated team.

Testing This Component

Create the directory structure:

.
├── Cargo.toml
├── src
│   ├── main.rs
│   ├── content.rs
│   ├── processor.rs
│   ├── site.rs
│   ├── template_engine.rs
│   └── utils.rs
├── content
│   ├── posts
│   │   ├── first-post.md
│   │   └── second-post.md
│   └── about.md
├── templates
│   └── default.html
└── static
    └── styles.css # Create this file, can be empty or have basic CSS

Add styles.css:

/* static/styles.css */
body { font-family: sans-serif; line-height: 1.6; margin: 0 auto; max-width: 800px; padding: 20px; }
header, footer { text-align: center; background-color: #f4f4f4; padding: 10px 0; margin-bottom: 20px; }
nav ul { list-style: none; padding: 0; }
nav ul li { display: inline; margin-right: 15px; }
.table-of-contents { border: 1px solid #eee; padding: 15px; background-color: #f9f9f9; margin-bottom: 20px; }
.table-of-contents ul { list-style: none; padding-left: 0; }
.table-of-contents .toc-level-2 { padding-left: 15px; }
.table-of-contents .toc-level-3 { padding-left: 30px; }
.content-body h1, .content-body h2, .content-body h3, .content-body h4, .content-body h5, .content-body h6 {
    margin-top: 2em;
    border-bottom: 1px solid #eee;
    padding-bottom: 0.3em;
}

Run the SSG:
```
cargo run
```
Check public directory:
- You should have public/posts/my-first-post/index.html, public/posts/second-post/index.html, and public/about/index.html.
- Open public/posts/my-first-post/index.html in your browser.
  - Verify the “Table of Contents” is present and its links (#welcome-to-my-blog, #getting-started, etc.) work.
  - Verify the internal links “Another Post” and “About Us” now point to /posts/second-post/ and /about/ respectively.
  - Check the global navigation links in the header.
- Open public/about/index.html. Verify its ToC and navigation.

Production Considerations

Error Handling for Broken Links:
- The current rewrite_internal_links logs a warning if a link cannot be resolved. In production, you might want to:
  - Fail the build if strict_linking = true is set in config.
  - Generate a 404 page for broken links (more complex, involves creating dummy pages).
  - Provide a build report summarizing all broken links.
- Improvement: Instead of just logging, collect all unresolvable links into a Vec<BrokenLink> and present them at the end of the build.
Performance Optimization:
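- Link resolution iterates over every page and runs a regex over its HTML. For very large sites, this could become a bottleneck; at a minimum, compile the regex once and reuse it for all pages rather than recompiling it per page.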
- Caching: If content files haven’t changed, their final_html (with resolved links) can be cached. This is part of incremental builds we’ll cover later.
- The current pulldown_cmark processing is already efficient.
Security Considerations:
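- ToC IDs are slugified from heading text and written into id and href attributes. Ensure slugify reduces its input to a safe character set (for example, lowercase letters, digits, and hyphens) so that heading text containing markup such as <script> tags or quote characters cannot break out of the attribute. pulldown_cmark_to_md::slugify is generally robust, but it is worth adding a test for these inputs.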
- When rewriting links, ensure that only known internal paths are rewritten. External links should remain untouched. The current regex targets href attributes, which is generally safe, but be mindful if custom link syntaxes are introduced.
Logging and Monitoring:
- Detailed info! and warn! logs are crucial for debugging during development and for build pipelines.
- For production, ensure logs are captured by your CI/CD system.
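
The "collect instead of log" improvement mentioned under broken-link handling could look roughly like the following. This is a sketch under stated assumptions: BrokenLink, find_broken_links, and check_links are hypothetical names, the link targets are assumed to have been extracted already as (source page, target) pairs, and strict_linking is the illustrative config flag suggested above rather than an existing SiteConfig field.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical record of one unresolved link (not a type from the chapter's code).
#[derive(Debug, Clone, PartialEq)]
pub struct BrokenLink {
    pub source_page: PathBuf,
    pub target: String,
}

/// Checks each (source page, link target) pair against the URL map and
/// returns every target that has no entry.
pub fn find_broken_links(
    links: &[(PathBuf, String)],
    page_url_map: &HashMap<PathBuf, String>,
) -> Vec<BrokenLink> {
    links
        .iter()
        .filter(|(_, target)| {
            // Map keys are content-relative paths, so drop the leading slash.
            let key = Path::new(target.trim_start_matches('/'));
            !page_url_map.contains_key(key)
        })
        .map(|(source, target)| BrokenLink {
            source_page: source.clone(),
            target: target.clone(),
        })
        .collect()
}

/// Turns the collected list into a build result: in strict mode,
/// any broken link fails the build; otherwise the build continues.
pub fn check_links(broken: &[BrokenLink], strict_linking: bool) -> Result<(), String> {
    if strict_linking && !broken.is_empty() {
        return Err(format!("{} broken internal link(s) found", broken.len()));
    }
    Ok(())
}
```

Returning the full list, rather than failing on the first miss, lets the build report every broken link in one run, which tends to be more useful in CI than fixing them one at a time.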

Code Review Checkpoint

At this point, we have:

Modified src/content.rs to include TocEntry, NavEntry, and updated Frontmatter and Page structs.
Updated src/processor.rs to:
- Extract ToC entries during Markdown-to-HTML conversion.
- Generate unique IDs for headings.
- Implement rewrite_internal_links to resolve *.md paths to their public URLs.
Updated src/site.rs to:
- Populate page_url_map during content loading.
- Add resolve_internal_links method to process all pages after initial load.
- Add build_navigation_menus method to create global navigation structures.
- Pass toc and navigation data to the Tera context in render_pages.
Updated src/main.rs to orchestrate these new build steps.
Updated templates/default.html to render the ToC and navigation menus.
Added pulldown-cmark-to-md and regex dependencies to Cargo.toml.

This introduces a clear multi-pass architecture where content is initially parsed, then a global context (like page_url_map) is built, and finally, dependent features like link resolution and navigation generation are executed before the final rendering.

Common Issues & Solutions

Issue: Internal links are not being rewritten, or lead to 404 errors.
- Cause:
  - The link in Markdown doesn’t match the expected pattern (e.g., not ending with .md or not correctly formed relative path).
  - The target content file is missing or its relative_path doesn’t match the page_url_map key.
  - resolve_internal_links is not called, or called before load_content has finished populating page_url_map.
- Solution:
  - Double-check Markdown link syntax.
  - Verify the target file exists and its Frontmatter.slug (or derived slug) correctly forms its url_path.
  - Ensure the build order in main.rs is correct: load_content -> resolve_internal_links -> render_pages.
  - Check warn! logs from rewrite_internal_links.
Issue: Table of Contents is empty or missing.
- Cause:
  - show_table_of_contents is false in frontmatter.
  - The page has no headings (H1-H6).
  - page.toc is not being correctly populated in processor::markdown_to_html.
  - toc is not passed to the Tera context, or the Tera template logic is incorrect.
- Solution:
  - Set show_table_of_contents = true in your frontmatter.
  - Add headings to your Markdown content.
  - Debug processor::markdown_to_html to ensure toc_entries is being populated.
  - Verify render_pages correctly inserts toc into the Tera context and default.html iterates over it.
Issue: Navigation menus are empty or incorrectly ordered.
- Cause:
  - Pages are missing menu = ["menu_name"] in their frontmatter.
  - weight is not set or is incorrect, leading to unexpected sorting.
  - build_navigation_menus is not called, or called too early.
  - The Tera template iterates over the wrong menu name (e.g., navigation.main vs navigation.sidebar).
- Solution:
  - Add menu and weight to page frontmatter.
  - Check the info! logs from build_navigation_menus to see what menus were built.
  - Ensure main.rs calls build_navigation_menus before render_pages.
  - Inspect the Tera context during rendering if possible, or print navigation directly in the template for debugging.
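
When debugging the first issue above (links not being rewritten), it can help to reason about the map lookup separately from the regex. The sketch below is not the chapter's rewrite_internal_links; it is a hypothetical std-only helper (resolve_href and LinkResolution are made-up names) that classifies a single href the way the text describes: only .md references are candidates, external URLs are left alone, and the key is the content-relative path. It assumes root-relative links such as /posts/second-post.md and does not handle ../ style paths.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Outcome of looking at a single href value.
#[derive(Debug, PartialEq)]
pub enum LinkResolution {
    /// Not a Markdown reference (external URL, final URL, anchor): leave as-is.
    Untouched,
    /// A `.md` reference with a known page: the href to emit instead.
    Resolved(String),
    /// A `.md` reference with no matching key in the map.
    Broken,
}

pub fn resolve_href(href: &str, page_url_map: &HashMap<PathBuf, String>) -> LinkResolution {
    // Split off an optional "#fragment" so it can be re-attached after lookup.
    let (path_part, fragment) = match href.split_once('#') {
        Some((path, frag)) => (path, Some(frag)),
        None => (href, None),
    };

    // Only site-internal Markdown references are candidates for rewriting.
    if path_part.contains("://") || !path_part.ends_with(".md") {
        return LinkResolution::Untouched;
    }

    // Map keys look like "posts/second-post.md", so drop the leading slash.
    let key = Path::new(path_part.trim_start_matches('/'));
    match page_url_map.get(key) {
        Some(url) => match fragment {
            Some(frag) => LinkResolution::Resolved(format!("{}#{}", url, frag)),
            None => LinkResolution::Resolved(url.clone()),
        },
        None => LinkResolution::Broken,
    }
}
```

If a link you expect to resolve comes back as Broken in a helper like this, the usual culprit is a mismatch between the link text and the map key (for instance, a leftover content/ prefix or a differently cased file name).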

Testing & Verification

To comprehensively test this chapter’s work:

Build the site: Run cargo run.
Inspect generated HTML:
- Open public/posts/my-first-post/index.html (and other generated pages) in a web browser.
- Verify ToC:
  - Check if the “Table of Contents” section is present.
  - Click on each ToC link to ensure it scrolls to the correct heading.
  - Inspect the HTML source to confirm headings have id attributes (e.g., <h2 id="getting-started">).
  - Check for correct indentation/styling based on heading levels.
- Verify Internal Links:
  - Click on all internal links within the content (e.g., “Another Post”, “About Us”). They should navigate to the correct generated URLs (e.g., /posts/second-post/, /about/) and not the raw Markdown paths.
  - Inspect the HTML source to confirm href attributes are rewritten (e.g., href="/posts/second-post/").
- Verify Navigation:
  - Check the global navigation in the header/sidebar.
  - Ensure all expected pages are listed and ordered correctly by weight.
  - Click on navigation links to confirm they lead to the correct pages.
Test edge cases:
- Create a page with no headings: It should not generate a ToC.
- Create a page with duplicate headings (e.g., two “Introduction” H2s): Their IDs should be unique (e.g., introduction and introduction-1).
- Create a page with a broken internal link (e.g., [missing](/non-existent.md)): Check the build logs for warnings or errors.
- Create a page without menu in frontmatter: It should not appear in navigation.
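
The duplicate-heading edge case is easy to pin down with a small unit test. The following is a minimal sketch of one way to produce the introduction / introduction-1 behaviour described above; simple_slug and HeadingIds are hypothetical names, and the slug rules here are deliberately simpler than whatever slugify function the processor actually uses.

```rust
use std::collections::HashMap;

/// A deliberately simple slug: lowercase ASCII alphanumerics, with runs of
/// anything else collapsed into a single hyphen.
pub fn simple_slug(text: &str) -> String {
    let mut slug = String::new();
    for ch in text.chars() {
        if ch.is_ascii_alphanumeric() {
            slug.push(ch.to_ascii_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    // Drop a trailing hyphen left behind by trailing punctuation.
    slug.trim_end_matches('-').to_string()
}

/// Hands out heading IDs, appending -1, -2, ... when a slug repeats.
#[derive(Default)]
pub struct HeadingIds {
    seen: HashMap<String, usize>,
}

impl HeadingIds {
    pub fn unique_id(&mut self, heading_text: &str) -> String {
        let base = simple_slug(heading_text);
        // Number of times this base slug has been handed out so far.
        let count = self.seen.entry(base.clone()).or_insert(0);
        let id = if *count == 0 {
            base
        } else {
            format!("{}-{}", base, *count)
        };
        *count += 1;
        id
    }
}
```

Create one HeadingIds per page so that counters do not leak between pages. Note that this simple scheme can still collide if a page contains both two "Introduction" headings and a literal "Introduction 1" heading; a stricter version would also record every generated ID and keep incrementing until it finds an unused one.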

Summary & Next Steps

In this chapter, we’ve significantly upgraded our Rust SSG by adding fundamental features for site navigation and content organization. We’ve implemented:

Automatic Table of Contents generation by processing Markdown headings and injecting anchor IDs.
Internal link resolution that rewrites Markdown references (*.md paths) to their final public URLs and logs a warning for any link it cannot resolve.
Dynamic navigation menu generation based on page frontmatter, allowing flexible site-wide navigation.

This marks a crucial step towards building a truly production-ready SSG, as these features greatly enhance user experience and content discoverability. The modular design allows us to extend these features further, for instance, by implementing hierarchical navigation or more sophisticated link validation.

In the next chapter, Chapter 14: Designing a Plugin or Extension System, we will explore how to make our SSG extensible. This will allow developers to add custom functionality, content transformations, or output formats without modifying the core codebase, paving the way for a highly adaptable content platform.
