Introduction

Welcome to Chapter 7! In the previous chapters, we laid the foundational groundwork for our Mermaid analysis tool by building a robust Lexer to tokenize input, a Parser to construct a strongly typed Abstract Syntax Tree (AST), and a Validator to perform initial syntax and semantic checks. With a validated AST in hand, we now move to the core of our linter and fixer: the Rule Engine.

This chapter is dedicated to designing and implementing a deterministic rule engine that can traverse our AST, identify potential issues (linting), and, if configured, apply safe, minimal, and reversible fixes directly to the AST. This engine will encapsulate our tool’s “intelligence” for enforcing Mermaid best practices and correcting common mistakes. We will define a Rule trait, allowing us to create modular and extensible checks and transformations. Our goal is to ensure that all fixes strictly adhere to Mermaid syntax specifications, never introducing invalid constructs or ambiguity.

By the end of this chapter, our tool will be able to operate in different modes: lint (report issues), fix (apply safe fixes), and strict (apply only guaranteed safe fixes, failing on any ambiguity). We will implement two foundational rules: one to ensure a graph declaration is present and another to normalize arrow syntax. This brings us significantly closer to a production-ready Mermaid compiler-like utility.

Planning & Design

The Rule Engine is where we move beyond mere validation to active analysis and transformation. It needs to be flexible enough to accommodate various types of rules (from simple style checks to structural modifications) while maintaining strict determinism and safety.

Core Principles

  1. Modularity: Each linting or fixing concern should be encapsulated in its own Rule implementation.
  2. Determinism: Given the same input AST and rule set, the output AST and diagnostics must always be identical. Rules run in a fixed order (the order in which they are registered).
  3. Safety: Fixes must be guaranteed to produce valid Mermaid. Ambiguous or potentially destructive fixes are forbidden, especially in strict mode.
  4. Idempotence: Applying the rule engine multiple times to an already corrected AST should yield no further changes.
  5. Reversibility (Conceptual): While we won’t implement an undo feature, the fixes should be minimal and easily understandable, making manual reversal straightforward if needed.
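
The idempotence principle is easy to test mechanically. The following is a minimal sketch that assumes a fix is any function returning true when it changed its input; is_idempotent, solidify_arrows, and always_append are hypothetical names used only for illustration, and the "AST" is a plain String.

```rust
/// Applies `fix` twice and reports whether the second application was a no-op.
/// `fix` returns `true` when it changed its argument, mirroring the convention
/// this chapter uses for rule fixes.
pub fn is_idempotent<T, F>(input: &T, fix: F) -> bool
where
    T: Clone + PartialEq,
    F: Fn(&mut T) -> bool,
{
    let mut once = input.clone();
    fix(&mut once);
    let mut twice = once.clone();
    let changed_again = fix(&mut twice);
    !changed_again && once == twice
}

/// Toy fix: rewrite dotted arrows as solid ones. Settles after one application.
pub fn solidify_arrows(source: &mut String) -> bool {
    if !source.contains("-.->") {
        return false;
    }
    *source = source.replace("-.->", "-->");
    true
}

/// Deliberately broken fix: appends a comment on every call, so it never settles.
pub fn always_append(source: &mut String) -> bool {
    source.push_str("\n%% touched");
    true
}
```

A helper of this shape can back a unit test for every rule: a rule that fails it would keep the engine looping until the pass limit is reached.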

Architecture of the Rule Engine

The rule engine will operate on the AST produced by the parser and validated by the semantic analyzer. It will consist of a manager that orchestrates the application of multiple rules. Each rule will implement a common Rule trait, defining methods for checking and fixing the AST.

Here’s a high-level architecture diagram for our Rule Engine:

flowchart TD
    AST_Input[Input AST] --> RE_Manager[Rule Engine Manager]

    subgraph Rule_Application_Loop["Rule Application Loop"]
        RE_Manager --> RE_Check_Pass["Check Pass (All Rules)"]
        RE_Check_Pass --> RE_Fix_Pass["Fix Pass (Applicable Rules)"]
        RE_Fix_Pass --> RE_Repeat{Fixes Applied?}
        RE_Repeat -->|Yes| RE_Check_Pass
        RE_Repeat -->|No| RE_Done[Processing Complete]
    end

    RE_Check_Pass --> Diag_Gen[Generate Diagnostics]
    RE_Fix_Pass --> AST_Modify[Modify AST]
    Diag_Gen --> Output_Diags[Output Diagnostics]
    AST_Modify --> AST_Output[Output Modified AST]

    subgraph Individual_Rule_Structure["Individual Rule Structure (e.g., Missing Graph Decl.)"]
        R_Trait[Rule Trait]
        R_Check["check() Method"]
        R_Fix["apply_fix() Method"]
        R_Trait -.-> R_Check
        R_Trait -.-> R_Fix
    end

    RE_Check_Pass -.-> R_Check
    RE_Fix_Pass -.-> R_Fix

Explanation:

  • Rule Trait: Defines the interface for all rules. It will include methods like name(), check(), and apply_fix().
  • RuleEngine (Manager): This struct will hold a collection of Box<dyn Rule> instances. It will provide methods to run linting and fixing passes.
  • Check Pass: In this phase, all registered rules iterate over the AST to identify issues and generate Diagnostic messages. No modifications are made.
  • Fix Pass: In fix or strict mode, rules attempt to modify the AST. The pass repeats until a full pass makes no further changes (a fixed point), up to a maximum number of passes.
  • Diagnostics: All rules will contribute to a shared Diagnostics collection, which will then be reported to the user.
  • Modes: The RuleEngine will adapt its behavior based on the requested mode (lint, fix, strict).

File Structure

We’ll introduce a new module src/rules to house our rule definitions and the engine itself.

src/
├── main.rs
├── lexer/
│   └── ...
├── parser/
│   └── ...
├── ast/
│   └── ...
├── validator/
│   └── ...
├── diagnostics/
│   └── ...
└── rules/
    ├── mod.rs             // Defines the Rule trait and RuleEngine struct
    ├── common.rs          // Utility functions or shared types for rules
    ├── missing_graph_decl.rs // Rule for missing graph declaration
    └── arrow_normalization.rs // Rule for arrow syntax normalization

Step-by-Step Implementation

1. Setup Rule Module and Trait

First, let’s create the src/rules directory and define the Rule trait in src/rules/mod.rs. This trait will be the cornerstone for all our linting and fixing logic.

a) Create src/rules/mod.rs:

// src/rules/mod.rs

use crate::ast::MermaidAst;
use crate::diagnostics::{Diagnostic, DiagnosticLevel, Diagnostics};
use std::fmt::Debug;

// Rule implementations live in submodules, one rule per file.
pub mod missing_graph_decl;
pub mod arrow_normalization;

/// Defines the operation mode for the rule engine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FixMode {
    /// Only report diagnostics, no modifications.
    LintOnly,
    /// Apply safe, reversible fixes.
    ApplyFixes,
    /// Apply only guaranteed safe and minimal fixes, fail on any ambiguity.
    StrictFixes,
}

/// The trait that all linting and fixing rules must implement.
///
/// Rules are designed to be deterministic and operate on the AST.
/// They can produce diagnostics and, in fixing modes, modify the AST.
pub trait Rule: Debug + Send + Sync {
    /// Returns the unique name of the rule.
    fn name(&self) -> &'static str;

    /// Checks the AST for issues and adds diagnostics.
    /// This method should NOT modify the AST.
    fn check(&self, ast: &MermaidAst, diagnostics: &mut Diagnostics);

    /// Attempts to apply a fix to the AST.
    /// Returns `true` if any modification was made, `false` otherwise.
    /// This method should only be called in `ApplyFixes` or `StrictFixes` modes.
    /// It should also add diagnostics for the applied fix.
    fn apply_fix(&self, ast: &mut MermaidAst, diagnostics: &mut Diagnostics) -> bool;

    /// Indicates which fix modes this rule supports.
    /// By default, rules support `ApplyFixes`. Override for `StrictFixes`.
    fn supports_fix_mode(&self, mode: FixMode) -> bool {
        match mode {
            FixMode::LintOnly => true, // All rules can lint
            FixMode::ApplyFixes => true,
            FixMode::StrictFixes => false, // By default, rules are not strict-fix safe
        }
    }
}

/// The Rule Engine orchestrates the application of various rules.
#[derive(Default)]
pub struct RuleEngine {
    rules: Vec<Box<dyn Rule>>,
}

impl RuleEngine {
    /// Creates a new RuleEngine with no rules registered.
    pub fn new() -> Self {
        Self { rules: Vec::new() }
    }

    /// Registers a new rule with the engine.
    pub fn register_rule(&mut self, rule: Box<dyn Rule>) {
        self.rules.push(rule);
    }

    /// Runs the rule engine in lint-only mode.
    /// It checks the AST against all registered rules and collects diagnostics.
    pub fn run_lint(&self, ast: &MermaidAst) -> Diagnostics {
        let mut diagnostics = Diagnostics::new();
        for rule in &self.rules {
            rule.check(ast, &mut diagnostics);
        }
        diagnostics
    }

    /// Runs the rule engine to apply fixes to the AST.
    ///
    /// It performs multiple passes until no more fixes can be applied by any rule
    /// or a maximum number of passes is reached to prevent infinite loops.
    ///
    /// Returns the collected diagnostics, including those for applied fixes.
    pub fn run_fix(&self, ast: &mut MermaidAst, fix_mode: FixMode) -> Diagnostics {
        let mut diagnostics = Diagnostics::new();
        if fix_mode == FixMode::LintOnly {
            // Fallback to lint-only if fix mode is incorrectly passed
            return self.run_lint(ast);
        }

        const MAX_FIX_PASSES: usize = 10; // Prevent infinite loops
        let mut changed_in_last_pass = true;
        let mut pass_count = 0;

        // Findings from the most recent check pass. Replaced (not appended) on
        // every pass so that the same issue is not reported once per pass.
        let mut latest_check = Diagnostics::new();

        while changed_in_last_pass && pass_count < MAX_FIX_PASSES {
            changed_in_last_pass = false;
            pass_count += 1;

            // First, run a check pass against the current state of the AST.
            // A fix applied in an earlier pass may have resolved some findings
            // or exposed new ones, so the previous pass's findings are discarded.
            latest_check = Diagnostics::new();
            for rule in &self.rules {
                rule.check(ast, &mut latest_check);
            }

            // Then, attempt to apply fixes. Fix notes accumulate across passes.
            for rule in &self.rules {
                if rule.supports_fix_mode(fix_mode) && rule.apply_fix(ast, &mut diagnostics) {
                    changed_in_last_pass = true;
                }
            }
        }

        // Report the findings of the last check pass alongside the fix notes.
        diagnostics.extend(latest_check);

        if changed_in_last_pass {
            diagnostics.add_diagnostic(Diagnostic {
                level: DiagnosticLevel::Warning,
                code: "R000_FIX_LIMIT".to_string(),
                message: format!("Reached maximum fix passes ({}). Some issues might remain. Please re-run the tool.", MAX_FIX_PASSES),
                span: None,
                help: Some("This usually indicates a complex interaction between rules or a rule that isn't fully idempotent. Consider manual inspection or re-running the tool.".to_string()),
            });
        }
        diagnostics
    }

    /// Helper to register all default rules.
    pub fn register_default_rules(&mut self) {
        self.register_rule(Box::new(missing_graph_decl::MissingGraphDeclarationRule));
        self.register_rule(Box::new(arrow_normalization::ArrowNormalizationRule));
        // Register other rules here as they are implemented
    }
}

Explanation of src/rules/mod.rs:

  • FixMode Enum: Defines the three operational modes for our rule engine.
  • Rule Trait:
    • name(): Provides a string identifier for the rule.
    • check(): This method is used for linting. It takes an immutable reference to the AST (&MermaidAst) and a mutable reference to Diagnostics. Rules add Diagnostic messages here.
    • apply_fix(): This method is used for fixing. It takes a mutable reference to the AST (&mut MermaidAst) and Diagnostics. It returns true if any change was made, false otherwise. This is crucial for the RuleEngine to know if another pass is needed.
    • supports_fix_mode(): Allows rules to declare their compatibility with ApplyFixes or StrictFixes. By default, rules are only safe for ApplyFixes.
  • RuleEngine Struct:
    • rules: A Vec<Box<dyn Rule>> to store all registered rules. Using Box<dyn Rule> allows for polymorphism, meaning we can store different concrete rule types that all implement the Rule trait.
    • new(): Constructor.
    • register_rule(): Adds a rule to the engine.
    • run_lint(): Executes all check() methods on the AST.
    • run_fix(): This is the more complex method. It repeatedly calls apply_fix() on each rule that supports the requested mode until a full pass makes no changes. MAX_FIX_PASSES caps the number of passes so that rules interacting in unexpected ways cannot loop forever. It also re-runs check in each pass so that diagnostics reflect the modified AST.
    • register_default_rules(): A convenience method to easily register all built-in rules.
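
To make the pass loop concrete, here is a reduced sketch of it, assuming a toy AST (a list of source lines) and a cut-down trait with no diagnostics or modes. ToyAst, ToyRule, AddHeader, Flapping, and run_to_fixed_point are hypothetical names for this illustration and are not part of the chapter's code.

```rust
/// Stand-in for the chapter's AST: just a list of source lines.
pub type ToyAst = Vec<String>;

/// Reduced version of the chapter's `Rule` trait (no diagnostics, no modes).
pub trait ToyRule {
    fn apply_fix(&self, ast: &mut ToyAst) -> bool;
}

/// Converges: inserts the header once, then reports no change.
pub struct AddHeader;

impl ToyRule for AddHeader {
    fn apply_fix(&self, ast: &mut ToyAst) -> bool {
        if ast.first().map(String::as_str) == Some("graph TD") {
            return false;
        }
        ast.insert(0, "graph TD".to_string());
        true
    }
}

/// Never converges: claims a change on every call.
pub struct Flapping;

impl ToyRule for Flapping {
    fn apply_fix(&self, _ast: &mut ToyAst) -> bool {
        true
    }
}

/// Runs fix passes until one makes no change or `max_passes` is reached.
/// Returns `(passes_run, hit_limit)`.
pub fn run_to_fixed_point(
    rules: &[Box<dyn ToyRule>],
    ast: &mut ToyAst,
    max_passes: usize,
) -> (usize, bool) {
    let mut passes = 0;
    let mut changed = true;
    while changed && passes < max_passes {
        changed = false;
        passes += 1;
        for rule in rules {
            if rule.apply_fix(ast) {
                changed = true;
            }
        }
    }
    (passes, changed)
}
```

With only AddHeader registered, the loop needs two passes: one that inserts the header and one that confirms nothing is left to do. With Flapping registered, it stops at the limit and reports that the limit was hit, which is the situation the R000_FIX_LIMIT warning describes.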

2. Implement the MissingGraphDeclarationRule

Mermaid determines the diagram type from the opening declaration, such as graph TD or flowchart TD. Snippets copied from larger documents or produced by other tools sometimes omit it, and such a diagram will generally not render on its own. Declaring it explicitly is also a best practice for clarity and strictness. This rule enforces that.

a) Create src/rules/missing_graph_decl.rs:

// src/rules/missing_graph_decl.rs

use super::{Rule, FixMode};
use crate::ast::{MermaidAst, Statement, GraphDeclaration, GraphOrientation};
use crate::diagnostics::{Diagnostic, DiagnosticLevel, Diagnostics};
use crate::span::Span; // Assuming Span is defined and used for locations

/// Rule to ensure that a Mermaid diagram starts with a graph or flowchart declaration.
/// If missing, it adds a default 'graph TD' declaration.
#[derive(Debug)]
pub struct MissingGraphDeclarationRule;

impl Rule for MissingGraphDeclarationRule {
    fn name(&self) -> &'static str {
        "missing-graph-declaration"
    }

    fn check(&self, ast: &MermaidAst, diagnostics: &mut Diagnostics) {
        if !ast.statements.iter().any(|stmt| matches!(stmt, Statement::GraphDeclaration(_))) {
            diagnostics.add_diagnostic(Diagnostic {
                level: DiagnosticLevel::Warning,
                code: "R701_MISSING_GRAPH_DECL".to_string(),
                message: "Mermaid diagram is missing a 'graph' or 'flowchart' declaration.".to_string(),
                span: ast.statements.first().map(|s| s.span()), // Point to the start of the file
                help: Some("Consider adding 'graph TD' or 'flowchart TD' at the beginning of your diagram for clarity and strict compliance.".to_string()),
            });
        }
    }

    fn apply_fix(&self, ast: &mut MermaidAst, diagnostics: &mut Diagnostics) -> bool {
        if !ast.statements.iter().any(|stmt| matches!(stmt, Statement::GraphDeclaration(_))) {
            // Create a default 'graph TD' declaration.
            // We'll use a dummy span as this is an inserted node.
            // For a real-world scenario, you might want to infer a more accurate span.
            let default_decl = Statement::GraphDeclaration(GraphDeclaration {
                kind: "graph".to_string(), // Use "graph" as the default
                orientation: Some(GraphOrientation::TD),
                span: Span::new(0, 0), // Dummy span, as it's inserted
            });

            // Insert at the beginning of the statements
            ast.statements.insert(0, default_decl);

            diagnostics.add_diagnostic(Diagnostic {
                level: DiagnosticLevel::Note,
                code: "R701_MISSING_GRAPH_DECL_FIXED".to_string(),
                message: "Added 'graph TD' declaration to the beginning of the diagram.".to_string(),
                span: Some(Span::new(0, 0)), // Point to the very beginning
                help: None,
            });
            true // Modification made
        } else {
            false // No modification needed
        }
    }

    fn supports_fix_mode(&self, mode: FixMode) -> bool {
        // This is a safe fix, so it can be applied in both ApplyFixes and StrictFixes modes.
        matches!(mode, FixMode::ApplyFixes | FixMode::StrictFixes)
    }
}

b) Update src/rules/mod.rs to include the new rule: No further change is needed. We already declared pub mod missing_graph_decl; and registered the rule in register_default_rules().

Explanation of MissingGraphDeclarationRule:

  • check(): It iterates through the AST’s top-level statements. If no Statement::GraphDeclaration is found, it adds a Warning diagnostic.
  • apply_fix(): If the declaration is missing, it creates a Statement::GraphDeclaration for graph TD and inserts it at the beginning of ast.statements. It returns true to indicate a change.
  • supports_fix_mode(): This fix is very safe and deterministic, so it supports StrictFixes.
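
Because check() and apply_fix() test the same condition, one way to keep them from drifting apart is to share a single predicate. The sketch below illustrates that idea on a simplified statement type; Stmt, has_graph_decl, and the free functions check and apply_fix are hypothetical stand-ins, not the chapter's actual AST or trait methods.

```rust
/// Simplified stand-in for the chapter's `Statement` enum.
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    GraphDecl { kind: String, orientation: String },
    Edge { from: String, to: String },
}

/// Single source of truth for "is the declaration present?".
pub fn has_graph_decl(stmts: &[Stmt]) -> bool {
    stmts.iter().any(|s| matches!(s, Stmt::GraphDecl { .. }))
}

/// Lint side: returns the diagnostic code to report, if any.
pub fn check(stmts: &[Stmt]) -> Option<&'static str> {
    if has_graph_decl(stmts) {
        None
    } else {
        Some("R701_MISSING_GRAPH_DECL")
    }
}

/// Fix side: inserts `graph TD` at the top when the declaration is missing.
/// Returns `true` only when it modified the statement list.
pub fn apply_fix(stmts: &mut Vec<Stmt>) -> bool {
    if has_graph_decl(stmts.as_slice()) {
        return false;
    }
    stmts.insert(
        0,
        Stmt::GraphDecl {
            kind: "graph".to_string(),
            orientation: "TD".to_string(),
        },
    );
    true
}
```

After one call to apply_fix, check reports nothing and a second apply_fix returns false, which is exactly the behavior the engine's pass loop relies on to terminate.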

3. Implement the ArrowNormalizationRule

Mermaid allows various arrow syntaxes (e.g., ---, ==>, --x), but for consistency and strictness, we might want to normalize them to a standard form like --> for simple edges. This rule will focus on standardizing simple directed arrows.

a) Create src/rules/arrow_normalization.rs:

// src/rules/arrow_normalization.rs

use super::{Rule, FixMode};
use crate::ast::{EdgeArrow, EdgeLine, MermaidAst, Statement};
use crate::diagnostics::{Diagnostic, DiagnosticLevel, Diagnostics};

/// Rule to normalize various arrow syntaxes to a standard form (e.g., `-->`).
/// This rule primarily targets simple directed arrows.
#[derive(Debug)]
pub struct ArrowNormalizationRule;

impl Rule for ArrowNormalizationRule {
    fn name(&self) -> &'static str {
        "arrow-normalization"
    }

    fn check(&self, ast: &MermaidAst, diagnostics: &mut Diagnostics) {
        // Flag directed edges whose line style differs from the canonical solid `-->`.
        // Only top-level statements are inspected here.
        for statement in &ast.statements {
            if let Statement::Edge(edge) = statement {
                if let Some(ref edge_style) = edge.style {
                    // A solid `-->` is (Solid, None, Arrow) and is already canonical,
                    // so it must not match here. Otherwise every standard arrow
                    // would be flagged and the fix would never converge.
                    let is_non_standard_arrow = matches!(
                        (edge_style.line, edge_style.arrow_start, edge_style.arrow_end),
                        (EdgeLine::Dotted, EdgeArrow::None, EdgeArrow::Arrow) // e.g., "-.->"
                    );

                    if is_non_standard_arrow {
                        diagnostics.add_diagnostic(Diagnostic {
                            level: DiagnosticLevel::Warning,
                            code: "R702_NON_STANDARD_ARROW".to_string(),
                            message: format!("Non-standard arrow syntax detected: '{}'.", edge.raw_arrow()),
                            span: Some(edge.span()),
                            help: Some("Consider normalizing to '-->' for consistency.".to_string()),
                        });
                    }
                }
            }
        }
    }

    fn apply_fix(&self, ast: &mut MermaidAst, diagnostics: &mut Diagnostics) -> bool {
        let mut changed = false;
        // We need to traverse the AST mutably. This can be complex for deeply nested structures.
        // For simplicity here, we iterate top-level statements. A full implementation
        // would likely use an AST visitor pattern to handle all levels.
        for statement in &mut ast.statements {
            if let Statement::Edge(edge) = statement {
                // Capture the original arrow text and span before mutating the style,
                // so the note reports what the user actually wrote.
                let original_arrow = edge.raw_arrow().to_string();
                let span = edge.span();

                if let Some(ref mut edge_style) = edge.style {
                    // Same predicate as in `check`: a solid `-->` is left alone.
                    let should_normalize = matches!(
                        (edge_style.line, edge_style.arrow_start, edge_style.arrow_end),
                        (EdgeLine::Dotted, EdgeArrow::None, EdgeArrow::Arrow) // e.g., "-.->"
                    );

                    if should_normalize {
                        // Normalize to '-->'. The arrow ends are already (None, Arrow).
                        edge_style.line = EdgeLine::Solid;

                        diagnostics.add_diagnostic(Diagnostic {
                            level: DiagnosticLevel::Note,
                            code: "R702_ARROW_NORMALIZED".to_string(),
                            message: format!("Normalized arrow syntax to '-->'. Original was: '{}'", original_arrow),
                            span: Some(span),
                            help: None,
                        });
                        changed = true;
                    }
                }
            }
        }
        changed
    }

    fn supports_fix_mode(&self, mode: FixMode) -> bool {
        // Rewriting a dotted arrow as a solid one changes how the edge renders,
        // so this is a stylistic fix rather than a meaning-preserving one.
        // Keep it out of StrictFixes. A more advanced rule might distinguish
        // between strictly equivalent forms and stylistic choices.
        matches!(mode, FixMode::ApplyFixes)
    }
}

b) Update src/rules/mod.rs to include the new rule: No further change is needed. We already declared pub mod arrow_normalization; and registered the rule in register_default_rules().

Explanation of ArrowNormalizationRule:

  • check(): It iterates through Edge statements and checks their EdgeStyle. If it finds a directed arrow that is not a solid --> (e.g., -.->), it adds a Warning.
  • apply_fix(): For identified non-standard arrows, it modifies the EdgeStyle to represent --> (solid line, no start arrow, end arrow). It returns true if a modification occurred.
  • supports_fix_mode(): This rule is set to ApplyFixes only. Rewriting -.-> as --> changes how the edge renders (a dotted line becomes solid), so the fix is a stylistic choice rather than a meaning-preserving one. For StrictFixes, we would only allow a normalization when the rendered result is identical. This highlights the trade-off in strictness.
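
One possible way to express that trade-off in code is to classify each candidate fix by how safe it is and let the mode decide. This is only a sketch under assumed types: Mode, Line, FixSafety, classify, and may_apply are hypothetical and do not exist in the chapter's rule engine.

```rust
/// Mirrors the chapter's three operating modes (hypothetical copy).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    LintOnly,
    ApplyFixes,
    StrictFixes,
}

/// Line style of a directed edge in this toy model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Line {
    Solid,
    Dotted,
}

/// How a proposed rewrite affects the rendered diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FixSafety {
    /// The rendered diagram is identical before and after.
    MeaningPreserving,
    /// The diagram still parses, but looks different.
    ChangesRendering,
}

/// Classifies the rewrite of a directed edge to a solid `-->`.
/// Returns `None` when the edge is already canonical and nothing should change.
pub fn classify(line: Line) -> Option<FixSafety> {
    match line {
        Line::Solid => None,
        Line::Dotted => Some(FixSafety::ChangesRendering),
    }
}

/// Decides whether a fix of the given safety class may run in `mode`.
pub fn may_apply(safety: FixSafety, mode: Mode) -> bool {
    match (mode, safety) {
        (Mode::LintOnly, _) => false,
        (Mode::ApplyFixes, _) => true,
        (Mode::StrictFixes, FixSafety::MeaningPreserving) => true,
        (Mode::StrictFixes, FixSafety::ChangesRendering) => false,
    }
}
```

Under this model a single rule could offer both kinds of fix, and strict mode would filter per fix instead of switching the whole rule on or off.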

4. Integrate Rule Engine into the Main Application Flow

Now, let’s update src/main.rs to utilize the RuleEngine after parsing and initial validation. We’ll also need to add command-line arguments to select the operational mode.

a) Update src/main.rs:

We’ll use clap for command-line argument parsing. Add it to Cargo.toml if not already present:

# Cargo.toml
[dependencies]
clap = { version = "4", features = ["derive"] }
# ... other dependencies

Then, modify src/main.rs:

// src/main.rs

use clap::{Parser, Subcommand};
use std::fs;
use std::path::PathBuf;

// Import our modules
mod lexer;
mod parser;
mod ast;
mod diagnostics;
mod validator;
mod rules; // New!
mod span;

use lexer::Lexer;
use parser::Parser as MermaidParser; // Alias to avoid conflict with clap::Parser
use validator::AstValidator;
use diagnostics::DiagnosticLevel;
use rules::{RuleEngine, FixMode}; // New!

/// A strict linter and formatter for Mermaid diagrams.
#[derive(Parser, Debug)]
#[command(author, version, about = "Strict linter and formatter for Mermaid diagrams", long_about = None)]
struct Cli {
    /// Path to the Mermaid file to process
    #[arg(value_name = "FILE")]
    file: PathBuf,

    #[command(subcommand)]
    command: Commands,
}

#[derive(Subcommand, Debug)]
enum Commands {
    /// Checks the Mermaid file for errors and warnings (lint mode).
    Lint {
        /// Exit with a non-zero code if any errors or warnings are found.
        #[arg(long)]
        strict_exit: bool,
    },
    /// Attempts to fix common Mermaid syntax issues.
    Fix {
        /// Overwrite the original file with the fixed content.
        #[arg(long)]
        write: bool,
        /// Apply only guaranteed safe and minimal fixes, fail on any ambiguity.
        #[arg(long)]
        strict: bool,
    },
    // Future: Add `format` command here.
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cli = Cli::parse();
    let source_code = fs::read_to_string(&cli.file)?;

    let mut diagnostics = diagnostics::Diagnostics::new();

    // 1. Lexing
    let lexer = Lexer::new(&source_code);
    let tokens = lexer.collect::<Vec<_>>();
    // Note: Lexer errors are typically collected during tokenization if designed that way.
    // For now, we assume a successful tokenization or handle basic errors in parser.

    // 2. Parsing
    let mut parser = MermaidParser::new(tokens);
    let mut ast = match parser.parse() {
        Ok(ast) => ast,
        Err(parse_diagnostics) => {
            // Without an AST there is nothing for the validator or the rule
            // engine to work on, so report what the parser found and stop.
            diagnostics.extend(parse_diagnostics);
            diagnostics.print_diagnostics(&source_code);
            eprintln!("Error: Parsing failed. Exiting.");
            std::process::exit(1);
        }
    };
    diagnostics.extend(parser.into_diagnostics()); // Collect parser-specific diagnostics

    // 3. Semantic Validation (from previous chapter)
    let validator = AstValidator::new();
    validator.validate(&ast, &mut diagnostics);

    // 4. Rule Engine (Linting and Fixing) - NEW
    let mut rule_engine = RuleEngine::new();
    rule_engine.register_default_rules(); // Register all our default rules

    match &cli.command {
        Commands::Lint { strict_exit } => {
            let lint_diagnostics = rule_engine.run_lint(&ast);
            diagnostics.extend(lint_diagnostics);

            diagnostics.print_diagnostics(&source_code);

            if *strict_exit && diagnostics.has_errors_or_warnings() {
                eprintln!("Linting failed with errors or warnings. Exiting strictly.");
                std::process::exit(1);
            } else if diagnostics.has_errors() {
                eprintln!("Linting failed with errors. Exiting.");
                std::process::exit(1);
            }
        }
        Commands::Fix { write, strict } => {
            let fix_mode = if *strict { FixMode::StrictFixes } else { FixMode::ApplyFixes };
            let fix_diagnostics = rule_engine.run_fix(&mut ast, fix_mode);
            diagnostics.extend(fix_diagnostics);

            // Re-validate after fixing to catch any new issues introduced by fixes (shouldn't happen with safe fixes)
            let mut post_fix_diagnostics = diagnostics::Diagnostics::new();
            validator.validate(&ast, &mut post_fix_diagnostics);
            diagnostics.extend(post_fix_diagnostics);

            diagnostics.print_diagnostics(&source_code);

            if diagnostics.has_errors() {
                eprintln!("Error: Fixes were applied, but new errors were detected or existing errors could not be resolved. Exiting.");
                std::process::exit(1);
            }

            // If fixes were applied and no errors, output the modified AST
            // Fix notes are reported at the `Note` level, so any note means a fix was applied.
            if diagnostics.has_level(DiagnosticLevel::Note) {
                println!("\n--- Fixed Mermaid Code ---");
                // Placeholder: print the AST's debug representation. The Formatter
                // in Chapter 8 will convert the AST back into Mermaid source.
                println!("{:?}", ast);
                println!("--------------------------");

                if *write {
                    // This is where the formatter (Chapter 8) would convert `ast` back to a string
                    // For now, we'll use a crude representation or just indicate overwrite.
                    eprintln!("`--write` option specified. Original file would be overwritten with formatted content.");
                    eprintln!("(Actual file overwrite is pending implementation of the Formatter in Chapter 8.)");
                    // fs::write(&cli.file, ast_to_string_representation)?; // This will be the actual call
                }
            } else {
                println!("\nNo fixes applied or no issues found that could be fixed.");
            }
        }
    }

    Ok(())
}

Explanation of src/main.rs changes:

  • clap integration: Added Cli and Commands to define lint and fix subcommands with their respective options (--strict-exit, --write, --strict).
  • RuleEngine instantiation: RuleEngine::new() and rule_engine.register_default_rules() are called.
  • Command Handling:
    • Commands::Lint: Calls rule_engine.run_lint(). Diagnostics are printed, and strict_exit controls the exit code.
    • Commands::Fix: Determines FixMode based on the --strict flag. Calls rule_engine.run_fix().
    • Post-Fix Validation: Crucially, after fixes are applied, the AST is re-validated to ensure fixes didn’t introduce new errors.
    • Output: If fixes are applied, a placeholder message is printed. The actual conversion of the modified AST back to a Mermaid string will be handled by the Formatter in Chapter 8. The --write option is acknowledged but not fully implemented yet.
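
The exit-code policy of the lint subcommand and the mode selection of the fix subcommand are both small pure decisions, so they can be pulled out and unit tested without touching the file system. The sketch below assumes that refactoring; lint_exit_code, select_fix_mode, and the local FixMode copy are hypothetical helpers, not functions defined in the chapter's main.rs.

```rust
/// Local copy of the two fixing modes, so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FixMode {
    ApplyFixes,
    StrictFixes,
}

/// Maps the `--strict` flag of the `fix` subcommand to a mode.
pub fn select_fix_mode(strict: bool) -> FixMode {
    if strict {
        FixMode::StrictFixes
    } else {
        FixMode::ApplyFixes
    }
}

/// Exit code for a lint run: errors always fail, and warnings fail only
/// when `--strict-exit` was passed.
pub fn lint_exit_code(has_errors: bool, has_warnings: bool, strict_exit: bool) -> i32 {
    if has_errors || (strict_exit && has_warnings) {
        1
    } else {
        0
    }
}
```

Keeping these decisions separate from the printing code also makes it easier to document the tool's exit codes for use in CI.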

5. Update Diagnostics to include Note level for fixes

We used DiagnosticLevel::Note for reporting successful fixes. Ensure our Diagnostics system (from Chapter 5) can handle and print Note level messages.

a) Update src/diagnostics.rs (if necessary):

Ensure DiagnosticLevel enum includes Note and print_diagnostics handles it.

// src/diagnostics.rs (Relevant parts, assuming previous implementation)

#[derive(Debug, Clone, PartialEq, Eq, Copy)]
pub enum DiagnosticLevel {
    Error,
    Warning,
    Note, // Added this
    Help, // Used as a sub-message, not a primary level
}

// ... other Diagnostic struct and methods ...

impl Diagnostics {
    // ... existing methods ...

    pub fn has_errors_or_warnings(&self) -> bool {
        self.diagnostics.iter().any(|d| d.level == DiagnosticLevel::Error || d.level == DiagnosticLevel::Warning)
    }

    pub fn has_errors(&self) -> bool {
        self.diagnostics.iter().any(|d| d.level == DiagnosticLevel::Error)
    }

    pub fn has_warnings(&self) -> bool {
        self.diagnostics.iter().any(|d| d.level == DiagnosticLevel::Warning)
    }

    pub fn has_fix_notes(&self) -> bool {
        self.diagnostics.iter().any(|d| d.level == DiagnosticLevel::Note && d.code.contains("FIXED"))
    }

    pub fn has_level(&self, level: DiagnosticLevel) -> bool {
        self.diagnostics.iter().any(|d| d.level == level)
    }

}

print_diagnostics also has to handle the Note level. To print a file:line:column location it needs the path of the file being processed, but Diagnostics has no access to the CLI arguments, so the path is passed in as a parameter. The colored output relies on the colored crate, which is assumed to be listed in Cargo.toml already.

// src/diagnostics.rs (print_diagnostics with the file path added)

use colored::Colorize;
use std::path::Path;

// ... DiagnosticLevel and Diagnostic struct ...

impl Diagnostics {
    // ... existing methods ...

    pub fn print_diagnostics(&self, source_code: &str, file_path: &Path) { // Added file_path
        // Split the source into lines once, not once per diagnostic.
        let lines: Vec<&str> = source_code.lines().collect();

        for diag in self.diagnostics.iter() {
            let level_str = match diag.level {
                DiagnosticLevel::Error => "error".red().bold(),
                DiagnosticLevel::Warning => "warning".yellow().bold(),
                DiagnosticLevel::Note => "note".blue().bold(),
                // Help is never printed as a primary diagnostic.
                DiagnosticLevel::Help => continue,
            };

            let code_str = diag.code.bright_black();

            eprintln!("{}: {} [{}]", level_str, diag.message, code_str);

            if let Some(span) = diag.span {
                if let Some(line_num) = span.start_line(source_code) {
                    let start_col = span.start_col(source_code);
                    eprintln!("     {} {}:{}:{}", "-->".blue().bold(), file_path.display(), line_num + 1, start_col + 1);

                    if let Some(line_content) = lines.get(line_num) {
                        let gutter = (line_num + 1).to_string();
                        eprintln!(" {} | {}", gutter.blue(), line_content);
                        // The " N | " prefix is gutter.len() + 4 columns wide, so pad by that
                        // amount to keep the carets aligned for multi-digit line numbers.
                        // Always draw at least one caret, even for an empty span.
                        eprintln!("{}{}{}", " ".repeat(gutter.len() + 4), " ".repeat(start_col), "^".repeat(span.len().max(1)).green().bold());
                    }
                }
            }

            if let Some(ref help) = diag.help {
                eprintln!("     {} {}", "help:".cyan().bold(), help);
            }
            eprintln!(); // Blank line for separation
        }
    }
}

And update the main function call:

// src/main.rs (Relevant part in main function)
            diagnostics.print_diagnostics(&source_code, &cli.file);
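
The printing code relies on span.start_line, span.start_col, and span.len, whose implementations are not shown in this section. As a rough illustration of the underlying arithmetic, here is a minimal sketch that assumes a span is a half-open byte range into the source. The names ByteSpan, line_col, and underline are hypothetical and are not part of the project's Span API.

```rust
/// Hypothetical span type for this sketch: a half-open byte range.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ByteSpan {
    pub start: usize,
    pub end: usize,
}

/// Returns the zero-based (line, column) of a byte offset, or None if the
/// offset is out of range or not on a character boundary.
/// Columns are counted in characters, not bytes.
pub fn line_col(source: &str, offset: usize) -> Option<(usize, usize)> {
    if offset > source.len() || !source.is_char_boundary(offset) {
        return None;
    }
    let before = &source[..offset];
    let line = before.matches('\n').count();
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = before[line_start..].chars().count();
    Some((line, col))
}

/// Builds the caret line printed under a context line: `col` spaces followed
/// by carets. The caret run is clamped to the end of the line and is never
/// shorter than one caret.
pub fn underline(source: &str, span: ByteSpan) -> Option<String> {
    let (line, col) = line_col(source, span.start)?;
    let line_text = source.lines().nth(line).unwrap_or("");
    let remaining = line_text.chars().count().saturating_sub(col);
    let wanted = source
        .get(span.start..span.end)
        .map_or(0, |s| s.chars().count());
    let width = wanted.min(remaining).max(1);
    Some(format!("{}{}", " ".repeat(col), "^".repeat(width)))
}
```

Clamping the caret run to the current line keeps a multi-line span from producing an underline that runs past the printed context line.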

Production Considerations

  1. Rule Ordering: The order in which rules are applied can matter significantly. Some rules might create conditions that other rules then fix, or vice versa. The current RuleEngine applies rules in the order they are registered, which is deterministic but makes the final result depend on registration order. For a complex tool, a dependency graph for rules or a fixed, documented ordering might be necessary to keep that behavior predictable as rules are added.
  2. Performance & AST Traversal: Rules often involve traversing the AST. For very large Mermaid diagrams, inefficient traversal (e.g., repeated full scans) can be slow. Using a dedicated AST visitor pattern (which can be implemented once and used by all rules) can optimize traversal. Our current rules iterate top-level statements; for deeply nested structures (like subgraphs), a recursive visitor would be essential.
  3. Error Handling within Rules: While the AST should be valid by the time rules run (thanks to the validator), rules should still be robust against unexpected AST structures. Using Option and Result types where appropriate is good practice.
  4. Idempotence and Fix Passes: The MAX_FIX_PASSES in run_fix is a safeguard against infinite loops. Ideally, each rule should be idempotent on its own, meaning applying it twice to the same AST yields no further changes. If MAX_FIX_PASSES is hit, it indicates a potential issue with rule interactions or a non-idempotent rule.
  5. Strictness Levels: The FixMode enum allows fine-grained control over how aggressive fixes are. This is crucial for a production tool where users might have varying tolerances for automatic modifications. The StrictFixes mode ensures that only truly unambiguous and safe changes are made.
  6. Extensibility: The Rule trait design makes it easy to add new rules without modifying the RuleEngine itself, promoting extensibility and maintainability.
  7. Logging: In a production environment, logging which rules were applied and what changes were made (especially in fix mode) is invaluable for debugging and auditing. Our current DiagnosticLevel::Note serves this purpose for user-facing output.
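
To make the single-traversal idea from item 2 concrete, here is a minimal sketch of a recursive visitor that walks nested subgraphs once and notifies every registered visitor at each node. The Stmt enum, the NodeVisitor trait, and both example visitors are hypothetical simplifications, not the chapter's AST or Rule trait, and treating `-->` as the canonical arrow is an assumption made only for illustration.

```rust
/// Hypothetical, simplified AST used only for this sketch.
#[derive(Debug)]
pub enum Stmt {
    Edge { from: String, to: String, arrow: String },
    Subgraph { name: String, body: Vec<Stmt> },
}

/// Visitors override only the hooks for the node kinds they care about.
pub trait NodeVisitor {
    fn visit_edge(&mut self, _from: &str, _to: &str, _arrow: &str) {}
    fn enter_subgraph(&mut self, _name: &str) {}
}

/// Walks the tree once, depth first, notifying every visitor at each node.
pub fn walk(stmts: &[Stmt], visitors: &mut [&mut dyn NodeVisitor]) {
    for stmt in stmts {
        match stmt {
            Stmt::Edge { from, to, arrow } => {
                for v in visitors.iter_mut() {
                    v.visit_edge(from, to, arrow);
                }
            }
            Stmt::Subgraph { name, body } => {
                for v in visitors.iter_mut() {
                    v.enter_subgraph(name);
                }
                // Recurse so edges inside nested subgraphs are also seen.
                walk(body, visitors);
            }
        }
    }
}

/// Example visitor: counts edges whose arrow is not `-->` (assumed canonical).
#[derive(Default)]
pub struct NonStandardArrowCounter {
    pub count: usize,
}

impl NodeVisitor for NonStandardArrowCounter {
    fn visit_edge(&mut self, _from: &str, _to: &str, arrow: &str) {
        if arrow != "-->" {
            self.count += 1;
        }
    }
}

/// Example visitor: records subgraph names in the order they are entered.
#[derive(Default)]
pub struct SubgraphNames {
    pub names: Vec<String>,
}

impl NodeVisitor for SubgraphNames {
    fn enter_subgraph(&mut self, name: &str) {
        self.names.push(name.to_string());
    }
}
```

With this shape, adding a rule adds one more visitor to the slice and the tree is still traversed once per pass, however many rules are registered.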

Code Review Checkpoint

At this stage, we have accomplished the following:

  • Defined the Rule trait: This provides a clear, extensible interface for creating linting and fixing logic.
  • Implemented the RuleEngine struct: This orchestrates the application of rules, manages different FixModes, and handles multiple passes for fixes.
  • Created MissingGraphDeclarationRule: A concrete example of a rule that checks for and fixes the absence of a graph declaration.
  • Created ArrowNormalizationRule: A rule that identifies and normalizes non-standard arrow syntax.
  • Integrated the RuleEngine into src/main.rs: The CLI now supports lint and fix commands, leveraging the newly built engine.
  • Updated Diagnostics: To include Note level for reporting successful fixes and to accept a file path for better output.

Files Created/Modified:

  • src/rules/mod.rs (New directory and module)
  • src/rules/missing_graph_decl.rs (New rule file)
  • src/rules/arrow_normalization.rs (New rule file)
  • src/main.rs (Modified for CLI, RuleEngine integration)
  • src/diagnostics.rs (Modified for Note level and file_path in print_diagnostics)
  • Cargo.toml (Added clap dependency; print_diagnostics also requires the colored crate if it is not already listed)

This new component significantly enhances our tool’s capabilities, moving it from a pure validator to an active linter and fixer.

Common Issues & Solutions

  1. Rules Conflict or Cause Infinite Loops:

    • Issue: Two rules might continuously “fix” each other’s changes, or a rule might not be idempotent, leading to MAX_FIX_PASSES being hit.
    • Solution:
      • Test Thoroughly: Unit tests for each rule and integration tests for rule sets are crucial.
      • Idempotency: Design each apply_fix method to make changes only if truly necessary and to produce the same AST if run repeatedly on an already fixed AST.
      • Rule Ordering: If conflicts arise, experimenting with rule registration order in RuleEngine::register_default_rules() can sometimes resolve them. For complex interactions, a dependency system for rules might be needed.
      • Clear Diagnostics: Ensure diagnostics for fixes clearly state what was changed, helping debug interactions.
      • Limit StrictFixes: Rules that might lead to conflicts or non-idempotent behavior should not support StrictFixes.
  2. Performance Degradation with Many Rules/Large ASTs:

    • Issue: Each rule traversing the entire AST independently can become slow for large diagrams or many rules.
    • Solution:
      • Optimized Traversal: Implement a generic AST visitor pattern. Rules can then subscribe to specific AST node types, and the visitor traverses the AST once, calling relevant rule methods for each node.
      • Lazy Evaluation: Some rules might only need to run if certain conditions are met, avoiding unnecessary work.
      • Caching: If rules frequently query the AST for similar information, cache results for performance.
  3. Fixes Introduce New Syntax Errors:

    • Issue: A poorly designed apply_fix method might transform a valid (though non-standard) AST into an invalid one.
    • Solution:
      • Post-Fix Validation: As implemented in main.rs, always re-run the semantic validator after applying fixes. If new errors appear, report them and potentially revert changes (though reverting is complex and outside the scope of this project).
      • Strict Adherence to Grammar: Every fix must be explicitly validated against the official Mermaid grammar. Never guess or assume.
      • Golden Tests: Crucially, implement golden tests (see below) that ensure fixed output is valid and matches expectations.
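
The interaction described in issue 1 can be reproduced in a few lines. The sketch below is a toy model, not the chapter's RuleEngine: the document is a Vec<String> of lines, FixRule is a hypothetical stand-in for the Rule trait, and the value of MAX_FIX_PASSES is assumed. It shows a fixed-point loop that stops as soon as a full pass changes nothing and reports which rules were still making changes when the cap is reached.

```rust
/// Hypothetical stand-in for the chapter's Rule trait. `apply_fix` returns
/// true when it changed the document.
pub trait FixRule {
    fn name(&self) -> &'static str;
    fn apply_fix(&self, doc: &mut Vec<String>) -> bool;
}

#[derive(Debug, PartialEq)]
pub enum FixOutcome {
    /// A full pass completed without any rule reporting a change.
    Converged { passes: usize },
    /// The cap was reached while these rules were still changing the document.
    PassLimitHit { last_changed_by: Vec<&'static str> },
}

/// Same name as in the chapter; the value here is an assumption.
pub const MAX_FIX_PASSES: usize = 10;

pub fn run_to_fixed_point(rules: &[&dyn FixRule], doc: &mut Vec<String>) -> FixOutcome {
    let mut last_changed_by = Vec::new();
    for pass in 1..=MAX_FIX_PASSES {
        last_changed_by.clear();
        for rule in rules {
            if rule.apply_fix(doc) {
                last_changed_by.push(rule.name());
            }
        }
        if last_changed_by.is_empty() {
            return FixOutcome::Converged { passes: pass };
        }
    }
    FixOutcome::PassLimitHit { last_changed_by }
}

/// Idempotent rule: inserts a header line only when one is missing.
pub struct EnsureHeader;

impl FixRule for EnsureHeader {
    fn name(&self) -> &'static str {
        "ensure-header"
    }
    fn apply_fix(&self, doc: &mut Vec<String>) -> bool {
        if doc.first().map_or(false, |l| l.starts_with("graph ")) {
            return false;
        }
        doc.insert(0, "graph TD".to_string());
        true
    }
}

/// Rewrites one arrow spelling into another. Two of these with swapped
/// arguments undo each other forever, which is what the cap guards against.
pub struct ForceArrow {
    pub name: &'static str,
    pub from: &'static str,
    pub to: &'static str,
}

impl FixRule for ForceArrow {
    fn name(&self) -> &'static str {
        self.name
    }
    fn apply_fix(&self, doc: &mut Vec<String>) -> bool {
        let mut changed = false;
        for line in doc.iter_mut() {
            if line.contains(self.from) {
                *line = line.replace(self.from, self.to);
                changed = true;
            }
        }
        changed
    }
}
```

Returning the names of the rules that changed the document in the last pass gives the "clear diagnostics" bullet something concrete to print when two rules keep undoing each other.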

Testing & Verification

Testing the rule engine involves multiple layers:

  1. Unit Tests for Individual Rules:

    • Verify check() method correctly identifies issues and produces diagnostics for various valid and invalid inputs.
    • Verify apply_fix() method correctly transforms the AST and returns true when changes are made, and false otherwise.
    • Test idempotence: Applying apply_fix twice should only result in changes on the first application.
    • Test supports_fix_mode() behavior.

    Example test structure for missing_graph_decl.rs:

    // src/rules/missing_graph_decl.rs (within a #[cfg(test)] mod)
    #[cfg(test)]
    mod tests {
        use super::*;
        use crate::lexer::Lexer;
        use crate::parser::Parser;
        // Only the AST items the tests actually reference are imported.
        use crate::ast::{MermaidAst, Statement};
    
        fn parse_and_get_ast(input: &str) -> MermaidAst {
            let lexer = Lexer::new(input);
            let tokens = lexer.collect();
            let mut parser = Parser::new(tokens);
            parser.parse().expect("Failed to parse test input")
        }
    
        #[test]
        fn test_missing_graph_decl_check_warns() {
            let ast = parse_and_get_ast("A --> B");
            let mut diagnostics = Diagnostics::new();
            MissingGraphDeclarationRule.check(&ast, &mut diagnostics);
            assert_eq!(diagnostics.len(), 1);
            assert_eq!(diagnostics.get(0).unwrap().code, "R701_MISSING_GRAPH_DECL");
            assert_eq!(diagnostics.get(0).unwrap().level, DiagnosticLevel::Warning);
        }
    
        #[test]
        fn test_missing_graph_decl_check_no_warn() {
            let ast = parse_and_get_ast("graph TD\nA --> B");
            let mut diagnostics = Diagnostics::new();
            MissingGraphDeclarationRule.check(&ast, &mut diagnostics);
            assert!(diagnostics.is_empty());
        }
    
        #[test]
        fn test_missing_graph_decl_apply_fix() {
            let mut ast = parse_and_get_ast("A --> B");
            let mut diagnostics = Diagnostics::new();
            let changed = MissingGraphDeclarationRule.apply_fix(&mut ast, &mut diagnostics);
    
            assert!(changed);
            assert_eq!(ast.statements.len(), 2);
            assert!(matches!(ast.statements[0], Statement::GraphDeclaration(_)));
            assert_eq!(diagnostics.len(), 1);
            assert_eq!(diagnostics.get(0).unwrap().code, "R701_MISSING_GRAPH_DECL_FIXED");
    
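            // Test idempotence: a second application must not change the AST again
            let changed_again = MissingGraphDeclarationRule.apply_fix(&mut ast, &mut diagnostics);
            assert!(!changed_again); // Should not change again
            assert_eq!(ast.statements.len(), 2); // No second declaration was inserted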
        }
    
        #[test]
        fn test_missing_graph_decl_supports_strict_fix() {
            let rule = MissingGraphDeclarationRule;
            assert!(rule.supports_fix_mode(FixMode::ApplyFixes));
            assert!(rule.supports_fix_mode(FixMode::StrictFixes));
        }
    }
    

    Similar tests would be written for arrow_normalization.rs.

  2. Integration Tests for RuleEngine:

    • Test run_lint with various inputs, ensuring all relevant rules produce diagnostics.
    • Test run_fix with inputs that require multiple passes, ensuring idempotence and correct final state.
    • Test run_fix in StrictFixes mode, verifying that only rules supporting this mode are applied.
    • Verify MAX_FIX_PASSES warning is generated if an infinite loop is simulated.
  3. Golden Tests:

    • These are critical for a linter/formatter. Create a directory of input.mmd files and corresponding expected_output.mmd files.
    • The test reads input.mmd, runs the RuleEngine in fix mode, then compares the resulting (formatted) AST output against expected_output.mmd. This will be fully implemented when the formatter is ready in Chapter 8.
    • Include tests for malformed inputs to ensure graceful error handling.
  4. Performance Benchmarks:

    • Use Rust’s criterion crate or similar benchmarking tools to measure the performance of run_lint and run_fix on large, complex Mermaid diagrams. This helps identify performance bottlenecks as rules are added.
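
Until the formatter exists, the comparison half of a golden test can already be sketched. The helper below is a hypothetical utility, not part of the project. It assumes that line-ending style, trailing whitespace, and trailing blank lines should not cause a golden mismatch, and it reports the first differing line.

```rust
/// Hypothetical golden-test helper: compares actual output with the expected
/// text, ignoring CRLF/LF differences, trailing whitespace, and trailing
/// blank lines. On mismatch, the error names the first differing line (1-based).
pub fn compare_golden(actual: &str, expected: &str) -> Result<(), String> {
    let normalize = |s: &str| -> Vec<String> {
        let mut lines: Vec<String> = s.lines().map(|l| l.trim_end().to_string()).collect();
        while lines.last().map_or(false, |l| l.is_empty()) {
            lines.pop();
        }
        lines
    };
    let (a, e) = (normalize(actual), normalize(expected));
    for i in 0..a.len().max(e.len()) {
        let (la, le) = (a.get(i), e.get(i));
        if la != le {
            return Err(format!("line {}: expected {:?}, got {:?}", i + 1, le, la));
        }
    }
    Ok(())
}
```

A harness could load each input.mmd and expected_output.mmd pair with std::fs::read_to_string, run the fix pipeline on the input, and pass both strings to this helper.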

Summary & Next Steps

In this chapter, we successfully designed and implemented the core Rule Engine for our Mermaid analysis tool. We established a flexible Rule trait, allowing us to define modular linting and fixing logic. We built the RuleEngine itself, capable of running in lint, fix, and strict modes, and integrated it into our main application flow. We also implemented two practical rules: MissingGraphDeclarationRule and ArrowNormalizationRule, demonstrating how to identify and deterministically fix common Mermaid issues.

The tool can now process Mermaid code, tokenize it, parse it into an AST, validate it, and apply rule-based checks and transformations. While the AST can be modified, we currently lack a robust way to convert the modified AST back into a clean, formatted Mermaid string.

The next crucial step, covered in Chapter 8: The Formatter: Pretty-Printing the AST, will be to implement a production-grade formatter that takes our (potentially modified) AST and converts it back into a human-readable and syntactically correct Mermaid diagram string, adhering to consistent style guidelines. This will complete our compiler-like pipeline, enabling our tool to act as a full-fledged linter and auto-formatter.