- Knowing the protocol is not the same as understanding your network
- A confident wrong answer is worse than no answer at all
- AI earns trust by reasoning on live state, not pattern-matching on docs
LLMs Don't Understand BGP. Here's What It Takes to Change That.
Ask a general-purpose LLM to diagnose a BGP route leak and it will give you an answer: confident, clear, and almost certainly wrong.
It will tell you to check your route maps. It will suggest filters that sound plausible but reference non-existent community strings. It will explain BGP concepts accurately in isolation while completely misunderstanding how they interact in your specific topology. The explanation reads like a textbook, but the recommendation would break your network.
This is not a criticism of LLMs so much as a statement about what they were trained on and what they were not. General-purpose language models understand BGP the way someone who has read every RFC and never touched a router understands BGP: they know the vocabulary and can recite the theory, but they lack the operational intuition that comes from watching real routing tables shift under real conditions.
For network operators evaluating AI tools, this distinction matters enormously, because the gap between "can discuss BGP" and "can reason about BGP in production" is where incidents go unresolved and trust gets destroyed.
Why BGP Is Uniquely Hard for Language Models
BGP (Border Gateway Protocol) is the routing protocol that holds the internet together. Every ISP, every cloud provider, every enterprise with multiple upstream connections relies on it to exchange reachability information and make forwarding decisions. When BGP works correctly, traffic finds the right path, and when it does not, entire regions can go dark.
What makes BGP uniquely hard for AI is not its complexity in isolation, but the combination of four properties that rarely appear together in the training data LLMs learn from.
First, BGP is stateful across time. A routing decision at 2:15pm depends on what happened at 2:14pm. A flapping peer that resets every 90 seconds creates a different failure mode than a peer that went down once and stayed down. Context is not optional; it is the entire problem.
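The flapping-versus-down distinction in that example is only decidable with event history. A minimal sketch in Python; the function name and thresholds are illustrative assumptions, not anything from a real monitoring system:

```python
from datetime import datetime, timedelta

def classify_peer_failure(down_events, window=timedelta(minutes=10), flap_threshold=3):
    """Classify a peer from its session-down timestamps (illustrative thresholds).

    Repeated resets inside the window mean the peer is flapping; a single
    down event means it went down and stayed down.
    """
    if not down_events:
        return "healthy"
    recent = [t for t in down_events if down_events[-1] - t <= window]
    return "flapping" if len(recent) >= flap_threshold else "down"

base = datetime(2024, 1, 1, 14, 0)
flapper = [base + timedelta(seconds=90 * i) for i in range(5)]  # resets every 90s
hard_down = [base]                                              # went down once
```

The two inputs look identical in an instantaneous snapshot (peer is down right now) but classify differently once history is included, which is exactly why context is the entire problem.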
Second, BGP behavior is topology-dependent. The same AS path means different things depending on where you observe it. A route that looks normal from one vantage point is a leak from another. You cannot reason about BGP without a model of the network it runs on.
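One concrete form of that vantage-point dependence is the valley-free rule: a route learned from a peer or a provider should only be exported to customers. A sketch of that check, using an invented relationship map (the valley-free property is real; the AS numbers and relationships below are made up):

```python
# rel[(a, b)] describes what b is to a when a exports a route to b:
# "provider" = route goes up, "peer" = across, "customer" = down.
rel = {
    (1, 2): "provider",  # AS1 is a customer of AS2
    (2, 3): "peer",      # AS2 and AS3 peer with each other
    (3, 2): "peer",
    (2, 4): "provider",  # AS2 is a customer of AS4
}

def is_valley_free(path, rel):
    """path is in propagation order, origin first. Once the route has gone
    across (peer) or down (customer), every later hop must go down."""
    descending = False
    for a, b in zip(path, path[1:]):
        hop = rel.get((a, b), "unknown")
        if descending and hop != "customer":
            return False  # re-exported upward or sideways: a leak
        if hop in ("peer", "customer"):
            descending = True
    return True
```

The path [1, 2, 3] (a customer route exported to a peer) is valid, while [3, 2, 4] (a peer route re-exported to a provider) is a leak, even though both traverse the same AS 2. The verdict depends on where the route came from and where you are looking from, not on the route itself.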
Third, BGP is policy-driven. Every network applies its own route maps, communities, local preferences, and import/export filters. Two networks with identical physical topologies can have completely different routing behavior because of policy. This policy is rarely documented in any format an LLM would have seen during training.
Fourth, BGP failures are often partial and ambiguous. A route leak does not always cause a hard outage. Sometimes it causes slightly elevated latency on a subset of paths. Sometimes it manifests as asymmetric routing that only affects certain traffic types. The signal is subtle, and interpreting it requires combining BGP state with traffic flow data, interface counters, and sometimes even customer complaints.
General-purpose LLMs have none of this context. They have never seen your routing table, your topology, your policies, or your traffic patterns. When they answer BGP questions, they are pattern-matching against documentation and forum posts, not reasoning about a live network.
The failure modes are predictable once you understand what the model is actually doing.
When asked about a BGP issue, a general-purpose LLM will typically produce a response that is structurally correct but operationally useless. It will identify that a route leak could cause the symptom you describe, suggest checking prefix filters, and recommend verifying AS path lengths, all of which is accurate in a generic sense and none of which tells you what is actually happening on your network right now.
Worse, the model will sometimes hallucinate specific details. It might reference a community value that does not exist in your policy. It might suggest a fix that assumes a topology you do not have. It might recommend a change that would fix the immediate symptom while creating a routing loop elsewhere.
The fundamental issue is that BGP troubleshooting is not a knowledge retrieval problem but a reasoning problem that requires live state, historical context, and topological awareness. An LLM that lacks all three is essentially guessing in well-formatted English.
For a network engineer, this is worse than no answer at all, because a confident wrong answer wastes time. You investigate the suggested cause, rule it out, and then start your actual diagnosis from scratch, having burned 15 minutes on a dead end that sounded credible.
Making LLMs useful for real network protocols is not a prompting problem. You cannot solve it with better instructions or retrieval-augmented generation alone; it requires fundamental changes to how the AI system is built.
The first requirement is live network state. The model needs access to actual routing tables, BGP peer status, prefix advertisements, and policy configurations as they exist right now, not as they were documented six months ago. This means deep integration with the network itself: the ability to query routers, parse show commands, and maintain an up-to-date view of the control plane.
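Parsing show commands into structured state is unglamorous but foundational. A simplified sketch against Cisco-style `show ip bgp summary` output (the sample text is invented, and real parsers must handle far more vendor formats and edge cases):

```python
import re

SAMPLE = """\
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.1        4 64501   12345   12340      100    0    0 2d03h          842
10.0.0.2        4 64502       0       0        0    0    0 never    Active
"""

def parse_bgp_summary(text):
    """Turn the summary table into peer dicts. The last column is a prefix
    count when the session is Established, otherwise the BGP FSM state."""
    peers = {}
    for line in text.splitlines():
        m = re.match(r"(\d+\.\d+\.\d+\.\d+)\s+\d+\s+(\d+).*\s(\S+)$", line)
        if not m:
            continue
        neighbor, asn, last = m.group(1), int(m.group(2)), m.group(3)
        if last.isdigit():
            peers[neighbor] = {"asn": asn, "state": "Established", "prefixes": int(last)}
        else:
            peers[neighbor] = {"asn": asn, "state": last, "prefixes": 0}
    return peers
```

Only once output like this is structured can the model reason over it instead of pattern-matching on raw terminal text.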
The second requirement is topological reasoning. The AI needs a model of how devices connect, how traffic flows, and how a change in one place propagates to others. This is not something you can inject through a prompt, and it requires purpose-built data structures that represent network topology and allow the model to trace paths, identify shared dependencies, and predict cascade effects.
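At its simplest, tracing shared dependencies is graph reachability. A toy sketch of blast-radius computation over a hypothetical topology (device names invented; real topologies also carry link capacities, redundancy groups, and layer-2 detail):

```python
from collections import deque

# Hypothetical adjacency map: device -> directly connected devices.
topology = {
    "edge-1": ["core-1"],
    "edge-2": ["core-1"],
    "core-1": ["edge-1", "edge-2", "border-1"],
    "border-1": ["core-1", "transit-A"],
    "transit-A": ["border-1"],
}

def blast_radius(topology, root, failed):
    """Devices cut off from `root` if `failed` goes down: BFS over the
    topology with the failed node removed."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in topology.get(node, []):
            if nbr != failed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return sorted(set(topology) - seen - {failed})
```

Losing core-1 strands both edge devices, while losing edge-1 strands nothing else: the same class of failure, completely different propagation, recoverable only from the graph.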
The third requirement is temporal memory. Network incidents unfold over time. A BGP flap three hours ago might be related to the latency spike you are seeing now. A maintenance window last week might have left a stale route that only becomes a problem under load. The system needs to remember what happened, when, and be able to correlate events across time windows that span minutes to months.
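The minimal version of temporal memory is a windowed lookup over an event timeline. A sketch, with event data invented to mirror the flap-then-latency scenario just described:

```python
from datetime import datetime, timedelta

events = [
    (datetime(2024, 1, 1, 11, 0), "BGP flap: peer AS 64501"),
    (datetime(2024, 1, 1, 13, 55), "latency spike: paths via border-1"),
    (datetime(2024, 1, 1, 14, 0), "customer complaint: slow uploads"),
]

def correlate(events, anchor_time, window):
    """Return events that occurred within `window` before the anchor."""
    return [(t, e) for t, e in events
            if timedelta(0) <= anchor_time - t <= window]
```

With a three-hour window, the 11:00 flap surfaces as a candidate cause of the 14:00 complaint; with a ten-minute window it is invisible. Choosing and spanning those windows, from minutes to months, is itself part of the problem.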
The fourth requirement is policy awareness. Every network has its own routing policy, and that policy determines what "correct" behavior looks like. An AI that does not know your policies cannot distinguish a legitimate route from a leak, because the difference is defined by your intent, not by the protocol itself.
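Because "leak" is defined by intent, even a trivial checker needs that intent encoded somewhere. A sketch using an invented origin-authorization map (conceptually similar to what RPKI or IRR data provides, but the prefixes and AS numbers here are purely illustrative):

```python
import ipaddress

# Hypothetical intent: prefixes each origin AS is authorized to announce.
expected_origins = {
    64501: [ipaddress.ip_network("203.0.113.0/24")],
    64502: [ipaddress.ip_network("198.51.100.0/24")],
}

def violates_policy(prefix, origin_as, expected_origins):
    """The same announcement is legitimate from one origin and a
    violation from another; the protocol alone cannot tell them apart."""
    allowed = expected_origins.get(origin_as, [])
    net = ipaddress.ip_network(prefix)
    return not any(net.subnet_of(parent) for parent in allowed)
```

Note that the function says nothing about BGP mechanics. It is entirely a statement of intent, which is exactly the part an LLM trained on public text has never seen.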
The fifth requirement is action safety. Even if the AI correctly identifies the root cause, the remediation it suggests must be safe. In networking, a wrong fix is often worse than no fix. Any system that recommends BGP changes needs to reason about the blast radius: what else would be affected? What could go wrong? Is this reversible?
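Those three questions can at least be made explicit as a gate in front of any suggested change. A deliberately naive sketch (the check names and dict fields are invented, and a real system would compute them against live state rather than trust the caller):

```python
def gate_change(change):
    """Refuse a proposed change unless blast radius, reversibility, and
    supervision all check out. Returns (ok, failed_check_names)."""
    checks = {
        "bounded blast radius": len(change.get("affected_sessions", [])) <= 2,
        "reversible": change.get("rollback") is not None,
        "human approved": change.get("approved_by") is not None,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (not failed, failed)
```

The point is not the specific checks but the structure: a remediation that cannot answer these questions should never reach a router.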
Why This Matters for the Industry
This list of requirements explains why "just add AI to your NOC" is not a meaningful product strategy, and why wrapping a general-purpose LLM in a network-themed interface produces demos that look impressive and production results that are dangerous.
The engineers building AI for network operations face a harder problem than their counterparts in application-layer AI. Application observability benefits from structured data, well-defined schemas, and failure modes that are relatively deterministic. Network operations involves unstructured signals, vendor-diverse hardware, partially observable state, and failure modes that require physics-aware reasoning.
None of this is a reason to dismiss AI for networking, but it is a reason to build it differently.
The teams that will earn trust with network operators are the ones that start from the network up, not from the LLM down. That means deep integration with real gear, persistent memory of past incidents and resolutions, topological reasoning that is built in rather than bolted on, and a fundamental commitment to safe, supervised assistance rather than unsupervised action.

At Supertrace, this is the problem we work on every day. Not making LLMs sound like they understand BGP concepts (that part is easy) but making them actually useful when a network engineer needs to diagnose a route leak at 2am with incomplete information and a ticking clock.
The core engineering challenge is what we call the context problem. The models we work with are, in many ways, brilliant network engineers who know nothing about your network. They understand the protocols, the failure modes, the theory. What they lack is everything that makes your network yours: the topology, the device configs, the recent alerts, the carrier notifications, the trace results, the postmortems from last quarter.
Our job is to gather all of that context, serialize it in a way the model can reason over, and deliver it within the right time window so the AI is working with the same situational awareness a senior engineer would have if they had been watching the network all week. That means building a layer of primitives (topology maps, device metadata, alert timelines, read-only SSH access, historical incident data) that gives the model a complete picture of what is happening right now and what led up to it. When you get that right, the model stops guessing and starts helping you resolve incidents at a pace that manual workflows simply cannot match.
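In the simplest terms, that context layer ends up assembling one serialized document the model can reason over. A heavily simplified sketch (all field names are invented; the real primitives listed above are far richer):

```python
import json

def build_context(topology, recent_alerts, bgp_state, lookback_hours=24):
    """Bundle network primitives into one prompt-ready document. Bounding
    each section matters: the model needs a complete current picture,
    not unbounded history."""
    return json.dumps({
        "topology": topology,
        "recent_alerts": recent_alerts[-50:],  # most recent, bounded
        "bgp": bgp_state,
        "lookback_hours": lookback_hours,
    }, default=str, indent=2)
```

The engineering difficulty is not the serialization itself but deciding what belongs in each section, at what freshness, within the model's context budget.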
The gap between "can discuss the protocol" and "can reason about your network" is real, and closing it requires building AI systems that are purpose-built for the operational reality of running physical infrastructure. Systems that talk to your gear, remember your incidents, understand your topology, and respect the complexity of the protocols that hold the internet together.
BGP has been running the internet for three decades without AI assistance. If AI is going to earn a seat in the NOC, it needs to meet the protocol on its own terms. Anything less is a liability dressed up as a feature.