Why your WAF's ML model is probably making you less secure

Web Application Firewall vendors have been slapping "AI-powered" onto their products for the better part of a decade. Most of them mean one of two things: a rules engine with a better UI, or a classifier trained on a benchmark dataset that hasn't changed since 2010.

Neither is inherently bad. But neither is what they claim it is.

This is the first post in a series on building WAF intelligence that actually holds up in production. Before we get to how to do it right, we need to be honest about the failure modes — because most teams deploy a naive ML solution, see 95% accuracy on their test set, call it done, and end up in a worse position than they started.

The benchmark trap

The most widely cited WAF training dataset is CSIC-2010, a synthetic HTTP dataset published by the Spanish National Research Council (CSIC). It contains about 36,000 normal requests and 25,000 malicious ones.

The dataset was created in 2010. It targets a single e-commerce application built for the purpose. Its attack traffic was generated automatically by a small set of scanning tools, not by human attackers adapting to a defense.

If you train a classifier on CSIC-2010 and test it on CSIC-2010, you will get accuracy north of 99%. You will also have built a model that has learned to recognize automated tool output against one application's URL structure, and nothing else.
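
One way to check for this kind of memorization is to hold out an entire source (an application, or an attack tool) instead of random rows. A minimal sketch, assuming each labeled request carries a hypothetical `source` field; the record layout and the name `split_by_source` are illustrative and not part of CSIC-2010:

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: {"source": ..., "request": ..., "label": ...}
Record = Dict[str, str]


def split_by_source(records: List[Record], holdout: str) -> Tuple[List[Record], List[Record]]:
    """Put every record from `holdout` in the test set and all others in train."""
    train = [r for r in records if r["source"] != holdout]
    test = [r for r in records if r["source"] == holdout]
    if not train or not test:
        raise ValueError(f"holdout source {holdout!r} leaves an empty split")
    return train, test


# Made-up records from two imaginary applications.
records = [
    {"source": "shop", "request": "/cart?id=1", "label": "benign"},
    {"source": "shop", "request": "/cart?id=1' OR '1'='1", "label": "malicious"},
    {"source": "blog", "request": "/post/42", "label": "benign"},
    {"source": "blog", "request": "/post/42?q=<script>", "label": "malicious"},
]

train, test = split_by_source(records, holdout="blog")
print(len(train), len(test))
```

A model that scores well on a random split but poorly on a held-out source has most likely learned the source, not the attack.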

Real traffic is different in every way that matters:

URL structures vary wildly across applications
Payloads are obfuscated, fragmented across parameters, encoded in a dozen ways
Attackers update their tooling faster than you retrain
Normal traffic looks suspicious to models that have never seen your application
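
To make the encoding point concrete, here is a small sketch of one payload written four ways and checked against a deliberately naive signature. The signature, the payload, and the helper names are assumptions chosen for illustration, not any real WAF's rule:

```python
import re
from urllib.parse import quote, unquote

# Deliberately naive signature, standing in for a rule or a learned token feature.
NAIVE_SIGNATURE = re.compile(r"union\s+select", re.IGNORECASE)

payload = "1 UNION SELECT password FROM users"

variants = {
    "plain": payload,
    "url_encoded": quote(payload),
    "double_encoded": quote(quote(payload)),
    "inline_comment": payload.replace(" ", "/**/"),
}


def matches_raw(text: str) -> bool:
    """Match the signature against the request text exactly as received."""
    return bool(NAIVE_SIGNATURE.search(text))


def matches_decoded(text: str, max_rounds: int = 3) -> bool:
    """URL-decode until the text stops changing (or max_rounds), then match."""
    for _ in range(max_rounds):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    return bool(NAIVE_SIGNATURE.search(text))


for name, variant in variants.items():
    print(f"{name:15} raw={matches_raw(variant)} decoded={matches_decoded(variant)}")
```

Decoding before matching recovers the URL-encoded variants, but the inline-comment variant still slips through. Normalization helps, and it is not a complete answer.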

Why class imbalance kills WAF models in practice

A WAF that blocks 0.1% of legitimate traffic is unacceptable. At 5 million requests per day, for example, that is 5,000 legitimate requests hitting a block page every day.

Standard ML training doesn't account for this asymmetry. If you train on balanced classes (50% malicious, 50% benign), your model is calibrated for a threat environment that doesn't exist. Real traffic is overwhelmingly benign — often 99.9%+ depending on the application.

When you deploy a model trained on balanced classes to real traffic, two things happen:

False positives explode: the model has never seen what "mostly normal traffic" looks like at scale, so it over-fires on unusual-but-legitimate patterns, and with so few real attacks in the stream most of what it blocks is legitimate
The false negative rate looks misleadingly low: it was measured on a balanced test set drawn from the same source as the training data, not against the attacks real adversaries send
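
The base-rate effect behind this can be worked out with Bayes' rule. The sketch below uses made-up rates (99% detection, 1% false positives) for a hypothetical model and compares a balanced test set with traffic where only 0.1% of requests are attacks:

```python
def precision_at_base_rate(tpr: float, fpr: float, attack_rate: float) -> float:
    """Fraction of blocked requests that are truly malicious (Bayes' rule)."""
    true_blocks = tpr * attack_rate
    false_blocks = fpr * (1.0 - attack_rate)
    return true_blocks / (true_blocks + false_blocks)


# Same hypothetical model, two different traffic mixes.
balanced = precision_at_base_rate(tpr=0.99, fpr=0.01, attack_rate=0.5)
realistic = precision_at_base_rate(tpr=0.99, fpr=0.01, attack_rate=0.001)

print(f"balanced test set:   {balanced:.1%} of blocks are real attacks")
print(f"0.1% attack traffic: {realistic:.1%} of blocks are real attacks")
```

With these illustrative numbers, 99% of blocks are correct on the balanced set, but only about 9% are correct on the realistic mix. The model is unchanged; only the traffic is different.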

The fix isn't complicated — it requires training on data that reflects your actual traffic distribution — but almost nobody does it because collecting and labeling real traffic data is hard.
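
Retraining on representative data is the fix described above. A complementary and cheaper step is to choose the blocking threshold from scores the model assigns to your own benign traffic, so the false positive budget is set explicitly. A minimal sketch, assuming you already have a list of model scores for known-benign requests; `threshold_for_fpr` and the synthetic scores are hypothetical:

```python
import math
from typing import Sequence


def threshold_for_fpr(benign_scores: Sequence[float], max_fpr: float) -> float:
    """Return t such that blocking scores > t flags at most max_fpr of benign_scores."""
    if not benign_scores:
        raise ValueError("need at least one benign score")
    ordered = sorted(benign_scores)
    # Number of benign requests we are allowed to block under the budget.
    allowed = min(math.floor(max_fpr * len(ordered)), len(ordered) - 1)
    # Scores strictly above this element number at most `allowed`.
    return ordered[len(ordered) - allowed - 1]


# Synthetic stand-in for scores on 1,000 known-benign requests.
benign_scores = [i / 1000 for i in range(1000)]

threshold = threshold_for_fpr(benign_scores, max_fpr=0.001)
blocked = sum(1 for s in benign_scores if s > threshold)
print(threshold, blocked)
```

The threshold only holds for traffic that resembles the sample it was computed from, so it needs recomputing whenever the application or the model changes.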

The adversarial drift problem

The deeper issue is one that applies to all security ML but hits WAFs especially hard: adversarial drift.

A spam filter degrades gracefully over time: spammers adapt, accuracy falls, you retrain. The cost of one missed spam message is low.

A WAF operates against motivated adversaries with tight feedback loops. An attacker probing your application will notice within minutes if a payload is blocked, adjust the encoding, and try again. Your model's training data doesn't include the adjusted payload.

This is fundamentally different from most ML problems. You're not fighting concept drift driven by natural distribution shift. You're fighting an active adversary who is specifically trying to find payloads your model classifies as benign.

Standard accuracy metrics don't capture this at all.
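
One metric that does reflect an adaptive attacker is an evasion rate: of the payloads the detector blocks, how many get through after a cheap mutation? The sketch below is an assumption-heavy illustration. `toy_detector`, the mutation list, and the payloads are all made up, and a real evaluation would use your own model and a much larger mutation set:

```python
from typing import Callable, Iterable, List
from urllib.parse import quote

Detector = Callable[[str], bool]   # returns True when the request is blocked
Mutation = Callable[[str], str]    # rewrites a payload without changing its intent


def evasion_rate(detector: Detector, payloads: Iterable[str], mutations: List[Mutation]) -> float:
    """Share of initially blocked payloads with at least one mutation that is not blocked."""
    blocked = [p for p in payloads if detector(p)]
    if not blocked:
        return 0.0
    evaded = sum(1 for p in blocked if any(not detector(m(p)) for m in mutations))
    return evaded / len(blocked)


def toy_detector(request: str) -> bool:
    """Hypothetical stand-in for a model: plain substring matching."""
    lowered = request.lower()
    return "union select" in lowered or "<script" in lowered


mutations: List[Mutation] = [
    lambda p: quote(p),                # URL-encode the payload
    lambda p: p.replace(" ", "/**/"),  # SQL inline comments instead of spaces
    lambda p: p.upper(),               # change letter case
]

payloads = ["1 UNION SELECT 1", "<script>alert(1)</script>", "' OR 1=1 --"]
print(evasion_rate(toy_detector, payloads, mutations))
```

The toy detector blocks two of the three payloads and loses both to URL encoding. Accuracy on the original payloads says nothing about that.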

In the next post, we'll cover what a proper data collection strategy looks like — how to get labeled traffic that reflects your actual application, how to handle the imbalance problem, and how to structure your labeling pipeline to keep up with adversarial evolution.

The goal isn't to make a WAF that gets 99% on a benchmark. It's to make one where the attacker's cost of finding an evasion is higher than the value of the target.