WAF (Web Application Firewall) Interview Prep

⭐ Your Core STAR Stories (LBG Values)

Lead with the impact. Emphasize "I", not "We". Explicitly bridge to the WAF role at the end.

1. INCLUSIVE / BOLD: The BA & Exec Access Story (Allwyn)
Q: Removed a barrier / Went the extra mile / Challenged authority.
S/T: BRDs from BAs were poor quality. Easy route was blaming them. I investigated the inputs instead.
A: Found a structural blocker: BAs (younger women/POC) couldn't get Exec time; PMs (older white men) could. I boldly escalated this access imbalance to the Execs to force delegation.
R: Execs delegated, BRD quality spiked, multiple workstreams sped up.
WAF Bridge: "If WAF rules are poorly configured, I don't blame the rule. I investigate the input—did the engineer have access? You fix the system, not the symptom."

2. PEOPLE-FIRST: Retail Team Bypassing Rules
Q: Difficult stakeholder / Pushback on controls / Balancing security.
S/T: Retail team trying to bypass compliance controls on machines due to commercial friction.
A: Didn't just say "no." Dug into *why*. Reviewed objectives, interpreted data, worked collaboratively to find a middle ground where their commercial goal proceeded securely.
R: Machines remained compliant, risk mitigated, business operated efficiently.
WAF Bridge: "This is WAF engineering. App teams want to deploy fast, security wants to block. My job is finding the safe middle ground so the business moves securely."

3. TRUST: William Hill Assurance Delegation
Q: Empowered a team / Scaled operations / Stepped back.
S/T: Pulled into multiple SME projects, couldn't personally execute all my sub-team's reviews.
A: Hired independent thinkers. Wrote the risk matrix, built procedures, trained them. Held 1-on-1s but provided coaching/frameworks, not direct answers. Trusted them to execute.
R: Team outperformed others, completing more work in less time. Model was adopted elsewhere.
WAF Bridge: "This is how you scale infra. Write the runbooks, build escalation matrices, train engineers, and trust them on-call. Doing it yourself creates a single point of failure."

4. TRUST (Mistake/Problem Solving): Gemini API Key Leak in Loki
Q: Critical mistake / Technical problem solving / Fixing a live error.
S/T: Running my LLM platform, shipping logs via Alloy to Grafana Loki. Discovered `httpx` was logging full request URLs at INFO level, meaning the live Gemini API key (passed as a `?key=` query parameter) was bleeding into centralized logs on every call.
A: Didn't just filter the log (a band-aid). Fixed the root cause: moved the key from the query string to the `x-goog-api-key` header so it structurally couldn't appear in URLs. Added defense-in-depth by suppressing `httpx` INFO logs across all worker entrypoints. Shipped the fix through the GitLab CI pipeline rather than a manual hotfix.
R: Leak immediately stopped, verified in Grafana. This log-suppression and header-auth pattern became standard for all future API integrations.
WAF Bridge: "This is why WAF engineers care about HTTP structure. Request URLs, query strings included, typically end up in access logs at proxies, load balancers, and WAFs by default; header values usually don't. Fixing it at the protocol level rather than just writing a log filter is how you prevent the leak permanently."
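
If they ask what the fix looked like, a minimal sketch of the two changes may help. It uses only the Python standard library so it runs without dependencies; the endpoint URL and function names are hypothetical, while the `x-goog-api-key` header and the `httpx` logger name come from the story above.

```python
import logging
import urllib.request

# Hypothetical endpoint, for illustration only.
API_URL = "https://example.invalid/v1/models:generate"


def build_request(api_key: str, url: str = API_URL) -> urllib.request.Request:
    """Build a request that carries the key in a header, never in the URL."""
    return urllib.request.Request(
        url,
        headers={"x-goog-api-key": api_key},
        method="POST",
    )


def quiet_http_client_logs() -> None:
    """Defense in depth: raise the httpx logger above INFO so per-request
    log lines (which include the full URL) are not emitted."""
    logging.getLogger("httpx").setLevel(logging.WARNING)


req = build_request("not-a-real-key")
quiet_http_client_logs()
```

The point to make out loud: the header change removes the secret from the URL entirely, and the logger change only covers the case where something else sensitive ends up in a URL later.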

5. SUSTAINABLE: PokerStars 5,000 Banners
Q: Process improvement / Scaling / Working around blockers.
S/T: 5,000 affiliate banners non-compliant. Manual email process was failing.
A: Investigated backend, found templates held centrally. Proposed code replacement. Commercial pushed back on "boring" banners, so I collaborated with localization to make them compliant AND commercial. Handed to engineering.
R: All 5,000 replaced via one code change. Zero regulatory action.
WAF Bridge: "This is policy-as-code. Instead of manually updating 5,000 WAF rules, you change the template they inherit from. Fix at the root to eliminate human error."

BACKUPS (If they ask for a second example)
• Extra Mile/Empower: Allwyn BRD Framework (wrote acceptance criteria so engineers could self-serve compliance safely).
• Difficult Person: PokerStars Commercial Pushback (finding middle ground on the banners).
• Mistake: Contabo Monitoring (deployed logging but forgot alerts—deploying WAF in detect mode is useless if nobody looks at the alerts).
• Improve Process: LCCP Reg Filter (GitHub tool replacing manual reg searching).

Why This Role Fits You

When they ask "why LBG?" and "why this role?" — here's what's genuine, not generic.

You've been the regulator; now you want to be the builder
At the Gambling Commission you enforced security controls on others. You saw what good and bad look like. Now you want to be on the inside, implementing the controls rather than auditing them.

Regulated infrastructure is where your brain works
You've spent your career in compliance. You understand that the 12-month log retention policy isn't bureaucracy; it's the reason customers trust LBG with their money.

The tech stack matches what you already love
Terraform, GCP, CI/CD pipelines, observability, IaC. The learning curve is the WAF-specific domain knowledge, not the tooling. And you've already started that curve (Cloud Armor).

The Panel & Questions to Ask Them

Andy Shephard

Lead Infrastructure Engineer (Hiring Manager)

Technical deep-dive, IaC, operations.
Ask Him: "What is the biggest operational headache the WAF team is facing right now that you're hoping this hire will take off your plate?"

Danny Cox

Interviewer

Operational risk, customer impact.
Ask Him: "When a new WAF ruleset rolls out, what does the staging process look like to guarantee zero customer disruption?"

Peter Needham

Interviewer

Governance, auditability, deep technical probing.
Ask Him: "How much friction are you seeing between strict policy-as-code requirements and the deployment speed expected by SRE teams during emergencies?"

🛡️

WAF Request Flow

How all traffic passes through the WAF — the end-to-end picture

Very likely asked · Diagram

A WAF (Web Application Firewall) is a bouncer at the club door. Every single request to your website or API passes through it before reaching your backend servers. The bouncer checks each person against a rulebook and either lets them through or throws them out.

The WAF operates as a reverse proxy. The flow:

DNS: Client resolves hostname — points to the WAF's IP. Client thinks it's talking to the origin.

TLS Termination: WAF holds the TLS certificate and private key. It terminates the encrypted session, decrypts the payload so it can inspect it.

Inspection: Rule engine evaluates HTTP method, headers, URI path, and body against rules.

Forward or Block: ALLOW → re-encrypt, forward to origin. BLOCK → return 403 or redirect to error page.
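
The inspect-then-decide step can be sketched as a toy rule engine. This is an illustration only, not how any vendor implements it: the `Request` type, the rule IDs, and the two patterns are hypothetical, and real rulesets are far larger and normalise input before matching.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Request:
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    body: str = ""


# Toy rules: (rule_id, part of the request to inspect, compiled pattern).
RULES = [
    ("sqli-001", "body", re.compile(r"(?i)\bunion\s+select\b")),
    ("trav-001", "path", re.compile(r"\.\./")),
]


def inspect(req: Request) -> tuple[int, str]:
    """Return (status, reason): 403 on the first matching rule, else 200."""
    for rule_id, part, pattern in RULES:
        if pattern.search(getattr(req, part)):
            return 403, f"blocked by {rule_id}"
    return 200, "forwarded to origin"
```

Note that the block reason carries the rule ID; that ID is what you search for later when triaging a false positive.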

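Key concept: SNI (Server Name Indication). It tells a multi-tenant WAF which certificate to serve for the requested hostname. If the SNI is missing or matches no configured hostname, the edge either aborts the handshake or serves a default certificate that fails hostname verification.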

Say this: "The WAF operates as a reverse proxy. DNS points to the WAF edge. The WAF terminates TLS, so it holds the cert and private key, decrypts the payload to inspect it, applies the rule engine against the method, headers, URI, and body, then re-encrypts and forwards clean traffic to origin. The key subtlety is SNI: when multiple domains share an edge, the TLS ClientHello must carry the correct SNI, otherwise the edge can't select the right certificate and the handshake fails."

Term	Definition
Reverse Proxy	A server that sits in front of web servers and forwards client requests to them.
TLS Termination	The process of decrypting HTTPS traffic at the proxy/WAF level rather than the final destination server.
SNI	Extension to TLS allowing a client to specify which hostname it is connecting to.
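
To make the SNI point concrete, here is a minimal sketch of the lookup a shared edge performs when the ClientHello arrives. The hostnames and certificate labels are hypothetical, and a real edge would also handle wildcard certificates.

```python
from typing import Optional

# Hypothetical hostnames and certificate labels, for illustration only.
CERTS = {
    "www.example.com": "cert-www",
    "api.example.com": "cert-api",
}


def select_certificate(sni: Optional[str]) -> Optional[str]:
    """Pick the certificate for the hostname sent in the TLS ClientHello.

    Returns None when nothing matches; a real edge would then either abort
    the handshake or serve a default certificate that fails hostname
    verification on the client.
    """
    if sni is None:
        return None
    return CERTS.get(sni.lower())
```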

🔒

HTTP/S · DNS · TLS Fundamentals

The protocol knowledge Shephard will probe

Definitely asked

DNS is the address book of the internet. TLS is the padlock on the envelope. HTTP/S is the letter inside. The WAF reads all of these to decide if the request is legitimate.

DNS & TLS in WAF context

A/CNAME records point to the WAF IP, not origin.
Edge CDN WAFs use Anycast routing.
TLS termination: WAF decrypts inbound, re-encrypts outbound. Holds private key.
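Cipher suites: Negotiate the key exchange and encryption algorithms. Modern baseline: TLS 1.3, or TLS 1.2 with ECDHE key exchange.
mTLS: On the WAF→origin connection, the WAF also presents a client certificate, so the origin can verify that traffic really came through the WAF.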

On HTTP/S, DNS, TLS: "The WAF sits between the client and the origin. DNS resolves the hostname to the WAF's IP. TLS terminates at the WAF: it holds the private key and decrypts the payload so the rule engine can inspect it. After inspection, the WAF re-encrypts and forwards clean traffic, optionally using mTLS to authenticate the WAF→origin channel."
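
A minimal sketch of the outbound leg, using Python's standard `ssl` module to stand in for the proxy's TLS settings. The function name and the certificate paths are hypothetical; the sketch only builds the context and opens no connection.

```python
import ssl
from typing import Optional


def origin_tls_context(
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> ssl.SSLContext:
    """TLS settings a proxy might use when connecting onward to the origin."""
    ctx = ssl.create_default_context()  # verifies the origin's certificate
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.0 / 1.1
    if client_cert and client_key:
        # mTLS: also present our own certificate so the origin can verify us.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


ctx = origin_tls_context()
```

The design point worth saying aloud: plain TLS only proves the origin's identity to the proxy, while the client certificate proves the proxy's identity to the origin.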

Term	Definition
Anycast	Routing method where multiple servers share the same IP address; routes to closest server.
TTL (Time To Live)	Setting that tells DNS resolver how long to cache a query.

🏗️

WAF Vendor Landscape

Edge, on-prem appliance, cloud-native — and what LBG runs

Know this cold · LBG-specific

LBG runs Cloudflare/Akamai at the Edge, F5/Imperva in their Data Centers, and GCP Cloud Armor natively in the cloud. You have hands-on with Cloudflare and Cloud Armor.

Type	Vendors	Your position
Edge / SaaS	Cloudflare, Akamai	✓ Cloudflare in production
On-prem	F5 BIG-IP, Imperva	No hands-on, frame as transferable IaC
Cloud-native	GCP Cloud Armor, AWS WAF	✓ Cloud Armor badge (Friday)

Say this: "I know LBG operates a hybrid model. My strongest operational background is at the Edge running Cloudflare in production, and I've augmented that by completing the GCP Cloud Armor badge this week to ensure I'm sharp on the cloud-native side. While I haven't clicked around an F5 GUI, the Terraform provider abstractions and policy-as-code principles transfer directly across all three."

🏗️

Terraform & Configuration Drift

State management, the "3 AM Emergency" fix, and Drift

Peter Needham will ask this

Terraform keeps a record (the state file) of exactly what it built. If someone goes around you and clicks in the console during an emergency, the next terraform plan notices: it refreshes state and sees that the live environment differs from the code. That gap is called Configuration Drift.

If you don't fix the drift, the next automated pipeline will overwrite your emergency fix and bring the attack right back.

Handling the 3 AM Emergency (Drift Management)

The Break-Glass Action: During a live incident, speed wins. You log into the WAF console and manually block the IP. You do not wait for a CI/CD pipeline when data is at risk.
The Drift: Real infra no longer matches Code or State.
The Clean-Up: Once mitigated, backport the fix. Write the code to match the manual change, run terraform plan and confirm it reports no changes (proof that code and live config now agree), then merge. Use terraform import first if you manually created a completely new resource.
Prevention: Run terraform plan on a schedule in GitHub Actions. If it detects drift, it alerts the team so manual changes aren't forgotten.
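
The scheduled drift check can be sketched as a small wrapper around `terraform plan -detailed-exitcode`, which exits 0 when there are no changes, 1 on error, and 2 when changes are present. The function names and status strings are hypothetical, and the runner is injectable so the logic can be exercised without Terraform installed.

```python
import subprocess
from typing import Callable, Sequence


def run_plan(cmd: Sequence[str]) -> int:
    """Run the command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def drift_status(runner: Callable[[Sequence[str]], int] = run_plan) -> str:
    """Map `terraform plan -detailed-exitcode` exit codes to a status.

    0 = no changes, 1 = error, 2 = changes present (treated here as drift).
    """
    code = runner(["terraform", "plan", "-detailed-exitcode", "-input=false"])
    return {0: "in-sync", 2: "drift"}.get(code, "error")
```

In a scheduled job, a "drift" result would page or post to the team channel rather than fail silently; an "error" result matters too, because a plan that cannot run detects nothing.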

When asked "Do you fix an active attack in code or the console?": "In a P1 incident, speed is the priority. I will make a manual 'break-glass' change in the WAF console to stop the bleeding and protect the bank. However, that immediately creates Configuration Drift. The incident is NOT closed when the attack stops; the incident is only closed when I have backported that manual fix into Terraform, run a plan, and merged it. If I don't, the next person's pipeline deployment will overwrite my emergency fix. To catch this, I also run scheduled 'terraform plan' jobs to alert the team if un-codified drift exists."

⚙️

CI/CD + Policy-as-Code

How WAF rules go from commit to production safely

Definitely asked

Every WAF rule change goes through a pipeline — like a spell-checker that runs automatically. Policy-as-code is the spell-checker.

Terraform plan: Pipeline runs terraform plan -out=tfplan, then terraform show -json tfplan to get the plan as JSON.
Policy check: OPA (Open Policy Agent) evaluates JSON against rules (e.g., "TLS minimum version must be 1.2").
Safe Rule Deployment: Never deploy directly into Block mode. Deploy in Log-Only mode → Monitor for false positives → Flip to Block.
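
The policy-check step can be sketched in Python as a stand-in for what an OPA/Rego policy would do; this is not OPA itself. It reads the `resource_changes` array from the plan JSON and flags weak TLS settings. The `min_tls_version` attribute and its `TLS_1_x` values are an assumption modelled on `google_compute_ssl_policy`; other resource types name this differently.

```python
import json

# Values treated as too old, in the format of the assumed attribute.
TOO_OLD = {"TLS_1_0", "TLS_1_1"}


def tls_violations(plan_json: str) -> list[str]:
    """Return addresses of planned resources whose minimum TLS version is too low."""
    plan = json.loads(plan_json)
    bad = []
    for rc in plan.get("resource_changes", []):
        # `after` is null for resources being deleted, so default to {}.
        after = (rc.get("change") or {}).get("after") or {}
        if after.get("min_tls_version") in TOO_OLD:
            bad.append(rc["address"])
    return bad
```

A pipeline would fail the job whenever this list is non-empty, which is the "fails before any human reviews it" behaviour.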

Say this: "WAF rules live in Git alongside the Terraform config. A PR triggers the pipeline: Terraform plan runs, and the output JSON is fed to OPA for policy evaluation. If it violates a baseline, the pipeline fails before any human reviews it. Policy-as-code gives you a version-controlled audit trail of every rule change."

💥

Layer 7 DDoS & Application Attacks

Why L7 is harder than L3/4 — and how to defend

Layer 7 attacks are smarter. They send real, valid-looking website requests. The WAF has to figure out the difference between 10,000 real customers and 10,000 bots all doing the same thing.

Say this: "Layer 7 is fundamentally harder because the traffic looks like legitimate requests. A botnet sending 100,000 HTTP GETs looks identical to real customers, until your database falls over. The defence has multiple layers: rate limiting per session identity rather than just per IP, bot fingerprinting, and positive-security schema validation."
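
Rate limiting per session identity can be sketched as a sliding-window counter keyed on whatever identifies the session (a session cookie or token, for example) instead of the source IP. The class name and the limits are hypothetical; production WAFs do this in distributed state, not in one process.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Keying on session rather than IP is what stops a botnet spread across thousands of residential IPs from each staying under a per-IP limit.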

🤖

Bot Protection

Distinguishing legitimate automation from malicious bots

Bot protection is about telling the difference between a real browser and an automated script trying to brute force passwords (credential stuffing).

Say this: "Bot protection isn't just about blocking IP addresses anymore because attackers use residential proxies. You evaluate signals: JS fingerprinting, JA3 TLS fingerprints, and behavioral ML analysis. The main use cases we care about at a bank are credential stuffing and card testing."
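
If they ask what a JA3 fingerprint actually is: it is the MD5 hex digest of five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) written as decimal values, joined by `-` within a field and by `,` between fields. A minimal sketch, assuming the values have already been parsed from the ClientHello and GREASE values already stripped; the function names are hypothetical.

```python
import hashlib
from typing import Sequence


def ja3_string(version: int, ciphers: Sequence[int], extensions: Sequence[int],
               curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """Build the JA3 input: five comma-separated fields, values joined by '-'."""
    fields = [ciphers, extensions, curves, point_formats]
    joined = ["-".join(str(v) for v in f) for f in fields]
    return ",".join([str(version), *joined])


def ja3_hash(ja3: str) -> str:
    """JA3 fingerprint: MD5 hex digest of the JA3 string (not a security use)."""
    return hashlib.md5(ja3.encode("ascii"), usedforsecurity=False).hexdigest()
```

The useful property is that the fingerprint follows the client's TLS library, so a script that rotates IPs and spoofs its User-Agent still tends to present the same JA3.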

🎯

OWASP API Security Top 10

The attack types the WAF needs to defend against

The ones most relevant for WAF configuration are BOLA (stealing other people's data by guessing IDs), broken authentication (credential stuffing), and resource exhaustion (L7 DDoS).

Say this: "The critical API threats the WAF handles are BOLA and Unrestricted Resource Consumption. BOLA is an authorization flaw that has to be fixed in the application, but the WAF is the first line of defense for detecting mass ID enumeration. We also block SSRF vectors by rejecting requests whose parameters point at internal addresses such as the RFC 1918 ranges."
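
The SSRF check can be sketched with the standard `ipaddress` module. The function name is hypothetical, and the sketch adds loopback and link-local (which covers the 169.254.169.254 cloud metadata address) alongside the RFC 1918 ranges. It only catches literal IPv4 addresses; hostnames that resolve to internal addresses need a separate control.

```python
import ipaddress
from urllib.parse import urlparse

INTERNAL_NETS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
    "127.0.0.0/8",      # loopback
    "169.254.0.0/16",   # link-local, including cloud metadata endpoints
)]


def targets_internal_address(param_value: str) -> bool:
    """True if the value is a URL whose host is a literal internal IPv4 address."""
    host = urlparse(param_value).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not an IP literal; DNS is not resolved here
    return any(addr in net for net in INTERNAL_NETS)
```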

☁️

GCP Cloud Armor ⭐

Google's WAF — you did the badge Friday — lean into this

Big advantage

Cloud Armor is Google's WAF. You write security policies and attach them to your backend services. Rules are evaluated in priority order, lowest number first, and the first match wins.

How to raise this: "I completed the GCP Cloud Armor Skills Badge on Friday, which felt directly relevant given LBG's GCP partnership. Cloud Armor's model of priority-ordered security policies attached to backend services is a clean example of cloud-native WAF done well."
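
A toy model of that evaluation order, in case they ask you to walk through it. The `Rule` type and `evaluate` function are hypothetical and only match on source IP ranges; real Cloud Armor rules also support expression-based matching. The default rule sits at priority 2147483647, so it is always evaluated last.

```python
import ipaddress
from dataclasses import dataclass

DEFAULT_PRIORITY = 2147483647  # the default rule: lowest priority, matches everything


@dataclass(frozen=True)
class Rule:
    priority: int
    src_range: str  # CIDR, or "*" to match any source
    action: str     # e.g. "allow" or "deny(403)"


def evaluate(rules: list[Rule], client_ip: str) -> str:
    """Return the action of the first matching rule, lowest priority number first."""
    ip = ipaddress.ip_address(client_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.src_range == "*" or ip in ipaddress.ip_network(rule.src_range):
            return rule.action
    raise ValueError("policy has no default rule")
```

The practical consequence: a narrow allow at priority 900 beats a broad deny at priority 1000, which is how you carve out a trusted partner IP from a blocked range.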

🔍

False Positive / Negative Triage

The ongoing balancing act — and how to diagnose methodically

Tuning a WAF is a permanent balancing act. Too tight = customers can't use your service. Too loose = attackers get through. Make surgical fixes, not blunt rule removals.

Say this: "False positive investigation: pull the WAF access logs, find the blocked request and its rule ID. Get a HAR file from the client. Identify what triggered it. Then write a surgical exclusion scoped to that specific URI and parameter combination, and never disable the rule globally."
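
What "surgical" means can be sketched as an exclusion keyed on the exact rule, path, and parameter, so the rule keeps firing everywhere else. The rule ID, path, and parameter name below are illustrative only, and each vendor has its own exclusion syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exclusion:
    rule_id: str
    path: str
    param: str


# Hypothetical values: the rule ID, path, and parameter are illustrative only.
EXCLUSIONS = {Exclusion("942100", "/api/v1/notes", "body_text")}


def should_enforce(rule_id: str, path: str, param: str) -> bool:
    """False only for the exact (rule, path, parameter) combination excluded."""
    return Exclusion(rule_id, path, param) not in EXCLUSIONS
```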

📊

Observability & Telemetry

What to monitor, how to alert, LBG's actual stack

If you can't see what your WAF is doing, you're flying blind. Good observability answers: Is it blocking the right things? Is it affecting real customers? How much latency is it adding?

Monitoring question answer: "The key WAF signals are: 403 block rate (is the WAF blocking too much?), 5xx error rate (is the WAF or the origin failing?), P99 latency (is it adding overhead?), and block rate trend (a sudden drop may mean the WAF is broken or being bypassed). Alert on symptoms, not volumes. My own setup reduced time-to-notice on production failures to under a minute using Grafana Cloud alerting."
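
A minimal sketch of how these signals could be computed from raw samples. The function names and the 50% drop threshold are hypothetical; in practice these would be queries and alert rules in the monitoring stack, not application code.

```python
import math
from typing import Sequence


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of the samples."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = math.ceil(99 * len(ordered) / 100)
    return ordered[rank - 1]


def block_rate(blocked: int, total: int) -> float:
    """Fraction of requests the WAF blocked; 0.0 when there was no traffic."""
    return blocked / total if total else 0.0


def block_rate_collapsed(previous: float, current: float,
                         drop_ratio: float = 0.5) -> bool:
    """True if the block rate fell by more than `drop_ratio` of its previous value."""
    return previous > 0 and current < previous * (1 - drop_ratio)
```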

🚨

Incident Response & Blast Radius

The "If I do this, then what?" supply-chain mindset

Compliance mindset applied to tech

It's never just "find a solution and do it". It's "If I do this, what changes? Can it wait? Does it need to happen now?"

A knee-jerk fix often causes a bigger outage than the actual attack. You have to mitigate the immediate threat, check the dependencies, look for hidden footprints, and then apply a permanent fix.

The "If I do this, then what?" Framework

Mitigation vs. Remediation: Does this need to happen right now? If data is bleeding, yes. Apply a targeted mitigation (like rate-limiting) to stop the bleeding without breaking the whole application.

Dependency & Footprint Check: If the mitigation is fine, what else changed? I assume they tried more than one door. Go into Grafana/Loki: Did this IP hit other endpoints? Are there hidden footprints?

The Final Fix: Before writing the permanent Terraform block rule, what are the dependencies? Who owns this app? Will locking down this endpoint break their CI/CD pipeline tomorrow?
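
The footprint check boils down to one question: which other paths did this client touch? A minimal sketch over already-parsed log records, assuming a hypothetical schema where each record is a dict with `client_ip` and `path` keys; in practice this would be a query in the log platform.

```python
from collections import Counter
from typing import Iterable


def endpoints_touched(records: Iterable[dict], client_ip: str) -> Counter:
    """Count requests per path for one client IP across parsed log records."""
    return Counter(r["path"] for r in records if r.get("client_ip") == client_ip)
```

Anything in the result beyond the endpoint you already mitigated is a door the attacker also tried, and it needs checking before the incident is closed.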

When asked about troubleshooting / incident response: "My approach to incident response is that it's rarely just 'find a solution and do it.' Every security control has a blast radius. I always ask: If I apply this WAF rule, what legitimate traffic changes? Can a permanent fix wait for a staging deployment, or does it need a break-glass mitigation right now? You have to mitigate the threat first, but you also have to look for the secondary footprints (like checking Loki for compromised dependencies or other endpoints that IP touched) before you consider the incident closed. You fix the system, not just the symptom."

LBG (Lloyds Banking Group) Infrastructure Engineer (WAF)
