WAF Prep · LBG

⭐ Your Core STAR Stories (LBG Values)

Lead with the impact. Emphasize "I", not "We". Explicitly bridge to the WAF role at the end.

I/B
1. INCLUSIVE / BOLD: The BA & Exec Access Story (Allwyn)
Q: Removed a barrier / Went the extra mile / Challenged authority.
S/T: BRDs from BAs were poor quality. Easy route was blaming them. I investigated the inputs instead.
A: Found a structural blocker: BAs (younger women/POC) couldn't get Exec time; PMs (older white men) could. I boldly escalated this access imbalance to the Execs to force delegation.
R: Execs delegated, BRD quality spiked, multiple workstreams sped up.
WAF Bridge: "If WAF rules are poorly configured, I don't blame the rule. I investigate the input—did the engineer have access? You fix the system, not the symptom."
P
2. PEOPLE-FIRST: Retail Team Bypassing Rules
Q: Difficult stakeholder / Pushback on controls / Balancing security.
S/T: Retail team trying to bypass compliance controls on machines due to commercial friction.
A: Didn't just say "no." Dug into *why*. Reviewed objectives, interpreted data, worked collaboratively to find a middle ground where their commercial goal proceeded securely.
R: Machines remained compliant, risk mitigated, business operated efficiently.
WAF Bridge: "This is WAF engineering. App teams want to deploy fast, security wants to block. My job is finding the safe middle ground so the business moves securely."
T
3. TRUST: William Hill Assurance Delegation
Q: Empowered a team / Scaled operations / Stepped back.
S/T: Pulled into multiple SME projects, couldn't personally execute all my sub-team's reviews.
A: Hired independent thinkers. Wrote the risk matrix, built procedures, trained them. Held 1-on-1s but provided coaching/frameworks, not direct answers. Trusted them to execute.
R: Team outperformed others, completing more work in less time. Model was adopted elsewhere.
WAF Bridge: "This is how you scale infra. Write the runbooks, build escalation matrices, train engineers, and trust them on-call. Doing it yourself creates a single point of failure."
M
4. TRUST (Mistake/Problem Solving): Gemini API Key Leak in Loki
Q: Critical mistake / Technical problem solving / Fixing a live error.
S/T: Running my LLM platform, shipping logs via Alloy to Grafana Loki. Discovered `httpx` was logging full request URLs at INFO level, meaning the live Gemini API key (passed as a `?key=` query parameter) was bleeding into centralized logs on every call.
A: Didn't just filter the log (a band-aid). Fixed the root cause: moved the key from the query string to the `x-goog-api-key` header so it structurally couldn't appear in URLs. Added defense-in-depth by suppressing `httpx` INFO logs across all worker entrypoints. Shipped the fix through the GitLab CI pipeline rather than a manual hotfix.
R: Leak immediately stopped, verified in Grafana. This log-suppression and header-auth pattern became standard for all future API integrations.
WAF Bridge: "This is why WAF engineers care about HTTP structure. Query strings are logged by edge routers and WAFs by default; headers aren't. Fixing it at the protocol level rather than just writing a log filter is how you prevent the leak permanently."
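A minimal sketch of the header-auth and log-suppression pattern from this story, using only the standard library. The endpoint URL and the `build_request` and `quiet_http_client_logs` helpers are hypothetical names for illustration; the `x-goog-api-key` header and the `httpx` logger come from the story itself.

```python
import logging

# Hypothetical endpoint, used only for illustration.
API_URL = "https://api.example.invalid/v1/generate"


def build_request(api_key: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) with the key in a header, never in the URL."""
    headers = {"x-goog-api-key": api_key}
    return API_URL, headers


def quiet_http_client_logs() -> None:
    """Defense in depth: raise the httpx logger above INFO so request URLs are not logged."""
    logging.getLogger("httpx").setLevel(logging.WARNING)


url, headers = build_request("dummy-key-for-illustration")
quiet_http_client_logs()
```

Because the key never enters the URL, anything that logs the request line (client library, proxy, or WAF) has nothing sensitive to record; the logger change is a second layer, not the fix.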
S
5. SUSTAINABLE: PokerStars 5,000 Banners
Q: Process improvement / Scaling / Working around blockers.
S/T: 5,000 affiliate banners non-compliant. Manual email process was failing.
A: Investigated backend, found templates held centrally. Proposed code replacement. Commercial pushed back on "boring" banners, so I collaborated with localization to make them compliant AND commercial. Handed to engineering.
R: All 5,000 replaced via one code change. Zero regulatory action.
WAF Bridge: "This is policy-as-code. Instead of manually updating 5,000 WAF rules, you change the template they inherit from. Fix at the root to eliminate human error."
BACKUPS (If they ask for a second example)
Extra Mile/Empower: Allwyn BRD Framework (wrote acceptance criteria so engineers could self-serve compliance safely).
Difficult Person: PokerStars Commercial Pushback (finding middle ground on the banners).
Mistake: Contabo Monitoring (deployed logging but forgot alerts—deploying WAF in detect mode is useless if nobody looks at the alerts).
Improve Process: LCCP Reg Filter (GitHub tool replacing manual reg searching).

Why This Role Fits You

When they ask "why LBG?" and "why this role?" — here's what's genuine, not generic.

You've been the regulator; now you want to be the builder. At the Gambling Commission you enforced security controls on others. You saw what good and bad looks like. Now you want to be on the inside, implementing the controls rather than auditing them.
Regulated infrastructure is where your brain works. You've spent your career in compliance. You understand that the 12-month log retention policy isn't bureaucracy, it's the reason customers trust LBG with their money.
The tech stack matches what you already love. Terraform, GCP, CI/CD pipelines, observability, IaC. The learning curve is the WAF-specific domain knowledge, not the tooling. And you've already started that curve (Cloud Armor).

The Panel & Questions to Ask Them

Andy Shephard
Lead Infrastructure Engineer (Hiring Manager)
Technical deep-dive, IaC, operations.
Ask Him: "What is the biggest operational headache the WAF team is facing right now that you're hoping this hire will take off your plate?"
Danny Cox
Interviewer
Operational risk, customer impact.
Ask Him: "When a new WAF ruleset rolls out, what does the staging process look like to guarantee zero customer disruption?"
Peter Needham
Interviewer
Governance, auditability, deep technical probing.
Ask Him: "How much friction are you seeing between strict policy-as-code requirements and the deployment speed expected by SRE teams during emergencies?"
🛡️
[Diagram: Client (DNS lookup resolves to the WAF IP; HTTPS request) → WAF / Edge (TLS terminates here; inspects headers, URI, body; applies rule engine) → BLOCK: 403, or ALLOW: re-encrypted traffic forwarded to Origin, which returns 200 OK]

A WAF (Web Application Firewall) is a bouncer at the club door. Every single request to your website or API passes through it before reaching your backend servers. The bouncer checks each person against a rulebook and either lets them through or throws them out.

The WAF operates as a reverse proxy. The flow:

1. DNS: Client resolves hostname, which points to the WAF's IP. Client thinks it's talking to the origin.
2. TLS Termination: WAF holds the TLS certificate and private key. It terminates the encrypted session, decrypts the payload so it can inspect it.
3. Inspection: Rule engine evaluates HTTP method, headers, URI path, and body against rules.
4. Forward or Block: ALLOW → re-encrypt, forward to origin. BLOCK → return 403 or redirect to error page.
Key concept: SNI (Server Name Indication). Tells a multi-tenant WAF which certificate to serve for that hostname. With a missing or wrong SNI the edge cannot pick the right certificate, so the handshake fails or the client rejects a mismatched certificate.
Say this: "The WAF operates as a reverse proxy. DNS points to the WAF edge. The WAF terminates TLS, so it holds the cert and private key, decrypts the payload to inspect it, applies the rule engine against headers, URI, and body, then re-encrypts and forwards clean traffic to origin. The key subtlety is SNI: when multiple domains share an edge, the TLS ClientHello must include the correct SNI, or the edge serves the wrong certificate and the handshake fails."
Term | Definition
Reverse Proxy | A server that sits in front of web servers and forwards client requests to them.
TLS Termination | The process of decrypting HTTPS traffic at the proxy/WAF level rather than the final destination server.
SNI | Extension to TLS allowing a client to specify which hostname it is connecting to.
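The inspect-then-forward-or-block step of the reverse-proxy flow can be sketched as a toy rule engine. This is an illustration only: the `Request` type, the three rules, and their IDs are hypothetical, and a real WAF rule set is far richer than substring checks.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    method: str
    path: str
    headers: dict[str, str] = field(default_factory=dict)
    body: str = ""


# Hypothetical toy rules: each returns True when the request should be blocked.
RULES = {
    "path-traversal": lambda r: "../" in r.path,
    "sqli-body": lambda r: "' or 1=1" in r.body.lower(),
    "method-allowlist": lambda r: r.method not in {"GET", "POST"},
}


def inspect(request: Request) -> tuple[int, str]:
    """Return (403, rule_id) on the first matching rule, else (200, "forwarded")."""
    for rule_id, matches in RULES.items():
        if matches(request):
            return 403, rule_id
    # In a real proxy this is where traffic is re-encrypted and sent to origin.
    return 200, "forwarded"
```

Returning the rule ID alongside the 403 mirrors what makes WAF logs useful later: every block can be traced to the rule that caused it.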
🔒

DNS is the address book of the internet. TLS is the padlock on the envelope. HTTP/S is the letter inside. The WAF reads all of these to decide if the request is legitimate.

DNS & TLS in WAF context

  • A/CNAME records point to the WAF IP, not origin.
  • Edge CDN WAFs use Anycast routing.
  • TLS termination: WAF decrypts inbound, re-encrypts outbound. Holds private key.
  • Cipher suites: Client and WAF negotiate the encryption algorithms during the handshake. Modern baseline: TLS 1.3 with ECDHE key exchange.
  • mTLS: On the WAF→origin connection the WAF also presents a client certificate, so the origin can verify that traffic really came through the WAF.
On HTTP/S, DNS, TLS: "The WAF sits between the client and the origin. DNS resolves the hostname to the WAF's IP. TLS terminates at the WAF: the WAF holds the private key and decrypts the payload so the rule engine can inspect it. After inspection, the WAF re-encrypts and forwards clean traffic, optionally using mTLS so the origin can authenticate the WAF."
Term | Definition
Anycast | Routing method where multiple servers share the same IP address; traffic is routed to the nearest server.
TTL (Time To Live) | Setting that tells a DNS resolver how long to cache a record.
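As a rough sketch of the WAF→origin hop described above, the standard `ssl` module can build a client context that enforces a TLS 1.2 floor and optionally presents a client certificate for mTLS. The `origin_tls_context` helper and its parameters are hypothetical; a managed WAF exposes these as configuration settings rather than code.

```python
import ssl
from typing import Optional


def origin_tls_context(client_cert: Optional[str] = None,
                       client_key: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context for the WAF-to-origin connection."""
    # Default context verifies the origin's certificate and hostname.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if client_cert and client_key:
        # mTLS: the WAF also presents its own certificate to the origin.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


ctx = origin_tls_context()
```

Passing certificate and key paths switches the same context to mTLS; without them it is ordinary one-way TLS where only the origin proves its identity.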
🏗️

LBG runs Cloudflare/Akamai at the Edge, F5/Imperva in their Data Centers, and GCP Cloud Armor natively in the cloud. You have hands-on experience with Cloudflare and Cloud Armor.

Type | Vendors | Your position
Edge / SaaS | Cloudflare, Akamai | ✓ Cloudflare in production
On-prem | F5 BIG-IP, Imperva | No hands-on, frame as transferable IaC
Cloud-native | GCP Cloud Armor, AWS WAF | ✓ Cloud Armor badge (Friday)
Say this: "I know LBG operates a hybrid model. My strongest operational background is at the Edge running Cloudflare in production, and I've augmented that by completing the GCP Cloud Armor badge this week to ensure I'm sharp on the cloud-native side. While I haven't clicked around an F5 GUI, the Terraform provider abstractions and policy-as-code principles transfer directly across all three."
🏗️

Terraform keeps a record (the state file) of exactly what it built. If someone goes around you and clicks in the console during an emergency, the next `terraform plan` notices: the live environment no longer matches the code. That gap is called Configuration Drift.

If you don't fix the drift, the next automated pipeline will overwrite your emergency fix and bring the attack right back.

Handling the 3 AM Emergency (Drift Management)

  • The Break-Glass Action: During a live incident, speed wins. You log into the WAF console and manually block the IP. You do not wait for a CI/CD pipeline when data is at risk.
  • The Drift: Real infra no longer matches Code or State.
  • The Clean-Up: Once mitigated, backport the fix. Write the code to match the manual change, run `terraform plan` (it should show no changes once code and live config agree), and merge. If you manually created a completely new resource, bring it under management with `terraform import` first.
  • Prevention: Run terraform plan on a schedule in GitHub Actions. If it detects drift, it alerts the team so manual changes aren't forgotten.
When asked: "Do you fix an active attack in code or the console?"
Say this: "In a P1 incident, speed is the priority. I will make a manual 'break-glass' change in the WAF console to stop the bleeding and protect the bank. However, that immediately creates Configuration Drift. The incident is NOT closed when the attack stops; the incident is only closed when I have backported that manual fix into Terraform, run a plan, and merged it. If I don't, the next person's pipeline deployment will overwrite my emergency fix. To catch this, I also run scheduled 'terraform plan' jobs to alert the team if un-codified drift exists."
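A minimal sketch of the scheduled drift check, assuming the job wraps the Terraform CLI. `terraform plan -detailed-exitcode` exits with 0 when there is nothing to change, 1 on error, and 2 when the plan contains changes; the `classify_plan_exit` and `check_drift` helpers are hypothetical names, and the alerting step is left out.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`.
PLAN_STATUS = {0: "clean", 1: "error", 2: "drift"}


def classify_plan_exit(code: int) -> str:
    """Map a plan exit code to a status; unknown codes are treated as errors."""
    return PLAN_STATUS.get(code, "error")


def check_drift(workdir: str) -> str:
    """Run a plan in `workdir`; requires the terraform CLI on PATH."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

Note that exit code 2 only says the plan is non-empty: on a branch with unmerged code changes it reflects those too, so the scheduled job is most meaningful when run against the main branch.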
⚙️

Every WAF rule change goes through a pipeline — like a spell-checker that runs automatically. Policy-as-code is the spell-checker.

  • Terraform plan: Pipeline runs `terraform plan -out=tfplan`, then `terraform show -json tfplan` to export the plan as JSON.
  • Policy check: OPA (Open Policy Agent) evaluates JSON against rules (e.g., "TLS minimum version must be 1.2").
  • Safe Rule Deployment: Never deploy directly into Block mode. Deploy in Log-Only mode → Monitor for false positives → Flip to Block.
Say this: "WAF rules live in Git alongside the Terraform config. A PR triggers the pipeline: Terraform plan runs, and the output JSON is fed to OPA for policy evaluation. If it violates a baseline, the pipeline fails before any human reviews it. Policy-as-code gives you an immutable, version-controlled audit trail."
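OPA policies are written in Rego; the sketch below is a Python stand-in that shows the same idea against Terraform's JSON plan. The `resource_changes`, `address`, and `change.after` fields are part of the plan JSON format, while the `min_tls_version` attribute name and the `example_waf` resources are hypothetical and depend on the provider in use.

```python
ALLOWED_MIN_TLS = {"1.2", "1.3"}


def tls_violations(plan: dict, attribute: str = "min_tls_version") -> list[str]:
    """Return addresses of planned resources whose TLS floor is below 1.2."""
    violations = []
    for change in plan.get("resource_changes", []):
        # `after` is null for deletions, so fall back to an empty dict.
        after = (change.get("change") or {}).get("after") or {}
        value = after.get(attribute)
        if value is not None and str(value) not in ALLOWED_MIN_TLS:
            violations.append(change["address"])
    return violations


# Hypothetical plan fragment for illustration.
sample_plan = {
    "resource_changes": [
        {"address": "example_waf.good",
         "change": {"actions": ["create"], "after": {"min_tls_version": "1.2"}}},
        {"address": "example_waf.bad",
         "change": {"actions": ["update"], "after": {"min_tls_version": "1.0"}}},
        {"address": "example_waf.deleted",
         "change": {"actions": ["delete"], "after": None}},
    ]
}
```

A pipeline step would fail the build whenever the returned list is non-empty, which is the "fails before any human reviews it" behaviour.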
💥

Layer 7 attacks are smarter. They send real, valid-looking website requests. The WAF has to figure out the difference between 10,000 real customers and 10,000 bots all doing the same thing.

Say this: "Layer 7 is fundamentally harder because the traffic looks like legitimate requests. A botnet sending 100,000 HTTP GETs looks identical to real customers, until your database falls over. The defence has multiple layers: rate limiting per session identity not just IP, bot fingerprinting, and positive security schema validation."
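Rate limiting per session identity can be sketched as a sliding-window counter keyed on a session ID instead of the source IP. This is a single-process illustration with a hypothetical `SlidingWindowLimiter` class; an edge WAF would keep these counters in distributed state.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per session within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, session_id: str, now: float) -> bool:
        hits = self._hits[session_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Keying on the session means 10,000 bots behind rotating IPs but one stolen session token share a single budget, while customers behind one corporate NAT do not.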
🤖

Bot protection is about telling the difference between a real browser and an automated script trying to brute force passwords (credential stuffing).

Say this: "Bot protection isn't just about blocking IP addresses anymore because attackers use residential proxies. You evaluate signals: JS fingerprinting, JA3 TLS fingerprints, and behavioral ML analysis. The main use cases we care about at a bank are credential stuffing and card testing."
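For reference, a JA3 fingerprint is the MD5 of five ClientHello fields (TLS version, ciphers, extensions, elliptic curves, point formats) written as decimal values, dash-separated within a field and comma-separated between fields. The sketch below assumes the fields are already parsed and does not filter GREASE values, which real implementations exclude; the `ja3_hash` helper name and sample values are illustrative.

```python
import hashlib


def ja3_hash(version: int, ciphers: list[int], extensions: list[int],
             curves: list[int], point_formats: list[int]) -> tuple[str, str]:
    """Return (ja3_string, md5_hex) for already-parsed ClientHello fields."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)
    # MD5 is used here as a fingerprint, not for security.
    digest = hashlib.md5(ja3_string.encode("ascii"), usedforsecurity=False)
    return ja3_string, digest.hexdigest()
```

Because the order of ciphers and extensions is part of the string, two clients offering the same algorithms in a different order produce different fingerprints, which is what separates a scripted HTTP library from a real browser.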
🎯

The ones most relevant for WAF configuration are BOLA (stealing other people's data by guessing IDs), broken authentication (credential stuffing), and resource exhaustion (L7 DDoS).

Say this: "The critical API threats the WAF handles are BOLA and Resource Consumption. For BOLA, the WAF is the first line of defense detecting mass enumeration. We also ensure SSRF vectors are blocked by rejecting requests where parameters attempt to hit internal RFC 1918 addresses."
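The RFC 1918 check can be sketched with the standard library. This version only catches IP literals inside a URL-valued parameter; it does not resolve hostnames, so a name that resolves to an internal address would need a separate control. The `targets_rfc1918` helper is a hypothetical name.

```python
import ipaddress
from urllib.parse import urlparse

# The three private IPv4 ranges defined by RFC 1918.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def targets_rfc1918(param_value: str) -> bool:
    """True when a URL-valued parameter points at an RFC 1918 IP literal."""
    host = urlparse(param_value).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not an IP literal; DNS is not resolved here
    return any(addr in net for net in RFC1918)
```

In practice the same rule usually also covers loopback and link-local ranges (for example cloud metadata endpoints), which sit outside RFC 1918.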
☁️

Cloud Armor is Google's WAF. You write security policies and attach them to your backend services. Rules are evaluated in priority order (lowest number first), and the first match wins.

How to raise this: "I completed the GCP Cloud Armor Skills Badge on Friday, which felt directly relevant given LBG's GCP partnership. Cloud Armor's model (priority-ordered security policies attached to backend services) is a clean example of cloud-native WAF done well."
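The first-match-wins model can be illustrated with a small evaluator. This is a simplified sketch that matches on source IP ranges only; the `Rule` type, the `evaluate` function, and the sample policy are hypothetical, while the default rule priority of 2147483647 and the `deny(403)` action string follow Cloud Armor's conventions.

```python
import ipaddress
from dataclasses import dataclass

DEFAULT_PRIORITY = 2147483647  # the default rule is always evaluated last


@dataclass
class Rule:
    priority: int   # lower number = evaluated earlier
    src_range: str  # CIDR range to match against the client IP
    action: str     # e.g. "allow" or "deny(403)"


def evaluate(rules: list[Rule], client_ip: str) -> str:
    """Return the action of the first matching rule in priority order."""
    addr = ipaddress.ip_address(client_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if addr in ipaddress.ip_network(rule.src_range):
            return rule.action
    raise ValueError("policy has no default rule")


# Hypothetical policy: allow one address inside an otherwise denied range.
policy = [
    Rule(DEFAULT_PRIORITY, "0.0.0.0/0", "allow"),
    Rule(1000, "203.0.113.0/24", "deny(403)"),
    Rule(900, "203.0.113.7/32", "allow"),
]
```

The narrow allow at priority 900 wins over the broader deny at 1000, which is why exceptions are given lower numbers than the rules they carve out of.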
🔍

Tuning a WAF is a permanent balancing act. Too tight = customers can't use your service. Too loose = attackers get through. Make surgical fixes, not blunt rule removals.

Say this: "False positive investigation: pull the WAF access logs, find the blocked request and its rule ID. Get a HAR file from the client. Identify what triggered it. Then write a surgical exclusion scoped to that specific URI and parameter combination, and never disable the rule globally."
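A surgical exclusion can be modelled as an exact (rule, path, parameter) triple, so the rule keeps firing everywhere else. The rule ID, path, and parameter below are hypothetical placeholders, and real WAFs express this in their own exclusion syntax rather than code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exclusion:
    rule_id: str
    path: str
    parameter: str


# Hypothetical exclusion: one rule, one endpoint, one parameter.
EXCLUSIONS = {Exclusion("rule-1234", "/api/v1/comments", "body_text")}


def should_block(rule_id: str, path: str, parameter: str) -> bool:
    """A rule match blocks unless this exact rule/path/parameter is excluded."""
    return Exclusion(rule_id, path, parameter) not in EXCLUSIONS
```

Disabling `rule-1234` globally would be the equivalent of dropping the path and parameter from the key, which is exactly the blunt change to avoid.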
📊

If you can't see what your WAF is doing, you're flying blind. Good observability answers: Is it blocking the right things? Is it affecting real customers? How much latency is it adding?

Monitoring question answer: "The key WAF signals are: 403 rate (is the WAF blocking too much?), 5xx error rate (is the WAF or the origin failing?), P99 latency (is it adding overhead?), and block rate trend (a sudden drop means the WAF may be broken or bypassed). Alert on symptoms, not volumes. My own setup reduced time-to-notice on production failures to under a minute using Grafana Cloud and alerts."
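The "sudden drop in block rate" signal can be sketched as a comparison against a baseline. The function names and the 50% drop threshold are assumptions for illustration; in practice this would be an alert rule in the monitoring stack rather than application code.

```python
def block_rate(blocked: int, total: int) -> float:
    """Fraction of requests blocked; 0.0 when there is no traffic."""
    return blocked / total if total else 0.0


def block_rate_collapsed(current: float, baseline: float,
                         drop_fraction: float = 0.5) -> bool:
    """True when the block rate fell by more than `drop_fraction` vs. baseline."""
    if baseline <= 0:
        return False
    return current < baseline * (1 - drop_fraction)
```

Comparing rates rather than raw block counts keeps the alert quiet during normal traffic swings and loud when the WAF stops matching anything.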
🚨

It's never just "find a solution and do it". It's "If I do this, what changes? Can it wait? Does it need to happen now?"

A knee-jerk fix often causes a bigger outage than the actual attack. You have to mitigate the immediate threat, check the dependencies, look for hidden footprints, and then apply a permanent fix.

The "If I do this, then what?" Framework

1. Mitigation vs. Remediation: Does this need to happen right now? If data is bleeding, yes. Apply a targeted mitigation (like rate-limiting) to stop the bleeding without breaking the whole application.
2. Dependency & Footprint Check: If the mitigation is fine, what else changed? I assume they tried more than one door. Go into Grafana/Loki: Did this IP hit other endpoints? Are there hidden footprints?
3. The Final Fix: Before writing the permanent Terraform block rule, what are the dependencies? Who owns this app? Will locking down this endpoint break their CI/CD pipeline tomorrow?
When asked about troubleshooting / incident response: "My approach to incident response is that it's rarely just 'find a solution and do it.' Every security control has a blast radius. I always ask: If I apply this WAF rule, what legitimate traffic changes? Can a permanent fix wait for a staging deployment, or does it need a break-glass mitigation right now? You have to mitigate the threat first, but you also have to look for the secondary footprints (like checking Loki for compromised dependencies or other endpoints that IP touched) before you consider the incident closed. You fix the system, not just the symptom."