March 1, 2026 • 12 min read
Native Kubernetes ValidatingAdmissionPolicies Are Making Gatekeeper Optional
Native CEL-based ValidatingAdmissionPolicy can replace many basic Gatekeeper checks while reducing admission-path operational risk.
The first time I truly hated our admission controller, it wasn’t during a security review.
It was during an incident.
A routine hotfix was ready, the blast radius was growing, and the cluster kept rejecting pods with a message that sounded like a governance lecture. We weren’t blocked by compute, networking, or image pulls. We were blocked by policy. Specifically, by a validating webhook that had decided to become the slowest thing in our control plane.
That night I learned a simple lesson: every admission webhook you add is another distributed system in the request path of “kubectl apply.” When it fails, it doesn’t fail politely.
Kubernetes now has a native, CEL-based policy mechanism called ValidatingAdmissionPolicy, GA in the admissionregistration.k8s.io/v1 API since Kubernetes 1.30. If you are still running OPA Gatekeeper or Kyverno primarily for basic pod security and labeling rules, you’re probably carrying operational debt you don’t need anymore. There are still reasons to keep those tools, but “enforce CostCenter labels” should not be one of them.
Why this problem matters when everything is on fire
Admission control sits on the critical path of every create and update.
If your policy enforcement is slow, flaky, or misconfigured, you don’t just lose “compliance.” You lose the ability to deploy, scale, and sometimes even heal the system. In high-pressure moments, the thing you intended to protect the cluster becomes the thing preventing recovery.
Engineers tend to frame policy choices as “security tooling.” In reality, admission control is part of the control plane’s reliability budget. The same way you treat etcd, the API server, and DNS as sacred, you should treat admission latency and failure modes as sacred too.
That framing changes what “good” looks like.
Good policy enforcement is boring. It is fast. It is predictable. It fails closed only when you actually want it to, and it fails open only when you explicitly decide it should.
The system context I keep seeing in real clusters
Let me assume a setup that will feel familiar if you run Kubernetes in anger:
You have multiple teams shipping workloads, not all of them Kubernetes-native. Some are still learning why a label matters. Some have a platform team managing guardrails. You probably have at least one compliance requirement that translates into “prove resources are tagged,” “block privileged pods,” and “constrain host access.”
You also have constraints that never make it into architecture diagrams:
- During incidents, you must be able to deploy quickly and predictably.
- The platform team cannot be the bottleneck for every new namespace or team onboarding.
- Developers will route around painful systems, sometimes by copying YAML they barely understand.
- The control plane has a finite latency budget, especially under churn from autoscaling.
Those constraints are exactly where the “heavyweight admission webhook” story starts to crack.
The naive approach: deploy Gatekeeper and write a couple of policies
A lot of teams start with Gatekeeper for reasonable reasons.
Gatekeeper is powerful, policy-as-code feels tidy, and Rego looks like the right long-term bet if you want expressive logic. You install it, you add a ConstraintTemplate, and suddenly you can enforce that every Pod has a CostCenter label and that containers don’t run as privileged.
In a calm environment, this works.
In production, the risks show up around the edges.
Where it breaks: webhooks don’t degrade gracefully
Validating webhooks are network calls from the API server to your controller. That means:
- If the controller is overloaded, admission slows down.
- If the controller is down, admission can fail, depending on your failure policy.
- If the API server can’t reach the webhook service due to DNS or network policy, admission can fail.
- If your policy evaluation is expensive, admission slows down.
During normal load you might never notice. During a spike in pod churn, you will.
The most frustrating part is the failure mode. It often looks like the cluster is “broken” even though it’s actually “policy is unavailable.” To the on-call engineer, it’s indistinguishable from a larger outage until you connect the dots.
The “learning curve tax” that doesn’t show up in a sprint plan
Gatekeeper also comes with a people cost.
Rego is a different mental model. It’s powerful, but it’s not the same as “read YAML, write YAML.” Many platform teams end up with a small priesthood of people who can safely modify the policies. Everyone else either avoids touching them or ships changes with a level of fear that is not healthy.
If your policy goals are basic, paying that tax forever is hard to justify.
A real failure pattern: when policy blocks scaling
I’ve seen variations of this incident enough times that it has become a smell.
Cluster starts to scale out, a node pool adds capacity, and workloads try to schedule. That creates a burst of API writes as controllers create and update objects. Suddenly:
- Pod creations slow down.
- HPA actions lag.
- Cluster autoscaler can’t make progress because scaling workloads can’t schedule or create pods.
- You see timeouts on create requests, and the system enters a self-inflicted spiral.
The root cause is usually not “autoscaler is broken.” It’s that admission is becoming the bottleneck. Sometimes it’s because the policy controller is running on the same nodes that are under pressure. Sometimes it’s because a new policy increased evaluation time. Sometimes it’s because a DNS issue made the webhook intermittently unreachable.
The key point is that the blast radius is massive. Admission is not an optional path.
This is the moment where a lot of teams ask, “Why is a label check allowed to take down a cluster?”
The turning point: policy should be part of Kubernetes, not bolted on
ValidatingAdmissionPolicy changes the shape of the problem.
Instead of calling out to a third-party webhook to evaluate policy, Kubernetes can evaluate policies natively using CEL, the Common Expression Language. CEL is used across multiple ecosystems for safe, predictable expression evaluation, and Kubernetes already uses it elsewhere, such as in CRD validation rules. Now it’s available for admission validation too.
This matters for three reasons.
First, you remove an entire network hop from the critical path. No service, no deployment, no webhook endpoints.
Second, you reduce operational surface area. Fewer controllers to manage, fewer upgrades, fewer compatibility edges.
Third, you improve predictability. Evaluation happens in-process. The failure modes are simpler, and latency is generally lower because you are not performing external calls.
This does not mean Gatekeeper is dead. It means the default should change. Start native. Add heavy tooling only when native cannot express what you need.
What ValidatingAdmissionPolicy is, in plain terms
A ValidatingAdmissionPolicy is a policy object that includes:
- matchConstraints, which define which requests it applies to
- one or more validations expressed in CEL
- optional variables, match conditions, and message expressions
Enforcement happens through a separate ValidatingAdmissionPolicyBinding object, which controls the enforcement actions (Deny, Warn, Audit) and scoping. A policy without a binding does nothing.
Think of it like a built-in “if this object matches, then these expressions must be true.”
You can write policies that enforce labels, reject privileged pods, block hostPath mounts, require resource requests, and more.
It’s not a mutation engine. It won’t auto-add labels. It will validate and reject.
That distinction matters. Enforcement is a guardrail, not a formatter.
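As a taste of the expression style, a hostPath block fits in a single validation. This is a sketch, meant to be embedded in a ValidatingAdmissionPolicy that matches pods, like the ones in the walkthrough:

```yaml
# Sketch: reject any Pod that declares a hostPath volume.
# Embed under spec.validations of a policy matching pods.
validations:
- expression: "!has(object.spec.volumes) || object.spec.volumes.all(v, !has(v.hostPath))"
  message: "hostPath volumes are not allowed."
```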
Implementation walkthrough: enforce CostCenter and basic pod security with CEL
Let’s start with the kind of policy that gets teams to install Gatekeeper in the first place.
We want to enforce that any Pod created in most namespaces includes a CostCenter label on the Pod metadata. We also want to prevent privileged containers.
This is an example, not a universal policy. Adapt it to your environment.
CostCenter label enforcement
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-costcenter-label
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["pods"]
  validations:
  - expression: "has(object.metadata.labels) && 'CostCenter' in object.metadata.labels && size(object.metadata.labels['CostCenter']) > 0"
    message: "Pod must include metadata.labels['CostCenter'] with a non-empty value."
The key part is the expression. object is the incoming resource. has(...) guards against a missing labels map; note that the has() macro only works on field selection, not map indexing, which is why the key check uses the in operator instead. size(...) > 0 rejects empty strings.
Now you need a binding to control where and how it enforces.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-costcenter-label-binding
spec:
  policyName: require-costcenter-label
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: NotIn
        values: ["kube-system", "gatekeeper-system"]
This is where you tune blast radius. Exempt system namespaces and any namespaces you intentionally want to ignore.
You can also start with validationActions: ["Warn"] if you want a migration phase where developers see warnings but aren’t blocked.
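A migration-phase binding might look like this sketch. The Audit action additionally records violations in the API server audit log, which helps you build an inventory of non-compliant workloads before you flip to Deny:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-costcenter-label-warn   # illustrative name
spec:
  policyName: require-costcenter-label
  # Surface violations to clients and the audit log without blocking.
  validationActions: ["Warn", "Audit"]
```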
Basic privileged container prevention
Now a policy to prevent privileged containers. This is intentionally narrow, because “pod security” can get complicated fast.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: disallow-privileged
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["pods"]
  validations:
  - expression: "!has(object.spec.containers) || object.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.privileged) || c.securityContext.privileged == false)"
    message: "Privileged containers are not allowed."
This reads as: for all containers, privileged must be absent or false.
If you also use initContainers, you need to add similar checks. CEL makes this straightforward, but you must be explicit.
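One way to extend the check, as a sketch; CEL can concatenate the two container lists with +, and ephemeralContainers would need the same treatment if you use them:

```yaml
# Sketch: one expression covering containers and initContainers.
validations:
- expression: >-
    (object.spec.containers + (has(object.spec.initContainers) ? object.spec.initContainers : []))
    .all(c, !has(c.securityContext) || !has(c.securityContext.privileged) || c.securityContext.privileged == false)
  message: "Privileged containers, including init containers, are not allowed."
```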
Validation: how I’d test this before letting it touch production
Admission policies should be tested like you test any critical control plane component.
At minimum:
- Create a pod without the label and confirm it is denied with the message you expect.
- Create a pod with the label and confirm it succeeds.
- Try a privileged pod and confirm denial.
- Confirm your exempt namespaces behave as expected.
- Run a “high churn” test in a staging cluster, such as a deployment scale up and down, and watch API server latency.
The last step is what most teams skip, and it’s where operational risk hides.
If you have API server audit logs and metrics, watch request latency and errors around admission. If you don’t, this is a good moment to add basic visibility. Admission should not be a black box.
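The first two checks in that list can start from a pair of minimal manifests like these. Names, the namespace, and the pause image are illustrative:

```yaml
# Expect: DENIED by require-costcenter-label (no CostCenter label).
apiVersion: v1
kind: Pod
metadata:
  name: policy-test-denied
  namespace: default
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
---
# Expect: ADMITTED (labeled, no privileged containers).
apiVersion: v1
kind: Pod
metadata:
  name: policy-test-allowed
  namespace: default
  labels:
    CostCenter: "cc-1234"
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```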
Tradeoffs and alternatives considered
I’m opinionated here, but not blindly so. Let’s be precise about what you gain and what you lose.
What you gain with ValidatingAdmissionPolicy
Reliability and simplicity.
You remove a controller deployment, webhook configuration, service routing, certificates, and the ongoing churn of keeping it compatible with Kubernetes versions. You also remove the “webhook timeout broke deploys” class of incidents for policies that can be expressed in CEL.
Developer experience improves because CEL expressions are often easier to read than Rego for common validation checks. Not always, but often.
What you lose
Expressiveness and ecosystem tooling, depending on what you were using.
Gatekeeper and Kyverno offer richer policy libraries, more powerful constructs, and in Kyverno’s case, mutation and generation. If you rely heavily on those features, native validation will not replace the entire system.
Also, migration is not free. You have policies, documentation, and perhaps compliance mapping built around your existing tool. You need to treat that as a real project, not an afternoon YAML swap.
When I would still keep Gatekeeper or Kyverno
If you need mutation or generation, such as auto-injecting labels, defaulting fields, or generating NetworkPolicies per namespace, native validating policies won’t do that.
If you need complex cross-object logic that is hard to express without querying other resources, external engines may still be better. CEL is powerful, but Kubernetes is deliberately conservative in what admission policies can do safely.
If your organization depends on a mature set of community policies and auditing workflows from those ecosystems, you may keep them for standardization, at least for a while.
The point isn’t “delete everything third-party.” The point is “don’t use a bazooka to enforce a label.”
Production hardening and edge cases that matter
Start in warn mode for humans and in deny mode for robots
If you can, start policies in Warn mode for a sprint or two, then flip to Deny.
Warnings teach without blocking. They surface the hidden inventory of workloads that don’t meet the standard. They also let you see if your policy is too strict or just wrong.
Then, once you have buy-in and fixes are in progress, enforce.
Decide what “emergency deploy” means ahead of time
When you enforce admission policies, you’re making a promise: “We will not deploy non-compliant workloads, even in emergencies.”
If that is truly what you want, great. If not, you need an escape hatch that is explicit.
That might mean an exempt namespace for break-glass operations. It might mean a label-based bypass that only a small group can use. It might mean temporarily switching a policy binding to Warn during a declared incident.
Whatever you choose, design it deliberately. The worst escape hatch is the one you invent at 3 a.m.
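One deliberate design, sketched here with a hypothetical label name: the binding exempts any namespace carrying an explicit break-glass label, and RBAC restricts who can set that label.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-costcenter-label-binding
spec:
  policyName: require-costcenter-label
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchExpressions:
      # Hypothetical break-glass label; namespaces carrying it are exempt.
      - key: emergency.example.com/policy-bypass
        operator: DoesNotExist
```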
Keep policy evaluation cheap
Even native evaluation can be expensive if you write pathological expressions. Avoid overly complex logic, deep traversals, or huge regex patterns unless you understand the cost.
Admission control is not the place for cleverness. It is the place for constraints that are easy to reason about.
Version and review policies like code
These are production-critical controls. Put them in Git. Require review. Add tests, even if those tests are “apply example manifests and check outcomes in a kind cluster.”
If you’re migrating from Gatekeeper, do the migration the same way. Translate policies, run parallel in warn mode, compare results, then cut over.
Lessons learned, the durable kind
The biggest mental shift for me was realizing that policy is not a separate governance layer. It is part of the deploy path. That makes it an uptime concern.
Here are the lessons I’d keep even if Kubernetes replaces CEL with something else in the future:
- If a control sits on the critical path, optimize for reliability before expressiveness.
- Complexity is not free. Every external dependency in admission control expands your failure modes.
- Most teams do not need a general-purpose policy language to enforce basic hygiene.
- “Security” tools can create availability incidents. Treat them like any other production component.
- Migration is a chance to simplify, not just to re-implement the same mess in a new syntax.
Closing reflection
I don’t regret using Gatekeeper. It helped us get serious about policy when we needed it.
What I regret is letting basic hygiene enforcement turn into a permanently running system that could block deploys, add latency, and become one more thing to babysit. We installed it for correctness and ended up paying for it in operability.
ValidatingAdmissionPolicy won’t solve every policy problem. It will solve a large and common subset in a way that is more aligned with how Kubernetes wants to work.
If you run Gatekeeper today, ask yourself a simple question: which of your constraints truly require a third-party engine, and which exist because “that’s what everyone installs”?
If you do the audit honestly, you might find a lot of policies that can move native, and a lot of operational risk you can delete.
Final takeaways
- Use ValidatingAdmissionPolicies for basic validation like labels, privileged flags, and simple pod security constraints.
- Treat admission control as part of control plane reliability, not just compliance.
- Migrate in phases: warn first, then deny, with explicit scoping and exemptions.
- Keep heavyweight policy engines only for cases where you need their unique capabilities.
- Test policies under churn, not just with a single dry-run manifest.