v1.25.0

Upgrade Notes

v1.25.0 raises Fission’s minimum Kubernetes version to 1.32, tightens two more admission rules around the HTTPTrigger path and the PodSpec capability allowlist, and moves several validation checks from the admission webhook to API-server CEL. Specs that rely on the rejected primitives will fail admission after the upgrade, and clusters older than 1.32 are no longer supported. Audit the items below before rolling out.

For the general upgrade steps (CRDs, CLI, Helm chart), see the Upgrade Guide. The breaking changes specific to v1.25.0 and the action each requires are below.

Minimum Kubernetes is now 1.32

The Helm chart’s kubeVersion is >=1.32.0-0, the envtest harness pulls 1.32.x assets, and the runtime health-check floor (MinimumKubernetesVersion) is 1.32. Clusters older than 1.32 are no longer supported.

As part of this, the fluentbit PodSecurityPolicy manifest and the logger.podSecurityPolicy Helm values are removed — PodSecurityPolicy was removed from Kubernetes in 1.25 and cannot exist on a 1.32+ cluster. If you relied on logger.podSecurityPolicy, use Pod Security Admission / Pod Security Standards instead.
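As a pre-upgrade aid, the version floor above can be checked with a few lines of Go. This is a minimal sketch, not Fission's own health check: meetsFloor and minMinor are hypothetical names, and the version string is assumed to come from wherever you read the server version (for example the gitVersion reported by kubectl version).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minMinor is the minimum supported Kubernetes 1.x minor version from these notes.
const minMinor = 32

// meetsFloor reports whether a version string such as "v1.32.3" or "1.31.0"
// is Kubernetes 1.32 or newer. Hypothetical helper, not part of Fission.
func meetsFloor(version string) (bool, error) {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unrecognized version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major in %q: %w", version, err)
	}
	// Some distributions report minors like "31+"; strip the trailing marker.
	minor, err := strconv.Atoi(strings.TrimRight(parts[1], "+"))
	if err != nil {
		return false, fmt.Errorf("bad minor in %q: %w", version, err)
	}
	return major > 1 || (major == 1 && minor >= minMinor), nil
}

func main() {
	for _, v := range []string{"v1.31.9", "v1.32.0", "v1.34.3"} {
		ok, err := meetsFloor(v)
		fmt.Println(v, ok, err)
	}
}
```

Only the major and minor components are compared, so patch and pre-release suffixes do not affect the result.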

HTTPTrigger path safety enforced at admission

HTTPTrigger.spec.relativeurl and spec.prefix are now validated by both the API server (CEL) and the Go-side Validate() path that the CLI and router status conditions use. This closes GHSA-vchh-r53j-8mpw. Previously, an HTTPTrigger written via kubectl apply or the Kubernetes REST API could specify an empty path, a path containing .. traversal, a root-only path (/), a path that collides with a router-owned route (/router-healthz, /readyz, /_version, /auth/login), or a path that shadows the router-internal /fission-function/<ns>/<name> prefix. The fission CLI already rejected these paths, but kubectl bypassed that check.

A trigger that violates any of these rules now fails admission. The fission CLI is unchanged. If any of your specs contain such triggers, fix the paths before upgrading.
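To audit existing specs before upgrading, the rejected path shapes can be approximated in Go. The sketch below is an assumption-laden illustration, not the actual CEL rules or Validate() code: checkTriggerPath is a hypothetical name, and treating route collisions as exact matches is a simplification chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// reservedRoutes lists the router-owned routes named in these notes.
var reservedRoutes = []string{"/router-healthz", "/readyz", "/_version", "/auth/login"}

// internalPrefix is the router-internal function prefix named in these notes.
const internalPrefix = "/fission-function/"

// checkTriggerPath approximates the documented rules. Hypothetical helper;
// the real validation may differ in detail (for example in how collisions match).
func checkTriggerPath(p string) error {
	switch {
	case p == "":
		return errors.New("path is empty")
	case p == "/":
		return errors.New("path is root-only")
	case strings.HasPrefix(p, internalPrefix):
		return errors.New("path shadows the router-internal function prefix")
	}
	// Reject any ".." path segment.
	for _, seg := range strings.Split(p, "/") {
		if seg == ".." {
			return errors.New("path contains .. traversal")
		}
	}
	// Exact-match collision check (an assumption of this sketch).
	for _, r := range reservedRoutes {
		if p == r {
			return fmt.Errorf("path collides with router-owned route %s", r)
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"/hello", "", "/", "/a/../b", "/readyz", "/fission-function/default/fn"} {
		fmt.Printf("%q -> %v\n", p, checkTriggerPath(p))
	}
}
```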

PodSpec capabilities are an allowlist (not a denylist)

Environment and Function PodSpec admission and the merge-layer sanitizer switch from a six-entry denylist to an allowlist that matches Kubernetes Pod Security Admission’s restricted profile: only NET_BIND_SERVICE may appear in capabilities.add. The merge layer also forces capabilities.drop = ["ALL"] on every container — including containers whose source had no SecurityContext at all — so the OCI runtime’s default cap set (CHOWN/MKNOD/NET_RAW/SETUID/…) no longer reaches tenant containers.

This closes GHSA-qf5v-m7p4-95rp: the previous denylist was structurally incomplete — SYS_TIME (corrupts the shared, non-namespaced node wall clock), SYS_RAWIO, BPF, SYS_RESOURCE, and MAC_ADMIN all passed through, and the OCI default DAC_OVERRIDE reached containers regardless.

Any Environment/Function/Container spec that adds capabilities beyond NET_BIND_SERVICE is now rejected at admission. Tenants that silently relied on the OCI default cap set will also see those caps stripped at merge time. If a workload legitimately needed one of the dropped capabilities, run it outside Fission’s tenant-facing CRDs.
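The two behaviors described above, rejection at admission and forced drop at merge time, can be sketched as follows. This is a simplified model under stated assumptions: Capabilities is a local stand-in for the Kubernetes type so the example needs no external imports, and validateAdd and sanitize are hypothetical names rather than Fission functions.

```go
package main

import "fmt"

// Capabilities mirrors the add/drop shape of a container SecurityContext.
// It is a local stand-in so the sketch needs no Kubernetes imports.
type Capabilities struct {
	Add  []string
	Drop []string
}

// allowedAdd is the allowlist described in these notes.
var allowedAdd = map[string]bool{"NET_BIND_SERVICE": true}

// validateAdd models the admission step: any added capability outside the
// allowlist is an error. A nil block adds nothing and is accepted.
func validateAdd(c *Capabilities) error {
	if c == nil {
		return nil
	}
	for _, name := range c.Add {
		if !allowedAdd[name] {
			return fmt.Errorf("capability %q is not allowed in capabilities.add", name)
		}
	}
	return nil
}

// sanitize models the merge step: drop is forced to ["ALL"] even when the
// source had no capabilities block, and only allowlisted additions survive.
func sanitize(c *Capabilities) *Capabilities {
	out := &Capabilities{Drop: []string{"ALL"}}
	if c == nil {
		return out
	}
	for _, name := range c.Add {
		if allowedAdd[name] {
			out.Add = append(out.Add, name)
		}
	}
	return out
}

func main() {
	fmt.Println(validateAdd(&Capabilities{Add: []string{"SYS_TIME"}}))
	fmt.Printf("%+v\n", *sanitize(nil))
	fmt.Printf("%+v\n", *sanitize(&Capabilities{Add: []string{"NET_BIND_SERVICE", "NET_RAW"}}))
}
```

An allowlist fails closed: a capability added to the kernel later is rejected by default, which is the structural weakness of the denylist that the advisory describes.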

HTTPTrigger / TimeTrigger / CanaryConfig admission webhooks are gone

CRD field rules for these three resources now run as CEL x-kubernetes-validations in the API server itself. The dedicated admission webhooks are removed; the checks CEL cannot express (invalid cron schedules, CORS origin/max-age, ingress path regex) move into the router and timer reconcilers and surface as resource status conditions (RouteAdmitted=False, Scheduled=False) instead of admission rejections.

Behavioral change: a raw kubectl apply / GitOps write of an invalid cron, CORS origin, or ingress path is now admitted and flagged with a …=False condition rather than rejected at creation. The fission CLI still rejects these client-side, so the common path is unchanged. Operators relying on admission rejection of a malformed cron/CORS/ingress should check the new resource conditions instead.
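Automation that used to rely on admission rejection can instead inspect the resource's status conditions. The following is a rough sketch of that check: Condition is a local stand-in with the same field names as a Kubernetes status condition, failedConditions is a hypothetical helper, and the reason and message strings in main are invented for illustration.

```go
package main

import "fmt"

// Condition mirrors the Type/Status/Reason/Message fields of a Kubernetes
// status condition; a local stand-in used to keep the sketch self-contained.
type Condition struct {
	Type, Status, Reason, Message string
}

// failedConditions returns the watched conditions whose status is explicitly "False".
func failedConditions(conds []Condition, watched ...string) []Condition {
	want := make(map[string]bool, len(watched))
	for _, w := range watched {
		want[w] = true
	}
	var out []Condition
	for _, c := range conds {
		if want[c.Type] && c.Status == "False" {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// Reason and Message values here are made up for the example.
	conds := []Condition{
		{Type: "Scheduled", Status: "False", Reason: "ExampleReason", Message: "example message"},
		{Type: "Ready", Status: "True"},
	}
	for _, c := range failedConditions(conds, "RouteAdmitted", "Scheduled") {
		fmt.Printf("%s=%s (%s): %s\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```

A GitOps pipeline could run a check like this after sync and fail the rollout when either condition is False.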

The Function, Environment, Package, MessageQueueTrigger, and KubernetesWatchTrigger webhooks are retained — they enforce the checks CEL cannot express (cross-namespace references, PodSpec/container security, package archive size, message-queue type/topic validity).

Deprecations/Removals

  • Kubernetes < 1.32 is no longer supported (kubeVersion: ">=1.32.0-0" in the Helm chart).
  • logger.podSecurityPolicy Helm values and the fluentbit PodSecurityPolicy manifest are removed (PSP was removed in Kubernetes 1.25).
  • Admission webhooks for HTTPTrigger, TimeTrigger, and CanaryConfig are dropped — replaced by API-server CEL plus reconciler-written status conditions.
  • The fission-bundle binary no longer pulls in github.com/aws/aws-sdk-go v1 or github.com/graymeta/stow: pkg/storagesvc now uses minio-go/v7 directly. ~3 MB of always-resident heap per subsystem is reclaimed; behaviour and Package URL ?id=… formats are preserved.

Highlights

  • Reconciler consolidation in the executor (RFC-0004, Implemented). The executor’s nine controller-runtime reconcilers collapse to three. A single Function-centric reconciler uses .For(Function) plus .Watches(...) to react to its real dependency graph — Environment, ConfigMap, Secret, and the Deployment/Service/HPA it manages — instead of three per-executor-type Function reconcilers, two Environment reconcilers, and two ConfigMap/Secret reconcilers each running their own predicates and goroutine pools. The redundant standalone Deployment/Service informer factory is retired (one informer infrastructure removed from every executor process), and reads route through the manager cache. No CRD, CLI, or Helm-value change.
  • Self-healing function workloads (RFC-0004). Because owner-reference garbage collection cannot cross namespaces — and Fission’s Deployment/Service/HPA often live in a FunctionNamespace distinct from the Function CR — the reconciler now .Watches() its managed objects via the existing function-identifying labels. A Deployment deleted out-of-band re-enqueues the owning Function and is recreated proactively (targeting MinScale), so MinScale>0 workloads stay warm instead of waiting for the next invocation. The request-path heal (GetFuncSvc → IsValid → re-specialize) remains the backstop; the periodic reaper becomes a long-tail backstop rather than the primary repair path.
  • Reliable cross-namespace function teardown (RFC-0004). A new fission.io/function-cleanup finalizer on the shared Function reconciler tears workloads down via the owning executor type’s DeleteFunction before the Function CR is collected, closing the long-standing leak where the executor could miss a delete event and orphan cross-namespace workloads. Gated by the chart-wide finalizerEnabled toggle (default on); flipping it off drains any existing finalizer safely, and the deletion teardown path is flag-independent so toggling never strands an object.
  • CRD field rules now run in the API server (RFC-0003). Validation that previously lived only in the admission webhook moves into x-kubernetes-validations (CEL) markers on the CRD types — executor type/strategy enums, autoscale bounds, archive/checksum/build-status enums, environment version range, pool-size and termination-grace bounds, HTTPTrigger HTTP-method enums, FunctionReference name (DNS-1123) validation, and Environment.spec.version immutability. The same change adds Server-Side Apply list markers, so kubectl apply --server-side, Argo CD, and Flux merge Fission resources without clobbering peer entries. Rules apply even when the admission webhook isn’t running, and kubectl explain surfaces them. The webhook is retained for the checks CEL cannot express (cross-namespace references, podspec/container security, the package archive literal-size limit) — see the related webhook removal under Deprecations.
  • Control-plane robustness for HA installs. Router, executor, and the singleton controllers (kubewatcher, timer, mqtrigger Kafka, mqt_keda, buildermgr, canaryconfigmgr) gain graceful-drain on shutdown via a fresh GRACEFUL_SHUTDOWN_TIMEOUT context, opt-in active-passive leader election via Kubernetes Lease (only the leader runs mutating controllers/reapers; standbys keep warm caches), /readyz gated on informer-cache sync (and leadership where applicable) so non-leaders stay out of Service endpoints, jittered retry backoff plus panic-recovery middleware on both router listeners, and removal of os.Exit(1) from router/executor/KEDA-scaler hot paths (degrade-and-retry instead of crashloop). Default single-replica installs behave identically — when leaderElection.enabled: false, the helper reports leader immediately.
  • Helm chart additions for HA. Templated replicas and opt-in leaderElection.enabled for every leader-electable controller; probe split (/readyz readiness, /healthz liveness); opt-in rollout strategy and terminationGracePeriodSeconds; opt-in podDisruptionBudget for router and executor; opt-in autoscaling (HorizontalPodAutoscaler) for the router; coordination.k8s.io/leases RBAC wired into every leader-electable role. All defaults preserve the prior 9-pod, single-replica topology.
  • Memory and metrics wins. Latency summaries (http_requests_duration_seconds and friends) are converted to Prometheus histograms; executor pod-cache maps are freed and the idle reaper’s concurrency is bounded; the zap base logger is memoized and webhook loggers are lazy-init. Composed with the aws-sdk-go v1 / graymeta/stow drop in pkg/storagesvc, ~6 MB of always-resident heap is reclaimed in every fission-bundle subsystem (pprof-confirmed against the kind-CI run).
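The /readyz gating described in the control-plane robustness item can be illustrated with a small handler. This is a sketch under assumptions, not Fission's implementation: the readiness type and its fields are hypothetical, and the flags would in practice be set by the informer-sync and leader-election callbacks.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// readiness holds the two gates described above. All names are hypothetical.
type readiness struct {
	cacheSynced atomic.Bool // set once informer caches have synced
	isLeader    atomic.Bool // set when this replica acquires the Lease
	needLeader  bool        // false when leader election is disabled
}

// ready is true once caches are synced and, where required, leadership is held.
func (r *readiness) ready() bool {
	if !r.cacheSynced.Load() {
		return false
	}
	return !r.needLeader || r.isLeader.Load()
}

// ServeHTTP answers the readiness probe: 200 when ready, 503 otherwise.
func (r *readiness) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if r.ready() {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
}

// probe issues an in-memory GET /readyz and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	r := &readiness{needLeader: true}
	fmt.Println(probe(r)) // caches not synced yet
	r.cacheSynced.Store(true)
	fmt.Println(probe(r)) // synced, but still a standby
	r.isLeader.Store(true)
	fmt.Println(probe(r)) // synced and leading
}
```

With needLeader set to false, the handler reports ready as soon as the caches sync, which mirrors the single-replica default where the helper reports leader immediately.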

Fixes

The five recurring runtime-error classes surfaced by CI log analysis are addressed under RFC-0006 — Runtime Error-Noise Reduction & Pod-Lifecycle Correctness (Implemented in #3468–#3473):

  • exec /bin/sleep preStop hooks are gone. The lifecycle.preStop hook on executor- and fetcher-managed pods used to invoke /bin/sleep via exec, which (a) fails on distroless images like chainguard/static that have no shell or sleep binary, (b) always exits 137 because the sleep runs the full grace window, and (c) is a wasted CRI round-trip at grace=0. The hook now uses Kubernetes’ native lifecycle.preStop.sleep action (enabled by default since Kubernetes 1.30; our floor is 1.32): the kubelet performs the sleep itself with no binary required, and grace=0 emits no hook at all.
  • Trigger events on transient router 404s are no longer dropped. pkg/publisher (kubewatcher / timer / mqtrigger) treated every 4xx from the router’s internal listener as terminal, so events delivered during the window between trigger creation and mux reconciliation were silently dropped. A 404 now falls through to the existing bounded retry (10× / ~17 min worst case); other 4xx and 5xx semantics are unchanged.
  • HPA actually scales newdeploy / container functions. Pod-wide Resource CPU/memory metrics require every container in the pod to declare a request, which Fission function pods rarely satisfy: the function container has no CPU request unless the user or environment sets one, and sidecars can lack them too. The kube-controller-manager (KCM) logged "missing request for cpu", and HPA scaling silently never worked. The executor now rewrites those metrics to ContainerResource metrics scoped to the function’s main container (GA since Kubernetes 1.30) and reconciles drift on existing HPAs, both on create and on function update. CI now provisions metrics-server in the kind cluster, so HPA scaling is exercised end-to-end for the first time.
  • Deletion races no longer log as errors. Pool destroy on an already-deleted Deployment, fsvc not found in cache on function delete, setInitialBuildStatus on a deleted Package, and router tap of an expired fsvc are now NotFound-tolerant and stay out of the error log.
  • The specialize-vs-delete race and finalizer write races are closed in the executor, and Function status writes now retry on Conflict to absorb concurrent reconciles.
  • Zip-slip in archive extraction is closed (CWE-22). pkg/utils zip extraction and the fetcher/builder shared-volume FS operations are confined with os.Root helpers.
  • AdoptExistingResources hardened. The executor now has update on services in its RBAC, and the adopt/cleanup path is resilient to concurrent re-stamping with serial-adopt test coverage.
  • Routine security sweeps refreshed Go dependencies (Kubernetes 0.36, controller-runtime 0.24, KEDA 2.20, OTel, sarama, minio-go, x/image) and bumped integration-test kind images to v1.32.11 / v1.34.3 / v1.35.1.
  • Helm chart published as 1.25.0, versioned independently from the app version.
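The publisher fix for transient router 404s comes down to how response statuses are classified before the bounded retry. The sketch below illustrates that idea only: shouldRetry and publish are hypothetical names, treating 5xx as retryable is an assumption of this example (the notes say only that 5xx semantics are unchanged), and backoff delays are omitted for brevity.

```go
package main

import (
	"fmt"
	"net/http"
)

// shouldRetry classifies a router response status for the publisher.
// 404 is retried per the notes; retrying 5xx is an assumption of this sketch.
func shouldRetry(status int) bool {
	switch {
	case status == http.StatusNotFound:
		return true // route may not be reconciled into the mux yet
	case status >= 400 && status < 500:
		return false // other client errors stay terminal
	case status >= 500:
		return true
	default:
		return false // 2xx/3xx: delivered, nothing to retry
	}
}

// publish calls send until it gets a non-retryable status or runs out of
// attempts. Backoff between attempts is intentionally left out.
func publish(send func() int, maxAttempts int) (status, attempts int) {
	for attempts = 1; attempts <= maxAttempts; attempts++ {
		status = send()
		if !shouldRetry(status) {
			return status, attempts
		}
	}
	return status, maxAttempts
}

func main() {
	// Simulate a trigger whose route appears on the third attempt.
	calls := 0
	send := func() int {
		calls++
		if calls < 3 {
			return http.StatusNotFound
		}
		return http.StatusOK
	}
	status, attempts := publish(send, 10)
	fmt.Println(status, attempts)
}
```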

Changelog

What’s Changed

Full Changelog: https://github.com/fission/fission/compare/v1.24.0...v1.25.0

Last modified June 5, 2026: Doc changes v1.25.0 (#287) (08ca74e)