On April 22, 2026, the Linux CNA published CVE-2026-31431, a Linux kernel vulnerability in algif_aead, the AEAD side of the kernel's AF_ALG crypto socket interface. Xint named the bug Copy Fail and showed how page-cache bytes for a read-only file can be changed without dirtying the file on disk.
That Linux framing is accurate, but it leaves a Kubernetes question unanswered:
If a pod is non-root, has all Linux capabilities dropped, uses RuntimeDefault seccomp, and is admitted under Pod Security Standards Restricted, can it still reach the Copy Fail kernel path?
We tested that on two real Kubernetes clusters:
- Talos v1.12.2, kernel 6.18.5-talos, containerd 2.1.6
- EKS on Amazon Linux 2023.11, kernel 6.12.79-101.147.amzn2023.x86_64, containerd 2.2.1
The short answer: yes. In both clusters, a non-root pod admitted under PSS Restricted could create an AF_ALG socket and bind the relevant AEAD algorithm path. In both clusters, a non-root pod could change cached bytes for a file baked into a purpose-built container image layer, and another pod from the same image on the same node observed the changed bytes. RuntimeDefault did not block the path. A custom Localhost seccomp profile denying socket(AF_ALG, ...) did.
We also ran controlled root-chain labs on both Talos/containerd and EKS/containerd with a purpose-built setuid helper inside a test image. Those labs reached container euid 0 when the consuming pod allowed privilege escalation. A PSS Restricted writer pod could mutate the shared image-layer bytes, but allowPrivilegeEscalation: false prevented that pod from using the setuid handoff itself.
This post is deliberately scoped. We are not publishing exploit code. We did not target host setuid binaries, host executables, package-managed files, or production application files. The setuid test used only a purpose-built helper inside a disposable lab image.
Key findings
- PSS Restricted did not block the relevant AF_ALG path in either tested cluster.
- RuntimeDefault seccomp did not block AF_ALG on Talos/containerd or EKS/containerd.
- A custom Localhost seccomp profile denying socket(AF_ALG, ...) blocked the path in both clusters.
- Cross-pod page-cache visibility was reproducible on the same node, including with a shared container image layer.
- Controlled Talos and EKS labs reached container euid 0 when a pod with allowPrivilegeEscalation: true executed a mutated purpose-built setuid helper from a shared image layer.
- PSS Restricted did not stop the page-cache mutation, but allowPrivilegeEscalation: false did stop that setuid-helper path from becoming root inside the restricted pod.
- On-disk bytes stayed clean in the hostPath lab while normal cached reads observed changed bytes.
What Copy Fail is
The official CVE title is crypto: algif_aead - Revert to operating out-of-place. The CVE record assigns CVSS 3.1 score 7.8 High with vector AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H. The upstream affected range starts at Linux commit 72548b093ee38a6d4f2a19e6ef1948ae05c181f7; the CVE record lists fixes at fafe0fa2995a0f7073c1c358d7d3145bcc9aedd8, ce42ee423e58dffa5ec03524054c9d8bfd4f6237, and a664bf3d603dc3bdcf9ae47cc21e0daec706d7a5.
Xint's writeup explains the primitive: an unprivileged local process can use AF_ALG AEAD operations and splice() in a way that causes kernel writes into file-backed page-cache pages. Those pages are used by normal reads and execution paths, but the file is not dirtied on disk in the usual way.
That distinction matters in Kubernetes because containers on the same node share the host kernel and the host page cache. Kubernetes namespaces isolate many things. They do not give each pod a private Linux kernel.
Why Kubernetes changes the impact
Kubernetes teams usually reason about this kind of bug through workload posture:
- Is the pod root or non-root?
- Is it privileged?
- Does it have dangerous Linux capabilities?
- Does it use host namespaces or hostPath?
- Is the namespace enforcing Pod Security Standards Restricted?
- Is seccomp enabled?
Those questions are still useful, but Copy Fail cuts across one of the common assumptions: RuntimeDefault seccomp is not the same thing as denying AF_ALG.
Kubernetes defines RuntimeDefault as the container runtime's default seccomp profile. That profile varies by runtime and release. In the profiles we checked, Docker/Moby and containerd deny socket(AF_VSOCK, ...), but they do not deny socket(AF_ALG, ...). AF_ALG is address family 38; AF_VSOCK is 40.
That means a pod can look reasonable by most Kubernetes posture standards and still reach the kernel interface needed for this vulnerability.
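The family numbers are easy to confirm from Python's socket constants on a Linux host. A small sketch (the constants only exist where the platform defines them, so it guards with getattr):

```python
import socket

# On Linux these mirror the kernel's address family numbering in
# include/linux/socket.h; on other platforms the constants may be absent.
print("AF_ALG   =", getattr(socket, "AF_ALG", None))    # 38 on Linux
print("AF_VSOCK =", getattr(socket, "AF_VSOCK", None))  # 40 on Linux
```

A seccomp rule that denies family 40 says nothing about family 38, which is exactly the gap described above.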
What we tested
We ran five defensive lab tests on both clusters, then ran controlled root-chain labs on both Talos/containerd and EKS/containerd.
1. Does RuntimeDefault allow AF_ALG?
On both clusters, a non-root pod with all capabilities dropped and seccompProfile.type: RuntimeDefault successfully created an AF_ALG socket.
Talos result:
AF_ALG_SOCKET_OK
AF_ALG_AUTHENCESN_BIND_OK
EKS result:
AF_ALG_SOCKET_OK
AF_ALG_AUTHENCESN_BIND_OK
That does not prove exploitation by itself. It proves the relevant syscall path was reachable from a normal pod.
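The reachability check itself takes only a few lines of Python. This is a probe, not an exploit: it attempts socket creation and a bind, nothing more. The algorithm string here is illustrative of the kernel's authencesn naming pattern, not a statement of our lab's exact bind target:

```python
import errno
import socket

def probe_af_alg(alg_type="aead", alg_name="authencesn(hmac(sha256),cbc(aes))"):
    """Return 'ok', 'blocked', or 'unsupported' for AF_ALG reachability."""
    try:
        s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
    except PermissionError:
        return "blocked"       # e.g. a seccomp profile denies socket(AF_ALG, ...)
    except (AttributeError, OSError):
        return "unsupported"   # non-Linux, or the address family is unavailable
    print("AF_ALG_SOCKET_OK")
    try:
        s.bind((alg_type, alg_name))  # AF_ALG binds to an (algorithm type, name) tuple
        print("AF_ALG_BIND_OK")
        return "ok"
    except OSError as e:
        # ENOENT means this kernel build lacks the named algorithm
        return "unsupported" if e.errno == errno.ENOENT else "blocked"
    finally:
        s.close()
```

Run it inside the pod under test; "blocked" corresponds to the EPERM behavior shown later under the AF_ALG-denying Localhost profile.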
2. Does PSS Restricted block it?
No. In both clusters, a namespace with pod-security.kubernetes.io/enforce=restricted admitted a non-root RuntimeDefault pod that could create AF_ALG sockets and bind the relevant AEAD algorithm.
Both clusters returned:
PSS_RESTRICTED_AFALG_OK
This is not a Kubernetes bug. PSS Restricted is a workload hardening baseline. It does not promise to deny every kernel attack surface.
The important takeaway is simpler: do not tell yourself "PSS Restricted" means "AF_ALG is blocked."
3. Can another pod observe page-cache changes?
Yes, for a shared hostPath inode in our lab.
On Talos, a writer pod in one namespace changed a throwaway file's cached bytes at offset 64 from CROS to TLXN. A reader pod in a different namespace observed the changed bytes:
CROSSNS_WRITER_BEFORE 43524f53 b'CROS'
CROSSNS_WRITER_AFTER 544c584e b'TLXN'
CROSSNS_READER_OFFSET_64 544c584e b'TLXN'
On EKS, the same cross-namespace test changed CROS to EKXN:
CROSSNS_WRITER_BEFORE 43524f53 b'CROS'
CROSSNS_WRITER_AFTER 454b584e b'EKXN'
CROSSNS_READER_OFFSET_64 454b584e b'EKXN'
The namespaces did not matter because the cache is per node, not per namespace.
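Nothing on the reader side needs privileges; a plain positional read produces these log lines. A minimal sketch (the log prefix mimics our lab output; path and marker are whatever your own test uses):

```python
import os

def hex_at_offset(path, offset=64, length=4):
    """Print a lab-style log line: hex and raw bytes at a file offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, length, offset)  # positional read, no seek needed
    finally:
        os.close(fd)
    print(f"READER_OFFSET_{offset} {data.hex()} {data!r}")
    return data
```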
4. Does the effect outlive the attacking pod?
Yes, in our lab. After deleting the writer and reader Jobs, a later reader Job scheduled on the same node still observed the changed cached bytes.
Talos:
LIFECYCLE_READER_OFFSET_64 544c584e b'TLXN'
EKS:
LIFECYCLE_READER_OFFSET_64 454b584e b'EKXN'
This is not persistence on disk. It is persistence in the node's page cache. Rebooting or otherwise evicting the relevant pages clears that class of effect, but pod deletion alone did not.
5. Does this require hostPath?
This was the test we cared about most. HostPath is already a high-risk pattern, and a Kubernetes audience would be right to ask whether this is just "hostPath is dangerous" with extra steps.
We built a purpose-specific lab image with a read-only file at /copyfail-lab/target.bin. Pod A ran from that image and changed cached bytes for the file. Pod B ran from the same image on the same node and read the same path.
On Talos, Pod A changed IMG0 to TIMG, and Pod B saw TIMG:
IMAGE_WRITER_BEFORE 494d4730 b'IMG0'
IMAGE_WRITER_AFTER 54494d47 b'TIMG'
IMAGE_READER_OFFSET_64 54494d47 b'TIMG'
IMAGE_EXPECTED_SHA256 915a6e4d52cf856e62c67cdb4e453c785c3ec6515e51dee3fd3f95e1c9d9f03a
IMAGE_READER_SHA256 a18219a47a679b294ed62a16bd6a9c8b050799c5a282f3e650b63745970fa8ac
On EKS, Pod A changed IMG0 to EIMG, and Pod B saw EIMG:
IMAGE_WRITER_BEFORE 494d4730 b'IMG0'
IMAGE_WRITER_AFTER 45494d47 b'EIMG'
IMAGE_READER_OFFSET_64 45494d47 b'EIMG'
IMAGE_EXPECTED_SHA256 915a6e4d52cf856e62c67cdb4e453c785c3ec6515e51dee3fd3f95e1c9d9f03a
IMAGE_READER_SHA256 f4b6127700e72fe2c7f9293a427e7966bd66572df3a16a505364f7bd1a1448a5
That is the Kubernetes-relevant finding: in our Talos/containerd and EKS/containerd labs, this was not limited to hostPath. A file baked into a container image layer was enough for cross-pod visibility on the same node.
Do not overgeneralize this beyond what we tested. Snapshotter behavior, storage drivers, image garbage collection, page-cache eviction, and runtime configuration can change the details. But the claim "this only matters if you mount hostPath" is not true in the clusters we tested.
Can this become root in Kubernetes?
Yes, with important boundaries.
After the image-layer test, we built second disposable lab images for Talos and EKS. They contained a purpose-built setuid-root helper whose only privileged behavior was writing a marker inside its own container when a four-byte marker in its own image-layer file had changed. The helpers did not target host files, package-managed files, or real application binaries.
In a non-root pod with all Linux capabilities dropped, RuntimeDefault seccomp, and allowPrivilegeEscalation: true, the helper denied access before mutation and then reached euid 0 after the cached image-layer bytes changed:
COPYFAIL_LAB_DENY found=0 ruid=1000 euid_start=0 euid_now=1000
COPYFAIL_WRITE ... before=4a4c5430 after=4a4c5431
COPYFAIL_LAB_ALLOW found=1 regain=0 marker_fd=3 ruid=1000 euid_start=0 euid_now=0
We reproduced the same sequence on both Talos/containerd and EKS/containerd. That proves a full container-root chain in our labs when a reachable setuid target exists and the consuming pod allows privilege escalation.
PSS Restricted changes that part of the chain. We ran the same mutated helper under allowPrivilegeEscalation: false, which sets no_new_privs. The mutated bytes were visible, but the helper started without setuid elevation and could not regain euid 0:
COPYFAIL_LAB_ALLOW found=1 regain=-1 marker_fd=-1 errno=1 ruid=1000 euid_start=1000 euid_now=1000
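allowPrivilegeEscalation: false works by setting the kernel's no_new_privs flag, which strips setuid/setgid elevation on execve. A process can confirm the flag from procfs (a sketch; the NoNewPrivs field is exposed on kernels that support the flag):

```python
def no_new_privs_enabled(status_path="/proc/self/status"):
    """True/False if the NoNewPrivs flag is readable, None if unavailable."""
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("NoNewPrivs:"):
                    return line.split()[1] == "1"
    except OSError:
        pass
    return None  # procfs missing or field not exposed on this kernel
```

In a pod admitted with allowPrivilegeEscalation: false this returns True, which is consistent with the euid_start=1000 behavior in the log line above.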
The Kubernetes-specific result is the cross-pod version, which also reproduced on both clusters. A PSS Restricted writer pod changed cached bytes for the helper inside a shared image layer. A separate pod from the same image on the same node, with allowPrivilegeEscalation: true, then observed the changed image-layer bytes and reached euid 0:
WRITER: COPYFAIL_WRITE ... before=4a4c4330 after=4a4c4331
WRITER: COPYFAIL_LAB_ALLOW found=1 regain=-1 marker_fd=-1 euid_start=1000 euid_now=1000
READER_JLC0_OFFSET -1 READER_JLC1_OFFSET 8192
READER: COPYFAIL_LAB_ALLOW found=1 regain=0 marker_fd=3 ruid=1000 euid_start=0 euid_now=0
What this means: PSS Restricted did not prevent a pod from changing shared image-layer page-cache bytes, and a different pod with a reachable setuid target and privilege escalation enabled could consume that changed cache state.
What this does not mean: we did not prove host-root compromise, container escape, or that every PSS Restricted workload can become root through this exact setuid path. The root-chain result depends on a reachable target that can turn modified bytes into privileged execution.
Disk stayed clean while cached reads changed
For the hostPath lab file, we compared normal cached reads with O_DIRECT reads.
On Talos, normal reads saw JLT!, but direct reads saw the original 0123 and original SHA-256:
NORMAL_OFFSET_64 4a4c5421 b'JLT!'
NORMAL_SHA256 40879caad4634328849e0405d26e23f506ea54c86c3a8176a036c3acb4a9e39a
DIRECT_OFFSET_64 30313233 b'0123'
DIRECT_SHA256 d281ea21d2bc15d0f737288a081f5982ad6e08d836af73ca013ab00e469cf27f
On EKS, normal reads saw EKS!, but direct reads saw the original 0123 and original SHA-256:
NORMAL_OFFSET_64 454b5321 b'EKS!'
NORMAL_SHA256 0e33005673886ecf2d6f7782dbcfca3f7ef29d09411ca4a925a6e8b83b15bb4d
DIRECT_OFFSET_64 30313233 b'0123'
DIRECT_SHA256 d281ea21d2bc15d0f737288a081f5982ad6e08d836af73ca013ab00e469cf27f
That supports Xint's core observation: this class of corruption can affect what normal file reads see without changing persistent file content. It also explains why on-disk-only integrity checks can be the wrong control for this bug class.
Careful wording matters here. We are not saying every file-integrity product fails in every configuration. We are saying that if a tool only verifies persistent on-disk bytes, it can miss a page-cache-only change that normal reads still observe.
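The normal-versus-direct comparison can be sketched in Python. O_DIRECT requires an aligned buffer, so the sketch borrows a page-aligned anonymous mmap; on filesystems without O_DIRECT support (tmpfs, some network mounts) the direct read falls back to None. It assumes the offset sits inside the first block:

```python
import io
import mmap
import os

def compare_cached_vs_direct(path, offset=64, length=4):
    """Read the same bytes via the page cache and via O_DIRECT."""
    with open(path, "rb") as f:
        f.seek(offset)
        cached = f.read(length)          # served from the page cache
    direct = None
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        buf = mmap.mmap(-1, 4096)        # anonymous mmap is page-aligned
        with io.FileIO(fd, "r") as raw:  # closes fd for us
            raw.readinto(buf)
        direct = bytes(buf[offset:offset + length])
    except (OSError, AttributeError):
        pass                             # O_DIRECT unsupported here
    return cached, direct
```

A divergence between the two return values is the page-cache-only signature described above: cached reads see the changed bytes while the on-disk content is unchanged.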
The mitigation we verified
Kernel patching is the primary fix. Red Hat, Ubuntu, Debian, and other distributions should be tracked through their own advisory and package channels because vendor kernels often backport fixes without matching upstream version numbers.
As a compensating control, we tested a Localhost seccomp profile that denies socket() when the first argument is AF_ALG (38). On both Talos and EKS, that blocked the path with EPERM and left a fresh target unchanged.
Talos:
AF_ALG_SOCKET_BLOCKED errno=1 strerror=Operation not permitted
MITIGATED_BEFORE 61626364 b'abcd'
MITIGATED_AFTER 61626364 b'abcd'
EKS:
AF_ALG_SOCKET_BLOCKED errno=1 strerror=Operation not permitted
MITIGATED_BEFORE 61626364 b'abcd'
MITIGATED_AFTER 61626364 b'abcd'
We also reran the Talos setuid-helper lab under the same AF_ALG-denying Localhost profile. The writer failed at socket(AF_ALG, ...), the fresh helper marker stayed unchanged, and the helper did not reach the mutated path:
PermissionError: [Errno 1] Operation not permitted
AFTER_JLB0_OFFSET 8192 AFTER_JLB1_OFFSET -1
COPYFAIL_LAB_DENY found=0 ruid=1000 euid_start=0 euid_now=1000
Kubernetes does not inline seccomp syscall filters in pod YAML. A Localhost profile has to exist on the node under the kubelet seccomp profile path, and pod specs refer to it by name. That means a remediation plan has two parts:
- Put the AF_ALG-denying profile on every node that needs it.
- Enforce that untrusted workloads use that profile until the kernel is patched.
Do not treat the word Localhost as magic. You need to verify the actual profile content.
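Verifying the content can be automated. A sketch of that check, assuming the runtime-style seccomp JSON layout (syscalls entries with names, action, and SCMP_* arg comparisons):

```python
import json

AF_ALG = 38
DENY_ACTIONS = {"SCMP_ACT_ERRNO", "SCMP_ACT_KILL",
                "SCMP_ACT_KILL_PROCESS", "SCMP_ACT_KILL_THREAD"}

def profile_denies_af_alg(profile_json: str) -> bool:
    """Check whether a Localhost seccomp profile denies socket(AF_ALG, ...)."""
    profile = json.loads(profile_json)
    for rule in profile.get("syscalls", []):
        if "socket" not in rule.get("names", []):
            continue
        if rule.get("action") not in DENY_ACTIONS:
            continue
        for arg in rule.get("args") or []:
            # arg0 of socket() is the address family
            if (arg.get("index") == 0 and arg.get("value") == AF_ALG
                    and arg.get("op") == "SCMP_CMP_EQ"):
                return True
    return False
```

This is a static check of profile JSON, not proof of runtime behavior; pair it with an in-pod probe on each node family.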
What Juliet detects today
Juliet now has an initial Copy Fail scanner, plus a policy gate for teams that want to enforce the temporary seccomp mitigation while node kernels are being patched.
The scanner joins three signals:
- Node KBOM facts: osImage, kernelVersion, and containerRuntimeVersion.
- Workload posture: each container's effective seccomp profile after pod-level inheritance.
- Node-local seccomp profile content: Juliet's node agent hashes and parses kubelet Localhost profile JSON and verifies whether the referenced profile denies socket(AF_ALG, ...).
Juliet opens a high-severity Issue when a pod is scheduled on a Copy Fail affected or unknown node and at least one container does not use a node-verified Localhost profile that denies AF_ALG. RuntimeDefault, unset, Unconfined, non-Localhost values, missing profile content, parse failures, and Localhost profiles that do not provably deny AF_ALG all count as exposed.
Want Juliet to check your clusters for this? Start Juliet free, connect a cluster, then open Security → All Findings and search for Copy Fail or CVE-2026-31431. Existing users can also ask Explorer: which pods are exposed to Copy Fail?
We also added a high-severity built-in policy named copyfail-require-custom-seccomp. It is disabled by default and intended for customers who want to audit or enforce this specific mitigation. The policy flags containers, init containers, and ephemeral containers whose effective seccomp profile is unset, Unconfined, RuntimeDefault, or any non-Localhost value.
One important constraint: Juliet does not claim full node CVE fixed status from kernel strings alone. Kernel version matching is risky because vendors backport fixes. The scanner marks the kernels we reproduced in lab as affected, honors vendor-not-affected cases we have validated, and otherwise keeps the node in unknown until vendor package/advisory proof exists.
What Kubernetes teams should do now
1. Patch nodes first
Treat the node kernel as the source of truth. Use your OS vendor's advisory and package channel, not only upstream semantic kernel versions.
Red Hat rates the issue Important and lists RHEL 8, RHEL 9, RHEL 10, and RHEL 8/9 kernel-rt streams as affected in its tracker. Ubuntu marks the issue High. Debian's tracker shows package-specific status across suites. Those statuses can change as packages ship, so automate against vendor data where possible.
2. Do not assume RuntimeDefault blocks AF_ALG
We tested containerd defaults on Talos and EKS and found AF_ALG reachable. The current Moby and containerd default profiles we reviewed do not deny AF_ALG either.
If your temporary mitigation depends on seccomp, test the actual runtime behavior on each node family.
3. Use a targeted Localhost seccomp profile for untrusted workloads
Blocking all socket() calls will break workloads. The narrow control is to deny socket() only when the address family is AF_ALG.
Apply that profile to namespaces that run untrusted code, CI jobs, build runners, multi-tenant workloads, or anything that executes customer-supplied plugins.
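The rule involved can be sketched as the fragment you would merge into a copy of your runtime's default profile before installing it under the kubelet seccomp directory. How an added rule interacts with the base profile depends on the runtime's seccomp compilation, so, as above, test the resulting behavior on each node family:

```python
import json

# Illustrative rule: deny socket() only when arg0 (the address family)
# equals AF_ALG (38), returning EPERM to match the lab output.
AF_ALG_DENY_RULE = {
    "names": ["socket"],
    "action": "SCMP_ACT_ERRNO",
    "errnoRet": 1,  # EPERM
    "args": [{"index": 0, "value": 38, "op": "SCMP_CMP_EQ"}],
}

def add_af_alg_deny(default_profile: dict) -> dict:
    """Return a copy of a runtime default profile with the deny rule prepended."""
    profile = dict(default_profile)
    profile["syscalls"] = [AF_ALG_DENY_RULE] + list(default_profile.get("syscalls", []))
    return profile

if __name__ == "__main__":
    # Example: start from your runtime's actual default profile JSON, not {}.
    print(json.dumps(add_af_alg_deny({"defaultAction": "SCMP_ACT_ERRNO"}), indent=2))
```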
4. Reduce node-sharing risk
The image-layer result matters because many pods on a node can share the same lower-layer file. Until patched, reduce unnecessary co-location between high-risk and high-value workloads:
- Isolate CI/build workloads onto dedicated nodes.
- Avoid mixing multi-tenant workloads with control-plane-adjacent or privileged workloads.
- Reduce privileged pods, host namespaces, and broad hostPath mounts.
- Consider sandboxed runtimes, but test them. Neither of our clusters had gVisor, Kata, or Firecracker RuntimeClasses available, so we are not making a sandbox-runtime claim here.
5. Treat suspected nodes as suspect
Page-cache-only corruption is transient, but a process that exploited it may have used the resulting access to make persistent changes elsewhere. If you suspect exploitation, patch or replace the node and investigate the workloads that ran there.
What we did and did not prove
There is enough here without exaggeration.
We did prove a controlled container-root chain on Talos/containerd and EKS/containerd when a pod with allowPrivilegeEscalation: true consumed a mutated purpose-built setuid helper from a shared image layer.
Do not claim every Kubernetes cluster is vulnerable. The result depends on the running kernel, vendor patches, runtime configuration, and workload policy.
Do not claim container escape or host-root compromise is guaranteed. This is a powerful local kernel primitive, but node compromise depends on accessible target files, namespaces, mounts, workload behavior, and what the attacker can cause another process to read or execute.
Do not claim PSS Restricted alone makes this harmless. In our lab, PSS Restricted stopped the setuid handoff inside the restricted pod, but it did not stop that pod from mutating shared image-layer cache bytes that a different pod later consumed.
Do not claim file-integrity tools always fail. Say the precise thing: on-disk-only checks can miss page-cache-only changes.
Do not claim sandboxed runtimes mitigate this until tested. They may change the risk model, but we did not have a sandbox RuntimeClass in either tested cluster.
FAQ
Is this a Kubernetes vulnerability?
No. CVE-2026-31431 is a Linux kernel vulnerability. Kubernetes matters because pods on a node share the host kernel and, in the scenarios we tested, page-cache effects were visible across pods on the same node.
Does Pod Security Standards Restricted stop it?
No, not in our tests. PSS Restricted admitted a non-root RuntimeDefault pod that could create AF_ALG sockets and bind the relevant AEAD algorithm path.
Does RuntimeDefault seccomp stop it?
No, not in the Talos/containerd and EKS/containerd clusters we tested. RuntimeDefault is runtime-defined. Common runtime defaults we checked did not deny AF_ALG.
Can this become root?
In our Talos and EKS labs, yes for container-root under specific conditions: a reachable setuid target in a purpose-built image layer and a consuming pod with allowPrivilegeEscalation: true. We did not prove host-root compromise or container escape, and allowPrivilegeEscalation: false prevented that setuid handoff in the restricted pods we tested.
Does this require hostPath?
No, not in our tests. We reproduced cross-pod visibility using a purpose-built container image layer. HostPath was useful for proving the disk-vs-page-cache behavior with O_DIRECT, but it was not required for the image-layer result.
What should I block?
For a seccomp compensating control, deny socket() when arg0 is AF_ALG (38). Validate the actual node-local profile and test runtime behavior. Patch the kernel as the primary fix.
Sources
- CVE-2026-31431 record
- NVD entry for CVE-2026-31431
- Xint: Copy Fail
- copy.fail
- Linux stable fix commit fafe0fa2
- Linux stable fix commit ce42ee42
- Linux stable fix commit a664bf3d
- Red Hat CVE data
- Ubuntu CVE data
- Debian security tracker
- Kubernetes Pod Security Standards
- Kubernetes seccomp documentation
- Kubernetes Linux kernel security constraints
- Docker seccomp documentation
- Moby default seccomp profile
- containerd default seccomp profile
- Linux socket address family constants