# We Tested Copy Fail in Kubernetes: PSS Restricted and RuntimeDefault Did Not Block AF_ALG

> Canonical: https://juliet.sh/blog/we-tested-copy-fail-in-kubernetes-pss-restricted-runtime-default-af-alg
> Published: 2026-04-30
> Author: Juliet Security Team
> Tags: kubernetes-security, linux-kernel, seccomp, runtime-security, cve-2026-31431, pod-security-standards
> Read time: 13 min read

_Copy Fail is a Linux kernel page-cache corruption bug. We reproduced the primitive on Talos/containerd and EKS/Amazon Linux 2023/containerd: a non-root PSS Restricted pod reached AF_ALG, modified cached bytes for a shared image-layer file, and another pod on the same node observed the change. In controlled labs on both clusters, a separate pod with `allowPrivilegeEscalation: true` consumed a mutated, purpose-built setuid helper and reached euid 0. Here is what we tested, what we did not test, and how to defend Kubernetes nodes without overclaiming._

---

On April 22, 2026, the Linux CNA published [CVE-2026-31431](https://cveawg.mitre.org/api/cve/CVE-2026-31431), a Linux kernel vulnerability in `algif_aead`, the AEAD side of the kernel's AF_ALG crypto socket interface. Xint named the bug [Copy Fail](https://xint.io/blog/copy-fail-linux-distributions) and showed how page-cache bytes for a read-only file can be changed without dirtying the file on disk.

That Linux framing is accurate, but it leaves a Kubernetes question unanswered:

If a pod is non-root, has all Linux capabilities dropped, uses `RuntimeDefault` seccomp, and is admitted under Pod Security Standards Restricted, can it still reach the Copy Fail kernel path?

We tested that on two real Kubernetes clusters:

- Talos `v1.12.2`, kernel `6.18.5-talos`, containerd `2.1.6`

- EKS on Amazon Linux 2023.11, kernel `6.12.79-101.147.amzn2023.x86_64`, containerd `2.2.1`

The short answer: yes. In both clusters, a non-root pod admitted under PSS Restricted could create an AF_ALG socket and bind the relevant AEAD algorithm path. In both clusters, a non-root pod could change cached bytes for a file baked into a purpose-built container image layer, and another pod from the same image on the same node observed the changed bytes. `RuntimeDefault` did not block the path. A custom `Localhost` seccomp profile denying `socket(AF_ALG, ...)` did.

We also ran controlled root-chain labs on both Talos/containerd and EKS/containerd with a purpose-built setuid helper inside a test image. Those labs reached container euid 0 when the consuming pod allowed privilege escalation. A PSS Restricted writer pod could mutate the shared image-layer bytes, but `allowPrivilegeEscalation: false` prevented that pod from using the setuid handoff itself.

This post is deliberately scoped. We are not publishing exploit code. We did not target host setuid binaries, host executables, package-managed files, or production application files. The setuid test used only a purpose-built helper inside a disposable lab image.

## Key findings

- PSS Restricted did not block the relevant AF_ALG path in either tested cluster.

- `RuntimeDefault` seccomp did not block AF_ALG on Talos/containerd or EKS/containerd.

- A custom `Localhost` seccomp profile denying `socket(AF_ALG, ...)` blocked the path in both clusters.

- Cross-pod page-cache visibility was reproducible on the same node, including with a shared container image layer.

- Controlled Talos and EKS labs reached container euid 0 when a pod with `allowPrivilegeEscalation: true` executed a mutated purpose-built setuid helper from a shared image layer.

- PSS Restricted did not stop the page-cache mutation, but `allowPrivilegeEscalation: false` did stop that setuid-helper path from becoming root inside the restricted pod.

- On-disk bytes stayed clean in the hostPath lab while normal cached reads observed changed bytes.

## What Copy Fail is

The official CVE title is `crypto: algif_aead - Revert to operating out-of-place`. The CVE record assigns CVSS 3.1 score 7.8 High with vector `AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H`. The upstream affected range starts at Linux commit `72548b093ee38a6d4f2a19e6ef1948ae05c181f7`; the CVE record lists fixes at `fafe0fa2995a0f7073c1c358d7d3145bcc9aedd8`, `ce42ee423e58dffa5ec03524054c9d8bfd4f6237`, and `a664bf3d603dc3bdcf9ae47cc21e0daec706d7a5`.

Xint's writeup explains the primitive: an unprivileged local process can use AF_ALG AEAD operations and `splice()` in a way that causes kernel writes into file-backed page-cache pages. Those pages are used by normal reads and execution paths, but the file is not dirtied on disk in the usual way.

That distinction matters in Kubernetes because containers on the same node share the host kernel and the host page cache. Kubernetes namespaces isolate many things. They do not give each pod a private Linux kernel.

## Why Kubernetes changes the impact

Kubernetes teams usually reason about this kind of bug through workload posture:

- Is the pod root or non-root?

- Is it privileged?

- Does it have dangerous Linux capabilities?

- Does it use host namespaces or hostPath?

- Is the namespace enforcing Pod Security Standards Restricted?

- Is seccomp enabled?

Those questions are still useful, but Copy Fail cuts across one of the common assumptions: `RuntimeDefault` seccomp is not the same thing as denying AF_ALG.

Kubernetes defines `RuntimeDefault` as the container runtime's default seccomp profile. That profile varies by runtime and release. In the profiles we checked, Docker/Moby and containerd deny `socket(AF_VSOCK, ...)`, but they do not deny `socket(AF_ALG, ...)`. AF_ALG is address family `38`; AF_VSOCK is `40`.
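A quick way to check what a pod's effective seccomp profile actually permits is to probe the family directly from inside the pod. A minimal Python sketch (the numeric fallback `38` matches `AF_ALG` in the kernel's `include/linux/socket.h`; the algorithm name here is illustrative):

```python
import socket

# AF_ALG is Linux-only; fall back to the numeric family value (38,
# per include/linux/socket.h) so the probe degrades cleanly elsewhere.
AF_ALG = getattr(socket, "AF_ALG", 38)

def probe_af_alg(algo_type="aead", algo_name="gcm(aes)"):
    """Try to create and bind an AF_ALG socket; return a status string."""
    try:
        s = socket.socket(AF_ALG, socket.SOCK_SEQPACKET, 0)
    except OSError as e:
        # EPERM (errno=1) here is what a deny-AF_ALG seccomp profile produces.
        return f"AF_ALG_SOCKET_BLOCKED errno={e.errno}"
    try:
        s.bind((algo_type, algo_name))
    except OSError as e:
        return f"AF_ALG_BIND_FAILED errno={e.errno}"
    finally:
        s.close()
    return "AF_ALG_BIND_OK"

if __name__ == "__main__":
    print(probe_af_alg())
```

A probe like this only tells you whether the syscall path is reachable under the current profile; it says nothing about exploitability on its own.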

That means a pod can look reasonable by most Kubernetes posture standards and still reach the kernel interface needed for this vulnerability.

## What we tested

We ran five defensive lab tests on both clusters, then ran controlled root-chain labs on both Talos/containerd and EKS/containerd.

### 1. Does RuntimeDefault allow AF_ALG?

On both clusters, a non-root pod with all capabilities dropped and `seccompProfile.type: RuntimeDefault` successfully created an AF_ALG socket.

Talos result:

```
AF_ALG_SOCKET_OK
AF_ALG_AUTHENCESN_BIND_OK
```

EKS result:

```
AF_ALG_SOCKET_OK
AF_ALG_AUTHENCESN_BIND_OK
```

That does not prove exploitation by itself. It proves the relevant syscall path was reachable from a normal pod.

### 2. Does PSS Restricted block it?

No. In both clusters, a namespace with `pod-security.kubernetes.io/enforce=restricted` admitted a non-root `RuntimeDefault` pod that could create AF_ALG sockets and bind the relevant AEAD algorithm.

Both clusters returned:

```
PSS_RESTRICTED_AFALG_OK
```

This is not a Kubernetes bug. PSS Restricted is a workload hardening baseline. It does not promise to deny every kernel attack surface.

The important takeaway is simpler: do not tell yourself "PSS Restricted" means "AF_ALG is blocked."
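For reference, the enforcement used in this test amounts to a namespace label like the following sketch (the namespace name is illustrative; version pinning via `enforce-version` is optional):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: copyfail-restricted   # illustrative name
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
```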

### 3. Can another pod observe page-cache changes?

Yes, for a shared hostPath inode in our lab.

On Talos, a writer pod in one namespace changed a throwaway file's cached bytes at offset 64 from `CROS` to `TLXN`. A reader pod in a different namespace observed the changed bytes:

```
CROSSNS_WRITER_BEFORE 43524f53 b'CROS'
CROSSNS_WRITER_AFTER  544c584e b'TLXN'
CROSSNS_READER_OFFSET_64 544c584e b'TLXN'
```

On EKS, the same cross-namespace test changed `CROS` to `EKXN`:

```
CROSSNS_WRITER_BEFORE 43524f53 b'CROS'
CROSSNS_WRITER_AFTER  454b584e b'EKXN'
CROSSNS_READER_OFFSET_64 454b584e b'EKXN'
```

The namespaces did not matter because the cache is per node, not per namespace.

### 4. Does the effect outlive the attacking pod?

Yes, in our lab. After deleting the writer and reader Jobs, a later reader Job scheduled on the same node still observed the changed cached bytes.

Talos:

```
LIFECYCLE_READER_OFFSET_64 544c584e b'TLXN'
```

EKS:

```
LIFECYCLE_READER_OFFSET_64 454b584e b'EKXN'
```

This is not persistence on disk. It is persistence in the node's page cache. Rebooting or otherwise evicting the relevant pages clears that class of effect, but pod deletion alone did not.

### 5. Does this require hostPath?

This was the test we cared about most. HostPath is already a high-risk pattern, and a Kubernetes audience would be right to ask whether this is just "hostPath is dangerous" with extra steps.

We built a purpose-specific lab image with a read-only file at `/copyfail-lab/target.bin`. Pod A ran from that image and changed cached bytes for the file. Pod B ran from the same image on the same node and read the same path.

On Talos, Pod A changed `IMG0` to `TIMG`, and Pod B saw `TIMG`:

```
IMAGE_WRITER_BEFORE 494d4730 b'IMG0'
IMAGE_WRITER_AFTER  54494d47 b'TIMG'
IMAGE_READER_OFFSET_64 54494d47 b'TIMG'
IMAGE_EXPECTED_SHA256 915a6e4d52cf856e62c67cdb4e453c785c3ec6515e51dee3fd3f95e1c9d9f03a
IMAGE_READER_SHA256   a18219a47a679b294ed62a16bd6a9c8b050799c5a282f3e650b63745970fa8ac
```

On EKS, Pod A changed `IMG0` to `EIMG`, and Pod B saw `EIMG`:

```
IMAGE_WRITER_BEFORE 494d4730 b'IMG0'
IMAGE_WRITER_AFTER  45494d47 b'EIMG'
IMAGE_READER_OFFSET_64 45494d47 b'EIMG'
IMAGE_EXPECTED_SHA256 915a6e4d52cf856e62c67cdb4e453c785c3ec6515e51dee3fd3f95e1c9d9f03a
IMAGE_READER_SHA256   f4b6127700e72fe2c7f9293a427e7966bd66572df3a16a505364f7bd1a1448a5
```

That is the Kubernetes-relevant finding: in our Talos/containerd and EKS/containerd labs, this was not limited to hostPath. A file baked into a container image layer was enough for cross-pod visibility on the same node.

Do not overgeneralize this beyond what we tested. Snapshotter behavior, storage drivers, image garbage collection, page-cache eviction, and runtime configuration can change the details. But the claim "this only matters if you mount hostPath" is not true in the clusters we tested.

## Can this become root in Kubernetes?

Yes, with important boundaries.

After the image-layer test, we built a second set of disposable lab images for Talos and EKS. Each contained a purpose-built setuid-root helper whose only privileged behavior was writing a marker inside its own container once a four-byte marker in its own image-layer file had changed. The helpers did not target host files, package-managed files, or real application binaries.

In a non-root pod with all Linux capabilities dropped, `RuntimeDefault` seccomp, and `allowPrivilegeEscalation: true`, the helper denied access before mutation and then reached euid 0 after the cached image-layer bytes changed:

```
COPYFAIL_LAB_DENY found=0 ruid=1000 euid_start=0 euid_now=1000
COPYFAIL_WRITE ... before=4a4c5430 after=4a4c5431
COPYFAIL_LAB_ALLOW found=1 regain=0 marker_fd=3 ruid=1000 euid_start=0 euid_now=0
```

We reproduced the same sequence on both Talos/containerd and EKS/containerd. That proves a full container-root chain in our labs when a reachable setuid target exists and the consuming pod allows privilege escalation.

PSS Restricted changes that part of the chain. We ran the same mutated helper under `allowPrivilegeEscalation: false`, which sets `no_new_privs`. The mutated bytes were visible, but the helper started without setuid elevation and could not regain euid 0:

```
COPYFAIL_LAB_ALLOW found=1 regain=-1 marker_fd=-1 errno=1 ruid=1000 euid_start=1000 euid_now=1000
```

The Kubernetes-specific result is the cross-pod version, which also reproduced on both clusters. A PSS Restricted writer pod changed cached bytes for the helper inside a shared image layer. A separate pod from the same image on the same node, with `allowPrivilegeEscalation: true`, then observed the changed image-layer bytes and reached euid 0:

```
WRITER: COPYFAIL_WRITE ... before=4a4c4330 after=4a4c4331
WRITER: COPYFAIL_LAB_ALLOW found=1 regain=-1 marker_fd=-1 euid_start=1000 euid_now=1000
READER_JLC0_OFFSET -1 READER_JLC1_OFFSET 8192
READER: COPYFAIL_LAB_ALLOW found=1 regain=0 marker_fd=3 ruid=1000 euid_start=0 euid_now=0
```

What this means: PSS Restricted did not prevent a pod from changing shared image-layer page-cache bytes, and a different pod with a reachable setuid target and privilege escalation enabled could consume that changed cache state.

What this does not mean: we did not prove host-root compromise, container escape, or that every PSS Restricted workload can become root through this exact setuid path. The root-chain result depends on a reachable target that can turn modified bytes into privileged execution.

## Disk stayed clean while cached reads changed

For the hostPath lab file, we compared normal cached reads with `O_DIRECT` reads.

On Talos, normal reads saw `JLT!`, but direct reads saw the original `0123` and original SHA-256:

`NORMAL_OFFSET_64 4a4c5421 b'JLT!'
NORMAL_SHA256 40879caad4634328849e0405d26e23f506ea54c86c3a8176a036c3acb4a9e39a
DIRECT_OFFSET_64 30313233 b'0123'
DIRECT_SHA256 d281ea21d2bc15d0f737288a081f5982ad6e08d836af73ca013ab00e469cf27f
`
On EKS, normal reads saw `EKS!`, but direct reads saw the original `0123` and original SHA-256:

```
NORMAL_OFFSET_64 454b5321 b'EKS!'
NORMAL_SHA256 0e33005673886ecf2d6f7782dbcfca3f7ef29d09411ca4a925a6e8b83b15bb4d
DIRECT_OFFSET_64 30313233 b'0123'
DIRECT_SHA256 d281ea21d2bc15d0f737288a081f5982ad6e08d836af73ca013ab00e469cf27f
```

That supports Xint's core observation: this class of corruption can affect what normal file reads see without changing persistent file content. It also explains why on-disk-only integrity checks can be the wrong control for this bug class.

Careful wording matters here. We are not saying every file-integrity product fails in every configuration. We are saying that if a tool only verifies persistent on-disk bytes, it can miss a page-cache-only change that normal reads still observe.

## The mitigation we verified

Kernel patching is the primary fix. Red Hat, Ubuntu, Debian, and other distributions should be tracked through their own advisory and package channels because vendor kernels often backport fixes without matching upstream version numbers.

As a compensating control, we tested a `Localhost` seccomp profile that denies `socket()` when the first argument is AF_ALG (`38`). On both Talos and EKS, that blocked the path with `EPERM` and left a fresh target unchanged.
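For illustration, such a rule in the OCI seccomp JSON format used by runc and containerd could look like the sketch below. This is a sketch, not the exact profile we deployed; a production profile should start from your runtime's default profile and add this deny rule, rather than defaulting everything else to allow:

```json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ERRNO",
      "errnoRet": 1,
      "args": [
        { "index": 0, "value": 38, "op": "SCMP_CMP_EQ" }
      ]
    }
  ]
}
```

Here `38` is AF_ALG and `errnoRet: 1` returns `EPERM`, which matches the `errno=1` results below.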

Talos:

```
AF_ALG_SOCKET_BLOCKED errno=1 strerror=Operation not permitted
MITIGATED_BEFORE 61626364 b'abcd'
MITIGATED_AFTER  61626364 b'abcd'
```

EKS:

```
AF_ALG_SOCKET_BLOCKED errno=1 strerror=Operation not permitted
MITIGATED_BEFORE 61626364 b'abcd'
MITIGATED_AFTER  61626364 b'abcd'
```

We also reran the Talos setuid-helper lab under the same AF_ALG-denying `Localhost` profile. The writer failed at `socket(AF_ALG, ...)`, the fresh helper marker stayed unchanged, and the helper did not reach the mutated path:

```
PermissionError: [Errno 1] Operation not permitted
AFTER_JLB0_OFFSET 8192 AFTER_JLB1_OFFSET -1
COPYFAIL_LAB_DENY found=0 ruid=1000 euid_start=0 euid_now=1000
```

Kubernetes does not inline seccomp syscall filters in pod YAML. A `Localhost` profile has to exist on the node under the kubelet seccomp profile path, and pod specs refer to it by name. That means a remediation plan has two parts:

- Put the AF_ALG-denying profile on every node that needs it.

- Enforce that untrusted workloads use that profile until the kernel is patched.

Do not treat the word `Localhost` as magic. You need to verify the actual profile content.
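One verification approach is to parse the node-local profile JSON and check for an explicit deny on `socket` with arg0 equal to AF_ALG. A conservative Python sketch, assuming the OCI seccomp JSON shape shown above (anything unparseable or ambiguous counts as "not proven"):

```python
import json

AF_ALG = 38
DENY_ACTIONS = {
    "SCMP_ACT_ERRNO",
    "SCMP_ACT_KILL",
    "SCMP_ACT_KILL_PROCESS",
    "SCMP_ACT_KILL_THREAD",
}

def profile_denies_af_alg(profile_json: str) -> bool:
    """True only if the profile provably denies socket(AF_ALG, ...)."""
    try:
        profile = json.loads(profile_json)
    except (ValueError, TypeError):
        return False
    if not isinstance(profile, dict):
        return False
    for rule in profile.get("syscalls", []):
        if "socket" not in rule.get("names", []):
            continue
        if rule.get("action") not in DENY_ACTIONS:
            continue
        for arg in rule.get("args", []):
            if (arg.get("index") == 0
                    and arg.get("value") == AF_ALG
                    and arg.get("op") == "SCMP_CMP_EQ"):
                return True
    return False
```

A real checker would also need to handle profiles whose `defaultAction` already denies `socket`, `socketcall` on some 32-bit architectures, and masked-comparison operators; this sketch errs toward "not proven" in every doubtful case.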

## What Juliet detects today

Juliet now has an initial Copy Fail scanner, plus a policy gate for teams that want to enforce the temporary seccomp mitigation while node kernels are being patched.

The scanner joins three signals:

- Node KBOM facts: `osImage`, `kernelVersion`, and `containerRuntimeVersion`.

- Workload posture: each container's effective seccomp profile after pod-level inheritance.

- Node-local seccomp profile content: Juliet's node agent hashes and parses kubelet `Localhost` profile JSON and verifies whether the referenced profile denies `socket(AF_ALG, ...)`.

Juliet opens a high-severity Issue when a pod is scheduled on a Copy Fail `affected` or `unknown` node and at least one container does not use a node-verified `Localhost` profile that denies AF_ALG. Containers whose effective profile is `RuntimeDefault`, unset, `Unconfined`, or any non-`Localhost` value remain flagged as exposed, as do `Localhost` references with missing profile content, parse failures, or rules that do not provably deny `socket(AF_ALG, ...)`.

**Want Juliet to check your clusters for this?** [Start Juliet free](https://app.juliet.sh/register?plan=starter), connect a cluster, then open **Security → All Findings** and search for `Copy Fail` or `CVE-2026-31431`. Existing users can also ask Explorer: `which pods are exposed to Copy Fail?`

We also added a high-severity built-in policy named `copyfail-require-custom-seccomp`. It is disabled by default and intended for customers who want to audit or enforce this specific mitigation. The policy flags containers, init containers, and ephemeral containers whose effective seccomp profile is unset, `Unconfined`, `RuntimeDefault`, or any non-`Localhost` value.

One important constraint: Juliet does not claim a node's CVE status is fixed from kernel version strings alone. Kernel version matching is risky because vendors backport fixes. The scanner marks the kernels we reproduced in the lab as `affected`, honors vendor not-affected cases we have validated, and otherwise keeps the node in `unknown` until vendor package or advisory proof exists.

## What Kubernetes teams should do now

### 1. Patch nodes first

Treat the node kernel as the source of truth. Use your OS vendor's advisory and package channel, not only upstream semantic kernel versions.

Red Hat rates the issue Important and lists RHEL 8, RHEL 9, RHEL 10, and RHEL 8/9 `kernel-rt` streams as affected in its tracker. Ubuntu marks the issue High. Debian's tracker shows package-specific status across suites. Those statuses can change as packages ship, so automate against vendor data where possible.

### 2. Do not assume RuntimeDefault blocks AF_ALG

We tested containerd defaults on Talos and EKS and found AF_ALG reachable. The current Moby and containerd default profiles we reviewed do not deny AF_ALG either.

If your temporary mitigation depends on seccomp, test the actual runtime behavior on each node family.

### 3. Use a targeted Localhost seccomp profile for untrusted workloads

Blocking all `socket()` calls will break workloads. The narrow control is to deny `socket()` only when the address family is AF_ALG.

Apply that profile to namespaces that run untrusted code, CI jobs, build runners, multi-tenant workloads, or anything that executes customer-supplied plugins.
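Wiring a node-local profile into a workload then looks like this sketch (pod name, image, and profile filename are hypothetical; `localhostProfile` is resolved relative to the kubelet's seccomp directory, `/var/lib/kubelet/seccomp` by default):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-worker        # hypothetical
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      # Relative to the kubelet seccomp dir on the node.
      localhostProfile: deny-af-alg.json
  containers:
    - name: worker
      image: registry.example.com/untrusted:latest   # hypothetical
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: ["ALL"]
```

Remember the two-part remediation above: this spec only works on nodes where the referenced profile file actually exists and contains the deny rule.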

### 4. Reduce node-sharing risk

The image-layer result matters because many pods on a node can share the same lower-layer file. Until patched, reduce unnecessary co-location between high-risk and high-value workloads:

- Isolate CI/build workloads onto dedicated nodes.

- Avoid mixing multi-tenant workloads with control-plane-adjacent or privileged workloads.

- Reduce privileged pods, host namespaces, and broad hostPath mounts.

- Consider sandboxed runtimes, but test them. Neither of our clusters had gVisor, Kata, or Firecracker RuntimeClasses available, so we are not making a sandbox-runtime claim here.
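As one sketch of the node-isolation idea, a dedicated CI pool can be carved out with a node taint plus a matching toleration and selector (label values, taint key, and image are hypothetical):

```yaml
# Node side (hypothetical names):
#   kubectl label nodes ci-node-1 workload=ci
#   kubectl taint nodes ci-node-1 workload=ci:NoSchedule
apiVersion: batch/v1
kind: Job
metadata:
  name: ci-build
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        workload: ci
      tolerations:
        - key: workload
          operator: Equal
          value: ci
          effect: NoSchedule
      containers:
        - name: build
          image: registry.example.com/ci-runner:latest   # hypothetical
```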

### 5. Treat suspected nodes as suspect

Page-cache-only corruption is transient, but a process that exploited it may have used the resulting access to make persistent changes elsewhere. If you suspect exploitation, patch or replace the node and investigate the workloads that ran there.

## What we did and did not prove

There is enough here without exaggeration.

We did prove a controlled container-root chain on Talos/containerd and EKS/containerd when a pod with `allowPrivilegeEscalation: true` consumed a mutated purpose-built setuid helper from a shared image layer.

Do not claim every Kubernetes cluster is vulnerable. The result depends on the running kernel, vendor patches, runtime configuration, and workload policy.

Do not claim container escape or host-root compromise is guaranteed. This is a powerful local kernel primitive, but node compromise depends on accessible target files, namespaces, mounts, workload behavior, and what the attacker can cause another process to read or execute.

Do not claim PSS Restricted alone makes this harmless. In our lab, PSS Restricted stopped the setuid handoff inside the restricted pod, but it did not stop that pod from mutating shared image-layer cache bytes that a different pod later consumed.

Do not claim file-integrity tools always fail. Say the precise thing: on-disk-only checks can miss page-cache-only changes.

Do not claim sandboxed runtimes mitigate this until tested. They may change the risk model, but we did not have a sandbox RuntimeClass in either tested cluster.

## FAQ

### Is this a Kubernetes vulnerability?

No. CVE-2026-31431 is a Linux kernel vulnerability. Kubernetes matters because pods on a node share the host kernel and, in the scenarios we tested, page-cache effects were visible across pods on the same node.

### Does Pod Security Standards Restricted stop it?

No, not in our tests. PSS Restricted admitted a non-root `RuntimeDefault` pod that could create AF_ALG sockets and bind the relevant AEAD algorithm path.

### Does RuntimeDefault seccomp stop it?

No, not in the Talos/containerd and EKS/containerd clusters we tested. `RuntimeDefault` is runtime-defined. Common runtime defaults we checked did not deny AF_ALG.

### Can this become root?

In our Talos and EKS labs, yes for container-root under specific conditions: a reachable setuid target in a purpose-built image layer and a consuming pod with `allowPrivilegeEscalation: true`. We did not prove host-root compromise or container escape, and `allowPrivilegeEscalation: false` prevented that setuid handoff in the restricted pods we tested.

### Does this require hostPath?

No, not in our tests. We reproduced cross-pod visibility using a purpose-built container image layer. HostPath was useful for proving the disk-vs-page-cache behavior with `O_DIRECT`, but it was not required for the image-layer result.

### What should I block?

For a seccomp compensating control, deny `socket()` when arg0 is AF_ALG (`38`). Validate the actual node-local profile and test runtime behavior. Patch the kernel as the primary fix.

## Sources

- [CVE-2026-31431 record](https://cveawg.mitre.org/api/cve/CVE-2026-31431)

- [NVD entry for CVE-2026-31431](https://nvd.nist.gov/vuln/detail/CVE-2026-31431)

- [Xint: Copy Fail](https://xint.io/blog/copy-fail-linux-distributions)

- [copy.fail](https://copy.fail/)

- [Linux stable fix commit fafe0fa2](https://git.kernel.org/stable/c/fafe0fa2995a0f7073c1c358d7d3145bcc9aedd8)

- [Linux stable fix commit ce42ee42](https://git.kernel.org/stable/c/ce42ee423e58dffa5ec03524054c9d8bfd4f6237)

- [Linux stable fix commit a664bf3d](https://git.kernel.org/stable/c/a664bf3d603dc3bdcf9ae47cc21e0daec706d7a5)

- [Red Hat CVE data](https://access.redhat.com/security/cve/cve-2026-31431)

- [Ubuntu CVE data](https://ubuntu.com/security/CVE-2026-31431)

- [Debian security tracker](https://security-tracker.debian.org/tracker/CVE-2026-31431)

- [Kubernetes Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)

- [Kubernetes seccomp documentation](https://kubernetes.io/docs/tutorials/security/seccomp/)

- [Kubernetes Linux kernel security constraints](https://kubernetes.io/docs/concepts/security/linux-kernel-security-constraints/)

- [Docker seccomp documentation](https://docs.docker.com/engine/security/seccomp/)

- [Moby default seccomp profile](https://raw.githubusercontent.com/moby/profiles/main/seccomp/default.json)

- [containerd default seccomp profile](https://raw.githubusercontent.com/containerd/containerd/main/contrib/seccomp/seccomp_default.go)

- [Linux socket address family constants](https://raw.githubusercontent.com/torvalds/linux/master/include/linux/socket.h)