List: openbsd-tech
Subject: pinning all system calls
From: "Theo de Raadt" <deraadt () openbsd ! org>
Date: 2023-12-08 16:36:04
Message-ID: 26208.1702053364 () cvs ! openbsd ! org
First, the backstory.
A few years ago I made changes so that system calls
could only be performed from 4 places in the address space:
1) in the text of a static binary
2) in the signal trampoline
3) in ld.so text, and in that case the main program's text cannot do
system calls
4) in libc.so text, and ld.so tells the kernel where that is
The first 3 cases were configured entirely by the kernel; the 4th case
used a new msyscall(2) system call to identify the correct libc.so region.
More recently, I made another change, so that the execve(2) system call could
only be called from a singular, precise point in a static binary or in libc.so.
These changes attempt to disrupt methodologies commonly used in attacks. I
make no claim these changes stop all methods. Combined with other behaviours
we have (like libc random relinking), they will require an attacker to use
other methods, which are hopefully more fragile. Increasing the unknown and
requiring specific entry points increases the fragility and difficulty.
Another benefit is that it requires unique methodology for OpenBSD, which
requires investment.
I have about 5 steps coming in the near future. Here is one of those steps.
A few years ago immediately after msyscall(2), nayden@ asked me if it
was possible for the kernel to validate the locations of system
calls, and I proceeded to tell him a bunch of reasons this was
impossible, mostly relating to information not known, and to cost and
complexity. But it hung around in my lower brain and I eventually had
to do it.
A system call stub generally looks something like this:
    xx:  b8 05 00 00 00    mov    $0x5,%eax
    xx:  0f 05             syscall
This means "perform operation #5, which is open(2)"
Inside the kernel, we know the system call # and the address of the syscall
instruction.
I add a non-LOAD ELF extension (program header and section header) called
"openbsd.syscalls". This is found in ld.so(1) and the libc.so library
and in the system call stubs as .o files in libc.a for static binaries,
and also in static binaries that are linked against this new libc.a.
(There is no new risk from having this (unmapped non-LOAD) information
in the libc.so file, because an attacker with access to the file can
already use a debugger to find the specific offsets. This format is
just easier for the kernel and ld.so to handle)
It is an array of { offset, system call # }. For static binaries and ld.so(1),
the kernel parses this array and creates a new array attached to the process
which is indexed by the system call number, which has values: 0 (system call
not allowed), 1 (allowed, and we don't care about the address), or a specific
offset inside the ELF binary where the system call instruction is for that
specific system call number.
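
The table construction described above can be sketched in C. This is only an illustration, not the kernel's code: the record layout `struct syscall_entry`, the bound `NSYSCALLS`, the names `PIN_DENY` and `PIN_ANY`, and the function `build_pin_table` are all hypothetical, and rejecting a repeated system call number is an assumption the message does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSYSCALLS 512u  /* hypothetical upper bound on system call numbers */
#define PIN_DENY  0u    /* system call not allowed */
#define PIN_ANY   1u    /* allowed, address not checked */

/* Hypothetical layout of one "openbsd.syscalls" record. */
struct syscall_entry {
	uint32_t offset;    /* offset of the syscall instruction in the ELF object */
	uint32_t sysno;     /* system call number that instruction may issue */
};

/*
 * Build the per-process table, indexed by system call number.
 * Returns 0 on success, -1 if the input is malformed.
 */
int
build_pin_table(const struct syscall_entry *ents, size_t n,
    uint32_t pins[NSYSCALLS])
{
	size_t i;

	assert(pins != NULL && (ents != NULL || n == 0));

	/* Anything without a record stays "not allowed". */
	for (i = 0; i < NSYSCALLS; i++)
		pins[i] = PIN_DENY;

	for (i = 0; i < n; i++) {
		if (ents[i].sysno >= NSYSCALLS)
			return -1;
		/* Offsets 0 and 1 would collide with the sentinel values. */
		if (ents[i].offset <= PIN_ANY)
			return -1;
		/* Assumption: a repeated system call number is an error. */
		if (pins[ents[i].sysno] != PIN_DENY)
			return -1;
		pins[ents[i].sysno] = ents[i].offset;
	}
	return 0;
}
```

Indexing by system call number means the later check at system call entry is a single array lookup, with no search through the original records.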
Like with msyscall(2) before, ld.so(1) does the same job of parsing the
"openbsd.syscalls" in libc.so, and uses a new pinsyscall(2) system call to
tell the kernel where the system calls are allowed to enter from.
Like msyscall(2) before, this results in 4 places that system calls can
come from:
1) in the text of a static binary, because the kernel loaded a table for
*ONLY* the system calls linked into the binary. It's important to
realize what this means, by example. The ping(1) binary does not call
execve(2) or fork(2). So now you can't ever call fork() or execve()
because there is no "syscall" instruction for those two system calls.
It also cannot call accept(2).
2) in the signal trampoline, we only accept sigreturn(2). sigreturn(2)
never occurs anywhere else. This is a 2nd layer of SROP mitigation.
3) The syscall instructions inside ld.so(1) text can only call the
system calls it has stubs for, and each stub can only call the specific
system call it is intended to call.
4) in libc.so's table, all the system call stub "syscall instructions"
are registered.
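
As a rough sketch of the check this implies at system call entry, the C fragment below takes the system call number and the address of the syscall instruction and decides whether the call is permitted. It is not the kernel's implementation: `struct pinned_region`, `pin_allows`, `PIN_DENY` and `PIN_ANY` are hypothetical names, and it assumes the table stores offsets relative to the load address of the ELF object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIN_DENY 0u     /* system call not allowed */
#define PIN_ANY  1u     /* allowed, address not checked */

/* Hypothetical per-object view of the pin table. */
struct pinned_region {
	uintptr_t base;         /* load address of the ELF object */
	size_t npins;           /* number of entries in pins[] */
	const uint32_t *pins;   /* indexed by system call number */
};

/*
 * Return 1 if system call sysno may be issued by the syscall
 * instruction located at insn_addr, 0 otherwise.
 */
int
pin_allows(const struct pinned_region *r, uint32_t sysno, uintptr_t insn_addr)
{
	uint32_t pin;

	assert(r != NULL && r->pins != NULL);

	if (sysno >= r->npins)
		return 0;
	pin = r->pins[sysno];
	if (pin == PIN_DENY)
		return 0;       /* no stub for this call was linked in */
	if (pin == PIN_ANY)
		return 1;
	/* Only the one registered instruction may issue this call. */
	return insn_addr == r->base + pin;
}
```

With such a table, a binary like ping(1) that has no execve(2) stub has a "not allowed" entry for execve, so the call is refused no matter which address it comes from.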
There is an outstanding issue with syscall(2), which is SYS_syscall, or
syscall #0, the indirect system call. That dangerous API is no longer
required and will be deleted soon.
A few pieces of this have been pre-committed to make development easier
(in particular, lld and ld.bfd support by kettenis@ so the toolchain
will propagate the ELF tables correctly, and a stub pinsyscall(2) system
call to avoid chicken-and-egg failures.)
5 architectures now work: amd64, i386, sparc64, powerpc64, and mips64. Some of
the .S files may still have subtle bugs for other architectures.
https://marc.info/?l=openbsd-tech&m=170205367232026&w=2