Question

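To simulate certain behavior, I would like to attach a probe to a syscall and modify its return value when certain parameters are passed. Alternatively, it would also be enough to modify the parameters of the function before they are processed.

Is this possible with BPF?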

Answer 1

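Within kernel probes (kprobes), the eBPF virtual machine has read-only access to the syscall parameters and return value.

However, the eBPF program will have a return code of its own. It is possible to apply a seccomp profile that traps BPF (NOT eBPF; thanks @qeole) return codes and interrupts the system call during execution.

The allowed runtime modifications are:

SECCOMP_RET_KILL: Immediate kill with SIGSYS
SECCOMP_RET_TRAP: Send a catchable SIGSYS, giving a chance to emulate the syscall
SECCOMP_RET_ERRNO: Force errno value
SECCOMP_RET_TRACE: Yield decision to ptracer or set errno to -ENOSYS
SECCOMP_RET_ALLOW: Allow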

https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt
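
As an illustration of the SECCOMP_RET_ERRNO action listed above, here is a minimal sketch of a classic BPF seccomp filter that forces one syscall to fail with a chosen errno. The helper name install_errno_filter is made up for this example. For brevity the filter does not check the arch field of struct seccomp_data, which a real filter should do before trusting the syscall number.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: make syscall number `nr` fail with errno `err`
 * for the calling thread and its future children. Returns 0 on success,
 * -1 on failure. Assumption: no architecture check, so this is a demo
 * only and not a security mechanism.
 */
int install_errno_filter(int nr, int err)
{
    struct sock_filter filter[] = {
        /* Load the syscall number from struct seccomp_data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* On a match fall through to the ERRNO return, else skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)nr, 0, 1),
        /* The low 16 bits of the return value carry the errno. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA)),
        /* Every other syscall runs normally. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    /* Required so that an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After install_errno_filter(SYS_getppid, EACCES) succeeds, a call to syscall(SYS_getppid) in the same process should return -1 with errno set to EACCES. Note that the filter cannot be removed once installed.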

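The SECCOMP_RET_TRACE method allows modifying the system call performed, its arguments, or its return value. This is architecture dependent, and modification of mandatory external references may cause an ENOSYS error.

It does so by passing execution up to a waiting userspace ptrace tracer, which is able to modify the traced process's memory, registers, and file descriptors.

The tracer needs to call ptrace and then waitpid. An example: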

ptrace(PTRACE_SETOPTIONS, tracee_pid, 0, PTRACE_O_TRACESECCOMP);
waitpid(tracee_pid, &status, 0);

http://man7.org/linux/man-pages/man2/ptrace.2.html

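When waitpid returns, depending on the contents of status, one can retrieve the seccomp return value using the PTRACE_GETEVENTMSG ptrace operation. This retrieves the seccomp SECCOMP_RET_DATA value, which is a 16-bit field set by the BPF program. For example: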

ptrace(PTRACE_GETEVENTMSG, tracee_pid, 0, &data);
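
The "depending on the contents of status" step can be made concrete with a small sketch. The two helper names below are invented for this example: one builds the filter return value that carries a 16-bit tag to the tracer, the other checks whether a waitpid status reports a seccomp stop, using the comparison given in the ptrace manual page.

```c
#include <signal.h>
#include <stdint.h>
#include <linux/seccomp.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: the value a seccomp BPF filter would return to
 * hand the decision to the tracer. `tag` ends up in SECCOMP_RET_DATA and
 * is what PTRACE_GETEVENTMSG later reports.
 */
uint32_t seccomp_ret_trace(uint16_t tag)
{
    return SECCOMP_RET_TRACE | (tag & SECCOMP_RET_DATA);
}

/*
 * Hypothetical helper: true if a waitpid() status describes a
 * PTRACE_EVENT_SECCOMP stop, so that PTRACE_GETEVENTMSG is meaningful.
 */
int is_seccomp_stop(int status)
{
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8));
}
```

A tracer would typically call is_seccomp_stop(status) right after waitpid and only then issue PTRACE_GETEVENTMSG.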

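Syscall arguments can be modified in memory before continuing. You can step to a single syscall entry or exit with PTRACE_SYSCALL. Syscall return values can be modified from userspace before resuming execution; the traced program will not be able to tell that the return value has been modified.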
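
To make the PTRACE_SYSCALL approach more tangible, here is a rough sketch of a tracer that rewrites the return value of getppid() in a child process. The function name demo_fake_getppid and the register macros are invented for this example, and only x86-64 and aarch64 register layouts are covered, which is an assumption on my part rather than something from the answer. It uses plain PTRACE_SYSCALL stops instead of a seccomp filter to keep the example short.

```c
#include <elf.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Register names are architecture dependent; only two are sketched. */
#if defined(__x86_64__)
#define SYSCALL_NR(r)  ((r).orig_rax)
#define SYSCALL_RET(r) ((r).rax)
#elif defined(__aarch64__)
#define SYSCALL_NR(r)  ((r).regs[8])
#define SYSCALL_RET(r) ((r).regs[0])
#else
#error "this sketch only covers x86-64 and aarch64"
#endif

/*
 * Hypothetical demo: fork a tracee that calls getppid() and exits with
 * the value it saw. The tracer overwrites the return value with `fake`
 * (0..254) at the syscall-exit stop. Returns the tracee's exit status,
 * or -1 on error.
 */
int demo_fake_getppid(int fake)
{
    int status, entering = 1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(255);
        raise(SIGSTOP);                    /* let the tracer catch up */
        _exit((int)syscall(SYS_getppid));  /* report what we observed */
    }

    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    /* Report syscall stops as SIGTRAP|0x80 so they are easy to spot. */
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    for (;;) {
        /* Resume the tracee until its next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == -1)
            return -1;
        if (waitpid(pid, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (!WIFSTOPPED(status))
            return -1;
        if (WSTOPSIG(status) != (SIGTRAP | 0x80))
            continue;                      /* not a syscall stop */

        if (!entering) {
            /* Syscall-exit stop: the return register is now valid. */
            struct user_regs_struct regs;
            struct iovec iov = { .iov_base = &regs, .iov_len = sizeof(regs) };

            if (ptrace(PTRACE_GETREGSET, pid, (void *)(long)NT_PRSTATUS, &iov) == -1)
                return -1;
            if ((long)SYSCALL_NR(regs) == SYS_getppid) {
                SYSCALL_RET(regs) = (unsigned long)fake;
                if (ptrace(PTRACE_SETREGSET, pid, (void *)(long)NT_PRSTATUS, &iov) == -1)
                    return -1;
            }
        }
        entering = !entering;              /* stops alternate entry/exit */
    }
}
```

Calling demo_fake_getppid(42) should return 42 if ptrace is permitted in the environment, because the child exits with the forged value instead of the real parent PID. Modifying arguments works the same way, except that the registers are rewritten at the syscall-entry stop.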

An example implementation: Filter and Modify System Calls with seccomp and ptrace

Answer 2

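I believe that attaching eBPF to kprobes/kretprobes gives you read access to function arguments and return values, but that you cannot tamper with them. I am not 100% sure; good places to ask for confirmation would be the IO Visor project mailing list or IRC channel (#iovisor at irc.oftc.net).

As an alternative solution, I know you can at least change the return value of a syscall with strace, using the -e option. Quoting the manual page: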

-e inject=set[:error=errno|:retval=value][:signal=sig][:when=expr]
       Perform syscall tampering for the specified set of syscalls.

Also, if it is of any interest to you, there was a presentation on this and on fault injection at Fosdem 2017. Here is one example command from the slides:

strace -P precious.txt -efault=unlink:retval=0 unlink precious.txt

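Edit: As stated by Ben, eBPF on kprobes and tracepoints is definitely read-only, intended for tracing and monitoring use cases. I also got confirmation of this on IRC.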

Answer 3

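It is possible to inject errors into a system call invocation using eBPF: https://lwn.net/Articles/740146/

There is a BPF helper function called bpf_override_return(), which can override the return value of an invocation. This is an example using bcc as the frontend: https://github.com/iovisor/bcc/blob/master/tools/inject.py

According to the Linux manual page:

"bpf_override_return() is only available if the kernel was compiled with the CONFIG_BPF_KPROBE_OVERRIDE configuration option, and in this case it only works on functions tagged with ALLOW_ERROR_INJECTION in the kernel code.

"Also, the helper is only available for the architectures having the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing, x86 architecture is the only one to support this feature."

It is possible to add a function to the error injection framework. More information can be found here: https://github.com/iovisor/bcc/issues/2485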

Answer 4

It is possible to modify some user space memory using eBPF. As stated in the bpf.h header file:

 * int bpf_probe_write_user(void *dst, const void *src, u32 len)
 *  Description
 *      Attempt in a safe way to write *len* bytes from the buffer
 *      *src* to *dst* in memory. It only works for threads that are in
 *      user context, and *dst* must be a valid user space address.
 *
 *      This helper should not be used to implement any kind of
 *      security mechanism because of TOC-TOU attacks, but rather to
 *      debug, divert, and manipulate execution of semi-cooperative
 *      processes.
 *
 *      Keep in mind that this feature is meant for experiments, and it
 *      has a risk of crashing the system and running programs.
 *      Therefore, when an eBPF program using this helper is attached,
 *      a warning including PID and process name is printed to kernel
 *      logs.
 *  Return
 *      0 on success, or a negative error in case of failure.

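Also, quoting the BPF design Q&A:

"Tracing BPF programs can overwrite the user memory of the current task with bpf_probe_write_user(). Every time such a program is loaded the kernel will print a warning message, so this helper is only useful for experiments and prototypes. Tracing BPF programs are root only."

So your eBPF program may write data into user space memory locations. Note that you still cannot modify kernel structures from within your eBPF program.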
