Question

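I am trying to create another version of the clone(2) syscall (in kernel space) to create a clone of a user process with some additional parameters. This system call will do exactly the same job as clone(2), but I want to pass one additional parameter from user space to the kernel. However, when I look at glibc's code, it seems that the parameters are not passed in the same order as in the user's call to clone():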

int clone(int (*fn)(void *), void *child_stack,
             int flags, void *arg, ...
             /* pid_t *ptid, void *newtls, pid_t *ctid */ );

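and some of them are handled by glibc's code itself. I searched the internet to learn how glibc's clone() works, but couldn't find any good documentation. Can anyone explain:

How does glibc handle clone()?
Also, the parameters of the syscall in the kernel are not exactly the same as those of clone() in glibc, so how is that difference handled?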

Answer 1

How does glibc handle clone()?

Via arch-specific assembly wrappers. For i386, see sysdeps/unix/sysv/linux/i386/clone.S in the glibc sources; for x86-64, see sysdeps/unix/sysv/linux/x86_64/clone.S, and so on.

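A normal syscall wrapper is not sufficient, because switching stacks is left to user space. The assembly files above have quite informative comments about what actually needs to be done in user space in addition to the syscall itself.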
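To make the stack handling concrete, here is a minimal sketch of calling glibc's clone() from C. The helper names (child_fn, run_clone_child) and the 1 MiB stack size are my own choices for illustration, and the code assumes an architecture where the stack grows downward, such as x86. The caller allocates the child's stack and passes its top; the wrapper then arranges for fn(arg) to run on that stack in the child.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (1024 * 1024)  /* arbitrary size for this sketch */

/* Runs in the child, on the stack the caller allocated. */
static int child_fn(void *arg)
{
    return *(int *)arg;  /* the return value becomes the child's exit status */
}

/* Clone a child that exits with `code`; return the exit status the
   parent observes, or -1 on failure. */
static int run_clone_child(int code)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Assumption: the stack grows down, so pass the top of the buffer.
       Without CLONE_VM the child gets its own copy of memory, so &code
       stays valid in the child. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, SIGCHLD, &code);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_clone_child(7) from main() should give back 7 in the parent. Note that fn and arg never reach the kernel; they are consumed by the user-space wrapper, which is one reason the glibc prototype and the kernel's parameter list differ.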

The parameters of the syscall in the kernel are not exactly the same as those of clone() in glibc, so how is that difference handled?

The C library functions that map to system calls are wrapper functions.

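Consider, for example, the POSIX.1 write() low-level I/O function in the C library and the Linux write() system call. The parameters are basically the same, as are the error conditions, but the error return values differ. The C library function returns -1 and sets errno if an error occurs, whereas the Linux syscall returns a negative error code (which basically matches the errno values).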
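The user-space side of that convention is easy to observe. The sketch below (helper names are mine) writes to an invalid file descriptor once through write() and once through the generic syscall() function. Both are C library wrappers, so both report the failure as -1 plus errno; the negative value the kernel returned is never seen directly by the caller.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* write(2) through the normal C library wrapper, on an invalid fd.
   Returns the errno value observed, or 0 if the call did not fail. */
static int errno_from_write(void)
{
    errno = 0;
    ssize_t n = write(-1, "x", 1);
    return (n == -1) ? errno : 0;
}

/* The same system call through the generic syscall(2) wrapper.
   It also converts the kernel's negative error code to -1 and errno. */
static int errno_from_syscall(void)
{
    errno = 0;
    long n = syscall(SYS_write, -1, "x", (size_t)1);
    return (n == -1) ? errno : 0;
}
```

On Linux both helpers should return EBADF.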

If you look at, for example, sysdeps/unix/sysv/linux/x86_64/sysdep.h, you can see that the basic syscall wrapper for Linux on x86-64 boils down to

# define INLINE_SYSCALL(name, nr, args...) \
  ({                                       \
    unsigned long int resultvar = INTERNAL_SYSCALL (name, , nr, args);        \
    if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (resultvar, )))            \
      {                                                                       \
        __set_errno (INTERNAL_SYSCALL_ERRNO (resultvar, ));                   \
        resultvar = (unsigned long int) -1;                                   \
      }                                                                       \
    (long int) resultvar; })

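which just calls the actual syscall, then checks whether the syscall return value indicates an error; if it does, it changes the result to -1 and sets errno accordingly. It looks odd because it relies on a GCC extension (statement expressions) to make the whole block behave as a single expression that yields a value.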
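Written as an ordinary function, the same conversion might look like the sketch below. The function name is hypothetical, and the cutoff of 4095 error codes reflects my understanding of how glibc's INTERNAL_SYSCALL_ERROR_P is defined on x86-64; treat that constant as an assumption rather than a quote from the source.

```c
#include <errno.h>

/* Model of the wrapper's error handling: raw results in the range
   -4095..-1 are treated as negated errno values; anything else is
   passed through unchanged. */
static long libc_style_result(long raw)
{
    if ((unsigned long)raw > -4096UL) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

For instance, libc_style_result(-EBADF) returns -1 with errno set to EBADF, while libc_style_result(5) returns 5 and leaves errno alone.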

Let's assume you added a new syscall to Linux, say

SYSCALL_DEFINE2(splork, unsigned long, arg1, void *, arg2);

and, for whatever reason, you wish to expose it to user space as

int splork(void *arg2, unsigned long arg1);

No problem! All you need is to provide a minimal header file,

#ifndef _SPLORK_H
#define _SPLORK_H
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>   /* declares syscall() */
#include <errno.h>

#ifndef __NR_splork
#if defined(__x86_64__)
#define __NR_splork /* syscall number on x86-64 */
#elif defined(__i386)
#define __NR_splork /* syscall number on i386 */
#endif
#endif /* __NR_splork */

#ifdef __NR_splork
#ifndef SYS_splork
#define SYS_splork __NR_splork
#endif

static inline int splork(void *arg2, unsigned long arg1)
{
    /* The kernel expects (arg1, arg2); only the C prototype is reordered.
       syscall() already returns -1 and sets errno on failure, so the
       kernel's negative error code must not be converted again here. */
    return (int)syscall(__NR_splork, arg1, arg2);
}

#else
#undef SYS_splork

static inline int splork(void *arg2, unsigned long arg1)
{
    /* No syscall number is known for this architecture. */
    (void)arg2;
    (void)arg1;
    errno = ENOTSUP;
    return -1;
}

#endif

#endif /* _SPLORK_H */

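SYS_splork and __NR_splork are preprocessor macros that define the syscall number for the new syscall. Since the syscall number is likely not (yet?) included in the official kernel sources and headers, the header file above declares it explicitly for each supported architecture. On architectures where it is not supported, the splork() function always returns -1 with errno == ENOTSUP.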
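Since splork does not exist in any real kernel, the same wrapper pattern can be tried out on an existing syscall. The sketch below is purely illustrative: reordered_kill is a hypothetical name, and it exposes the kill syscall with its two arguments swapped in the C prototype, just as splork() swaps arg1 and arg2.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: same job as kill(pid, sig), but the C-level
   prototype takes the arguments in the opposite order. */
static int reordered_kill(int sig, pid_t pid)
{
    /* The kernel still receives (pid, sig). On failure, syscall()
       returns -1 and sets errno for us. */
    return (int)syscall(SYS_kill, pid, sig);
}
```

Calling reordered_kill(0, getpid()) should return 0, because signal 0 only checks that the target exists, while an invalid signal number such as -1 should fail with errno == EINVAL. The point is that the order of parameters in the user-visible prototype is entirely the wrapper's choice.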

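Note, however, that Linux syscalls are limited to six parameters. If your kernel function needs more, you need to pack the parameters into a structure, pass the address of that structure to the kernel, and use copy_from_user() to copy the values into an identical structure in the kernel.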

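On all Linux architectures, pointers and long have the same size (int may be smaller than a pointer), so I recommend using either long or fixed-size types in such structures to pass data to and from the kernel.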
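A user-space sketch of such a structure follows. The struct name, its members, and the helper are all hypothetical; the only point is that every member is a fixed-size integer, with pointers carried as 64-bit values, so the layout does not depend on the width of int, long, or pointers in the calling process. The kernel side, which would call copy_from_user() on the address it receives, is not shown because it cannot run outside the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block for a syscall that needs more values
   than fit in six registers. All members are 64 bits wide. */
struct splork_args {
    uint64_t flags;
    uint64_t buf;   /* user-space pointer, carried as an integer */
    uint64_t len;   /* length of the buffer in bytes */
};

/* Fill in the block; going through uintptr_t keeps the pointer
   conversion well defined on both 32-bit and 64-bit builds. */
static struct splork_args make_splork_args(uint64_t flags, void *buf, size_t len)
{
    struct splork_args a = {0};
    a.flags = flags;
    a.buf = (uint64_t)(uintptr_t)buf;
    a.len = (uint64_t)len;
    return a;
}
```

The wrapper would then pass &a (and, ideally, sizeof a so the structure can grow later) as ordinary syscall arguments. The kernel's own clone3(2) takes its parameters this way, through struct clone_args plus a size.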

How to implement another variant of the clone(2) system call in the Linux kernel?
