How to get argument values using inline assembly in C without Glibc?

Time: 2018-05-09 19:37:00

Tags: c assembly x86-64 i386

How can I get the argument values using inline assembly in C without Glibc?

I need this code for Linux on the x86_64 and i386 architectures. If you know how to do it on Mac OS X or Windows, please post that too and guide me.

void exit(int code)
{
    //This function not important!
    //...
}
void _start()
{
    //How Get arguments value using inline assembly
    //in C without Glibc?
    //argc
    //argv
    exit(0);
}

New update

https://gist.github.com/apsun/deccca33244471c1849d29cc6bb5c78e

#define ReadRdi(To) asm("movq %%rdi,%0" : "=r"(To));
#define ReadRsi(To) asm("movq %%rsi,%0" : "=r"(To));
long argcL;
long argvL;
ReadRdi(argcL);
ReadRsi(argvL);
int argc = (int) argcL;
//char **argv = (char **) argvL;
exit(argc);

But it still returns 0, so this code is wrong! Please help.

2 answers:

Answer 0 (score: 8)

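As specified in the comments, argc and argv are provided on the stack, so you cannot use a regular C function to get them, even with inline assembly: the compiler will touch the stack pointer to allocate local variables, set up the stack frame and so on. Hence _start must be written in assembly, as is done in glibc (x86; x86_64). A small stub can be written that just grabs the values and forwards them to your "real" C entry point according to the regular calling convention.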
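To make the split between the stub and the C code concrete, here is a minimal sketch of the C half only. It assumes an assembly stub has captured the stack pointer exactly as it was at process entry and passes it in; the names start_c and real_main are hypothetical and do not come from glibc. Because it works in pointer-sized words, the same decoding applies to both i386 and x86_64.

```c
#include <stdint.h>

/* Hypothetical "real" entry point; stands in for the program's own logic. */
int real_main(int argc, char *argv[]) {
    (void)argv;
    return argc;
}

/* Hypothetical C half of the startup code. `sp` must be the stack pointer
   value at process entry, captured by an assembly stub before any
   compiler-generated prologue had a chance to move it. */
int start_c(uintptr_t *sp) {
    int argc = (int)sp[0];             /* argument count is the first word */
    char **argv = (char **)(sp + 1);   /* argument pointers follow directly */
    return real_main(argc, argv);      /* forward with the normal C convention */
}
```

The point of the design is that only the stub has to know about the unusual entry state; everything after start_c is ordinary C.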

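Here is a minimal example of a program (for both x86 and x86_64) that reads argc and argv, prints all the values in argv on stdout (separated by newlines) and exits using argc as the status code. It can be compiled with the usual gcc -nostdlib (and -static to make sure ld.so isn't involved; not that it would do any harm here).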

#ifdef __x86_64__
asm(
        ".global _start\n"
        "_start:\n"
        "   xorl %ebp,%ebp\n"       // mark outermost stack frame
        "   movq 0(%rsp),%rdi\n"    // get argc
        "   lea 8(%rsp),%rsi\n"     // the arguments are pushed just below, so argv = %rsp + 8
        "   call bare_main\n"       // call our bare_main
        "   movq %rax,%rdi\n"       // take the main return code and use it as first argument for...
        "   movl $60,%eax\n"        // ... the exit syscall
        "   syscall\n"
        "   int3\n");               // just in case

asm(
        "bare_write:\n"             // write syscall wrapper; the calling convention is pretty much ok as is
        "   movq $1,%rax\n"         // 1 = write syscall on x86_64
        "   syscall\n"
        "   ret\n");
#endif
#ifdef __i386__
asm(
        ".global _start\n"
        "_start:\n"
        "   xorl %ebp,%ebp\n"       // mark outermost stack frame
        "   movl 0(%esp),%edi\n"    // argc is on the top of the stack
        "   lea 4(%esp),%esi\n"     // as above, but with 4-byte pointers
        "   sub $8,%esp\n"          // the stack starts 16-byte aligned, we have to push 2*4 bytes; "waste" 8 bytes
        "   pushl %esi\n"           // to keep it aligned after pushing our arguments
        "   pushl %edi\n"
        "   call bare_main\n"       // call our bare_main
        "   add $8,%esp\n"          // fix the stack after call (actually useless here)
        "   movl %eax,%ebx\n"       // take the main return code and use it as first argument for...
        "   movl $1,%eax\n"         // ... the exit syscall
        "   int $0x80\n"
        "   int3\n");               // just in case

asm(
        "bare_write:\n"             // write syscall wrapper; convert the user-mode calling convention to the syscall convention
        "   pushl %ebx\n"           // ebx is callee-preserved
        "   movl 8(%esp),%ebx\n"    // just move stuff from the stack to the correct registers
        "   movl 12(%esp),%ecx\n"
        "   movl 16(%esp),%edx\n"
        "   mov $4,%eax\n"          // 4 = write syscall on i386
        "   int $0x80\n"
        "   popl %ebx\n"            // restore ebx
        "   ret\n");                // notice: the return value is already ok in %eax
#endif

int bare_write(int fd, const void *buf, unsigned count);

unsigned my_strlen(const char *ch) {
    const char *ptr;
    for(ptr = ch; *ptr; ++ptr);
    return ptr-ch;
}

int bare_main(int argc, char *argv[]) {
    for(int i = 0; i < argc; ++i) {
        int len = my_strlen(argv[i]);
        bare_write(1, argv[i], len);
        bare_write(1, "\n", 1);
    }
    return argc;
}

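Notice that several subtleties are ignored here, in particular the atexit bit. All the documentation about the machine-specific startup state was extracted from the comments in the two glibc files linked above.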

Answer 1 (score: 5)

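This answer is for the x86-64, 64-bit Linux ABI only. All the other OSes and ABIs mentioned will be broadly similar, but different enough in the fine details that you will need to write your custom _start once for each.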

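You are looking for the specification of the initial process state in the "x86-64 psABI", or, to give it its full title, "System V Application Binary Interface, AMD64 Architecture Processor Supplement (With LP64 and ILP32 Programming Models)". I will reproduce figure 3.9, "Initial Process Stack", here: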

  
Purpose                            Start Address                  Length
------------------------------------------------------------------------
Information block, including                                      varies
argument strings, environment
strings, auxiliary information
...
------------------------------------------------------------------------
Null auxiliary vector entry                                  1 eightbyte
Auxiliary vector entries...                            2 eightbytes each
0                                                              eightbyte
Environment pointers...                                 1 eightbyte each
0                                  8+8*argc+%rsp               eightbyte
Argument pointers...               8+%rsp                argc eightbytes
Argument count                     %rsp                        eightbyte

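It goes on to say that the initial registers are unspecified except for %rsp, which is of course the stack pointer, and %rdx, which may contain "a function pointer to register with atexit".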
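As an illustration of how figure 3.9 reads in practice, the layout can be walked from C once the entry value of %rsp is available as a pointer. This is only a sketch under that assumption: the struct and function names are hypothetical, and each eightbyte is modelled as one uintptr_t.

```c
#include <stdint.h>

/* Hypothetical container for the values decoded from the initial stack. */
struct initial_stack {
    long       argc;
    char     **argv;
    char     **envp;
    uintptr_t *auxv;   /* pairs of (type, value) eightbytes */
};

/* `sp` must be the value %rsp held at process entry. */
struct initial_stack decode_initial_stack(uintptr_t *sp) {
    struct initial_stack s;
    uintptr_t *p;

    s.argc = (long)sp[0];          /* "Argument count" at %rsp */
    s.argv = (char **)(sp + 1);    /* "Argument pointers" at 8+%rsp */

    p = sp + 1 + s.argc + 1;       /* step over argc pointers and their 0 */
    s.envp = (char **)p;           /* "Environment pointers" start here */

    while (*p != 0)                /* skip to the 0 that ends the environment */
        ++p;
    s.auxv = p + 1;                /* "Auxiliary vector entries" follow it */
    return s;
}
```

Note that the environment pointers begin one eightbyte past the 0 shown at 8+8*argc+%rsp in the figure, that is at 16+8*argc+%rsp.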

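So all the information you are looking for is already present in memory, but it hasn't been laid out according to the normal calling convention, which means you must write _start in assembly language. It is _start's responsibility to set everything up to call main with, based on the above. A minimal _start would look something like this: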

_start:
        xorl   %ebp, %ebp       #  mark the deepest stack frame

  # Current Linux doesn't pass an atexit function,
  # so you could leave out this part of what the ABI doc says you should do
  # You can't just keep the function pointer in a call-preserved register
  # and call it manually, even if you know the program won't call exit
  # directly, because atexit functions must be called in reverse order
  # of registration; this one, if it exists, is meant to be called last.
        testq  %rdx, %rdx       #  is there "a function pointer to
        je     skip_atexit      #  register with atexit"?

        movq   %rdx, %rdi       #  if so, do it
        call   atexit

skip_atexit:
        movq   (%rsp), %rdi           #  load argc
        leaq   8(%rsp), %rsi          #  calc argv (pointer to the array on the stack)
        leaq   16(%rsp,%rdi,8), %rdx  #  calc envp (starts after the NULL terminator for argv[])
        call   main

        movl   %eax, %edi   # pass return value of main to exit
        call   exit

        hlt                 # should never get here

(Completely untested.)

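(In case you're wondering why there's no adjustment to maintain stack pointer alignment: upon a normal procedure call, 8(%rsp) is 16-byte aligned, but when _start is called, %rsp itself is 16-byte aligned. Each call instruction displaces %rsp down by eight, producing the alignment situation expected by normal compiled functions.)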
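The alignment argument is plain arithmetic, so it can be checked with a small model. The sketch below does not touch the real stack pointer; the helper names are hypothetical and the sample address is arbitrary, chosen only because it is a multiple of 16.

```c
#include <stdint.h>

/* True when addr is a multiple of 16. */
int is_aligned16(uintptr_t addr) {
    return (addr & 15u) == 0;
}

/* Model of what a `call` instruction does to %rsp on x86-64:
   it pushes an 8-byte return address. */
uintptr_t rsp_after_call(uintptr_t rsp) {
    return rsp - 8;
}

/* Returns 1 if a function called directly from _start sees the alignment
   that compiled code expects: %rsp itself is misaligned by 8, while
   8(%rsp) is 16-byte aligned. */
int callee_sees_expected_alignment(uintptr_t rsp_at_start) {
    uintptr_t rsp_in_callee = rsp_after_call(rsp_at_start);
    return !is_aligned16(rsp_in_callee) && is_aligned16(rsp_in_callee + 8);
}
```

For an entry value such as 0x7ffd1000, the callee sees 0x7ffd0ff8, and 8 past that is 0x7ffd1000 again, which is why a single call from _start needs no extra adjustment.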

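A more thorough _start would do more things, such as clearing all the other registers, arranging for greater stack pointer alignment than the default if necessary, calling into the C library's own initialization functions, setting up environ, initializing the state used by thread-local storage, doing something constructive with the auxiliary vector, etc.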
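As one example of "doing something constructive with the auxiliary vector", a startup routine could look up a single entry by type. The following is a rough sketch, not glibc's implementation: the function name is hypothetical, and the two constants are assumed to match the AT_NULL and AT_PAGESZ values in Linux's headers (0 and 6).

```c
#include <stdint.h>

#define AUX_NULL   0u   /* assumed equal to AT_NULL: terminates the vector */
#define AUX_PAGESZ 6u   /* assumed equal to AT_PAGESZ: system page size */

/* Scan an auxiliary vector laid out as (type, value) pairs of eightbytes.
   Stores the value in *value and returns 1 if `type` is present;
   returns 0 and leaves *value untouched otherwise. */
int find_aux_value(const uintptr_t *auxv, uintptr_t type, uintptr_t *value) {
    for (; auxv[0] != AUX_NULL; auxv += 2) {
        if (auxv[0] == type) {
            *value = auxv[1];
            return 1;
        }
    }
    return 0;
}
```

A startup routine could call this with AUX_PAGESZ once it has located the vector past the environment pointers' terminating 0.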

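You should also be aware that if there is a dynamic linker (a PT_INTERP section in the executable), it receives control before _start does. Glibc's ld.so cannot be used with any C library other than glibc itself; if you are writing your own C library and you want to support dynamic linking, you will also need to write your own ld.so. (Yes, this is unfortunate; ideally, the dynamic linker would be a separate development project and its complete interface would be specified.)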
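Whether an executable requests a dynamic linker can be determined by scanning its program headers for an entry of type PT_INTERP. The sketch below assumes the 64-bit program header table has already been obtained somehow (for example by reading the file); the struct mirrors the field order of Elf64_Phdr but is redefined under a hypothetical name so the example does not depend on <elf.h>.

```c
#include <stddef.h>
#include <stdint.h>

#define PHDR_TYPE_INTERP 3u   /* numeric value of PT_INTERP in the ELF spec */

/* Same field order as Elf64_Phdr; hypothetical local name. */
struct phdr64 {
    uint32_t p_type;
    uint32_t p_flags;
    uint64_t p_offset;
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;
    uint64_t p_memsz;
    uint64_t p_align;
};

/* Returns 1 if any of the `count` program headers asks for an interpreter,
   i.e. a dynamic linker would run before the program's own _start. */
int has_interp(const struct phdr64 *phdr, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        if (phdr[i].p_type == PHDR_TYPE_INTERP)
            return 1;
    }
    return 0;
}
```

A binary linked with -static and -nostdlib would normally have no such entry, so its _start is the first code to run.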