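How can I get the argument values using inline assembly in C without Glibc?
I need this code for Linux, on the x86_64 and i386 architectures. If you know how to do it on Mac OS X or Windows, please post that too and give some guidance.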
void exit(int code)
{
    //This function not important!
    //...
}
void _start()
{
    //How Get arguments value using inline assembly
    //in C without Glibc?
    //argc
    //argv
    exit(0);
}
https://gist.github.com/apsun/deccca33244471c1849d29cc6bb5c78e
and
#define ReadRdi(To) asm("movq %%rdi,%0" : "=r"(To));
#define ReadRsi(To) asm("movq %%rsi,%0" : "=r"(To));
long argcL;
long argvL;
ReadRdi(argcL);
ReadRsi(argvL);
int argc = (int) argcL;
//char **argv = (char **) argvL;
exit(argc);
But it still returns 0, so this code is wrong! Please help.
Answer 0 (score: 8)
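As specified in the comments, argc and argv are provided on the stack, so you cannot use a regular C function to get them, even with inline assembly: the compiler will touch the stack pointer to allocate local variables, set up the stack frame and so on. Hence _start must be written in assembly, as is done in glibc (x86; x86_64). A small stub can be written that just grabs the values and forwards them to your "real" C entry point according to the regular calling convention.
Here is a minimal example of a program (for both x86 and x86_64) that reads argc and argv, prints all the values in argv on stdout (separated by newlines) and exits using argc as the status code; it can be compiled with the usual gcc -nostdlib (and -static to make sure ld.so isn't involved; not that it would do any harm here).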
#ifdef __x86_64__
asm(
".global _start\n"
"_start:\n"
" xorl %ebp,%ebp\n" // mark outermost stack frame
" movq 0(%rsp),%rdi\n" // get argc
" lea 8(%rsp),%rsi\n" // the arguments are pushed just below, so argv = %rsp + 8
" call bare_main\n" // call our bare_main
" movq %rax,%rdi\n" // take the main return code and use it as first argument for...
" movl $60,%eax\n" // ... the exit syscall
" syscall\n"
" int3\n"); // just in case
asm(
"bare_write:\n" // write syscall wrapper; the calling convention is pretty much ok as is
" movq $1,%rax\n" // 1 = write syscall on x86_64
" syscall\n"
" ret\n");
#endif
#ifdef __i386__
asm(
".global _start\n"
"_start:\n"
" xorl %ebp,%ebp\n" // mark outermost stack frame
" movl 0(%esp),%edi\n" // argc is on the top of the stack
" lea 4(%esp),%esi\n" // as above, but with 4-byte pointers
" sub $8,%esp\n" // the stack starts 16-byte aligned, we have to push 2*4 bytes; "waste" 8 bytes
" pushl %esi\n" // to keep it aligned after pushing our arguments
" pushl %edi\n"
" call bare_main\n" // call our bare_main
" add $8,%esp\n" // fix the stack after call (actually useless here)
" movl %eax,%ebx\n" // take the main return code and use it as first argument for...
" movl $1,%eax\n" // ... the exit syscall
" int $0x80\n"
" int3\n"); // just in case
asm(
"bare_write:\n" // write syscall wrapper; convert the user-mode calling convention to the syscall convention
" pushl %ebx\n" // ebx is callee-preserved
" movl 8(%esp),%ebx\n" // just move stuff from the stack to the correct registers
" movl 12(%esp),%ecx\n"
" movl 16(%esp),%edx\n"
" mov $4,%eax\n" // 4 = write syscall on i386
" int $0x80\n"
" popl %ebx\n" // restore ebx
" ret\n"); // notice: the return value is already ok in %eax
#endif
int bare_write(int fd, const void *buf, unsigned count);
unsigned my_strlen(const char *ch) {
    const char *ptr;
    for(ptr = ch; *ptr; ++ptr);
    return ptr-ch;
}
int bare_main(int argc, char *argv[]) {
    for(int i = 0; i < argc; ++i) {
        int len = my_strlen(argv[i]);
        bare_write(1, argv[i], len);
        bare_write(1, "\n", 1);
    }
    return argc;
}
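Notice that several subtleties are ignored here, in particular the atexit bit. All the documentation about the machine-specific startup state has been extracted from the comments in the two glibc files linked above.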
Answer 1 (score: 5)
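This answer is for the x86-64, 64-bit Linux ABI only. All the other OSes and ABIs mentioned will be broadly similar, but different enough in the fine details that you will need to write your custom _start once for each.
You are looking for the specification of the initial process state in the "x86-64 psABI", or, to give it its full title, "System V Application Binary Interface, AMD64 Architecture Processor Supplement (With LP64 and ILP32 Programming Models)". I will reproduce figure 3.9, "Initial Process Stack", here: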
Purpose                            Start Address     Length
------------------------------------------------------------------------
Information block, including                         varies
argument strings, environment
strings, auxiliary information
...
------------------------------------------------------------------------
Null auxiliary vector entry                          1 eightbyte
Auxiliary vector entries...                          2 eightbytes each
0                                                    eightbyte
Environment pointers...                              1 eightbyte each
0                                  8+8*argc+%rsp     eightbyte
Argument pointers...               8+%rsp            argc eightbytes
Argument count                     %rsp              eightbyte
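It goes on to say that the initial registers are unspecified except for %rsp, which is of course the stack pointer, and %rdx, which may contain "a function pointer to register with atexit".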
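To make the figure concrete, here is a rough C sketch of how the pieces could be located once some assembly stub has handed over the initial value of %rsp. The names startup_info and parse_initial_stack are made up for this illustration and are not part of the ABI or of any library; the sketch also assumes that every stack slot is pointer-sized, which holds on both x86_64 and i386 Linux.

```c
#include <stddef.h>

/* Hypothetical container for what the initial stack holds. */
struct startup_info {
    long argc;
    char **argv;
    char **envp;
    unsigned long *auxv;   /* (type, value) pairs, ended by a type of 0 */
};

/* Decode the initial process stack laid out as in figure 3.9.
 * `sp` is assumed to be the value the stack pointer had on entry to _start. */
struct startup_info parse_initial_stack(void *sp)
{
    struct startup_info info;
    char **slots = (char **)sp;              /* view the stack as pointer-sized slots */

    info.argc = (long)slots[0];              /* argument count sits at %rsp */
    info.argv = slots + 1;                   /* argument pointers start at 8+%rsp */
    info.envp = info.argv + info.argc + 1;   /* skip argv[] and its NULL terminator */

    char **p = info.envp;
    while (*p != NULL)                       /* walk to the NULL that ends envp[] */
        ++p;
    info.auxv = (unsigned long *)(p + 1);    /* auxiliary vector follows that NULL */
    return info;
}
```

The function itself is ordinary C and can be compiled normally; only the step that captures the stack pointer has to be done in assembly.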
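So all the information you are looking for is already present in memory, but it hasn't been laid out according to the normal calling convention, which means you must write _start in assembly language. It is _start's responsibility to set everything up to call main with, based on the above. A minimal _start would look something like this: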
_start:
xorl %ebp, %ebp # mark the deepest stack frame
# Current Linux doesn't pass an atexit function,
# so you could leave out this part of what the ABI doc says you should do
# You can't just keep the function pointer in a call-preserved register
# and call it manually, even if you know the program won't call exit
# directly, because atexit functions must be called in reverse order
# of registration; this one, if it exists, is meant to be called last.
testq %rdx, %rdx # is there "a function pointer to
je skip_atexit # register with atexit"?
movq %rdx, %rdi # if so, do it
call atexit
skip_atexit:
movq (%rsp), %rdi # load argc
leaq 8(%rsp), %rsi # calc argv (pointer to the array on the stack)
leaq 8(%rsp,%rdi,8), %rdx # calc envp (starts after the NULL terminator for argv[])
call main
movl %eax, %edi # pass return value of main to exit
call exit
hlt # should never get here
(Completely untested.)
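(In case you are wondering why there is no adjustment to maintain stack pointer alignment: upon a normal procedure call, 8(%rsp) is 16-byte aligned, but when _start is called, %rsp itself is 16-byte aligned. Each call instruction displaces %rsp down by eight, producing the alignment situation expected by normal compiled functions.)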
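The alignment argument can be checked with a little arithmetic. The two helpers below are hypothetical names used only for this illustration; they model the effect of a call instruction on %rsp and the condition a compiled function expects on entry.

```c
#include <stdint.h>

/* A `call` pushes an 8-byte return address, so %rsp drops by eight. */
uint64_t rsp_after_call(uint64_t rsp)
{
    return rsp - 8;
}

/* What a normally compiled x86-64 function expects on entry:
 * 8(%rsp), i.e. the address just above the return address, is 16-byte aligned. */
int entry_alignment_ok(uint64_t rsp_at_entry)
{
    return (rsp_at_entry + 8) % 16 == 0;
}
```

Starting from any 16-byte aligned value (as at _start), one call gives a value that satisfies entry_alignment_ok, while the unadjusted value does not.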
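A more thorough _start would do more things, such as clearing all the other registers, arranging for greater stack pointer alignment than the default if necessary, calling into the C library's own initialization functions, setting up environ, initializing the state used by thread-local storage, doing something constructive with the auxiliary vector, etc.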
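As one example of "something constructive" to do with the auxiliary vector, a startup routine could look up the page size. This is only a sketch: bare_getauxval and the BARE_AT_* names are invented here so that no libc header is needed. The numeric values 0 and 6 are the ones <elf.h> uses for AT_NULL and AT_PAGESZ on Linux.

```c
/* Auxiliary vector entry types, redefined under hypothetical names so that
 * this compiles without any libc header. Values match <elf.h> on Linux. */
#define BARE_AT_NULL   0UL
#define BARE_AT_PAGESZ 6UL

/* Scan the (type, value) pairs until the terminating BARE_AT_NULL entry.
 * Returns the value stored for `type`, or `fallback` if it is not present. */
unsigned long bare_getauxval(const unsigned long *auxv,
                             unsigned long type,
                             unsigned long fallback)
{
    for (; auxv[0] != BARE_AT_NULL; auxv += 2) {
        if (auxv[0] == type)
            return auxv[1];
    }
    return fallback;
}
```

A caller would pass the pointer to the first auxiliary vector entry, which sits just past the NULL that ends the environment pointers.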
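You should also be aware that if there is a dynamic linker (a PT_INTERP program header in the executable), it receives control before _start does. Glibc's ld.so cannot be used with any C library other than glibc itself; if you are writing your own C library, and you want to support dynamic linking, you will also need to write your own ld.so. (Yes, this is unfortunate; ideally, the dynamic linker would be a separate development project and its complete interface would be specified.)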
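To see whether a given executable asks for a dynamic linker, you can scan its program headers for PT_INTERP. The sketch below does that for an ELF64 image already loaded into memory. find_interp is a hypothetical helper written for this illustration, and it assumes a Linux system where <elf.h> is available and an image whose byte order matches the host.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the interpreter path inside an in-memory ELF64 image,
 * or NULL if the image is not ELF64 or has no PT_INTERP program header
 * (as is the case for a static executable). */
const char *find_interp(const unsigned char *image, size_t size)
{
    Elf64_Ehdr eh;
    if (size < sizeof eh)
        return NULL;
    memcpy(&eh, image, sizeof eh);           /* copy out to avoid unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return NULL;
    if (eh.e_ident[EI_CLASS] != ELFCLASS64)
        return NULL;

    for (size_t i = 0; i < eh.e_phnum; ++i) {
        Elf64_Phdr ph;
        size_t off = eh.e_phoff + i * eh.e_phentsize;
        if (off > size || size - off < sizeof ph)
            return NULL;                     /* header table runs past the image */
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type == PT_INTERP && ph.p_offset < size)
            return (const char *)image + ph.p_offset;
    }
    return NULL;
}
```

For a program built with -static and -nostdlib as in the first answer, this would be expected to return NULL, meaning the kernel jumps straight to _start.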