Question

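I am trying to write FizzBuzz in assembly, and I keep getting a segmentation fault. So far I have determined that it is not my printing routines (I removed their contents and the problem persists), so the error is hiding somewhere in the main function.

I get this output when I run the program: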

fizzSegmentation fault

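This led me to believe that the problem is not with the division and the remainder checks. But I could be wrong; I haven't done assembly in two years...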

SECTION .data
global _start
    fizz: db "fizz", 4
    buzz: db "buzz", 4

SECTION .bss
    counter: resb    1

SECTION .text
_start:

    mov ax,0
    mov [counter],ax

main_loop:

    cmp ax,100          ;from 0 to 100
    je  exit            ;
    mov bl,3            ;divisor
    mov ah,0            ;here will be a remainder
    div bl              ;divide
    cmp ah,0            ;compare the remainder with 0
    je  print_fizz      ;print fizz if they equal
    mov bl,5            ;new divisor
    mov ah,0            ;do I have to do it every time?
    div bl              ;divide
    cmp ah,0            ;compare the remainder with 0
    je  print_buzz      ;print buzz if they equal
    jmp print_ax        ;print contents of ax if not
    inc ax              ;increment ax
    jmp main_loop       ;jump to label

print_ax:
    ret

print_fizz:
    ret

print_buzz:
    ret

exit:
    mov rax,1
    mov rbx,0
    int 80h
    ret

I am assembling and linking with:

yasm -f elf64 -o fizzbuzz.o fizzbuzz.asm
ld -d -o fizzbuzz fizzbuzz.o

Answer 1

This is causing the segmentation fault:

...
    je  print_fizz      ;print fizz if they equal
...
    je  print_buzz      ;print buzz if they equal
    jmp print_ax        ;print contents of ax if not
...

print_ax:
    ret

print_fizz:
    ret

print_buzz:
    ret
...

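Because you jump to the functions instead of calling them, ret finds no return address on the stack. It pops whatever happens to be on top (at _start that is argc, not a code address) and "returns" there. Change each one to a call/ret pair: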

...
;   je  print_fizz      ;print fizz if they equal
    jne .1              ;skip if not equal
    call print_fizz
    .1:
...

;   je  print_buzz      ;print buzz if they equal
    jne .2              ;skip if not equal
    call print_buzz
    .2:

;   jmp print_ax        ;print contents of ax if not
    call print_ax
...

This will cause an infinite loop:

mov ax,0
mov [counter],ax

main_loop:

    cmp ax,100          ;from 0 to 100
    je  exit
    ...
    mov ah,0            ;here will be a remainder
    div bl              ;divide
    ...
    mov ah,0            ;do I have to do it every time?
    div bl              ;divide
    ...
    inc ax              ;increment ax
    jmp main_loop       ;jump to label

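AX changes its value (div writes the quotient to AL and the remainder to AH), so it is unfit to hold the loop counter. I suggest: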

...
main_loop:

;   cmp ax,100          ;from 0 to 100
    cmp byte [counter], 100
...
;   inc ax              ;increment ax
    inc byte [counter]
    jmp main_loop       ;jump to label
...
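
Putting both fixes together, a minimal sketch of a complete program could look like the following. The print routines, the write_stdout helper, the digits scratch buffer and the r12d "already printed a word" flag are assumptions made for illustration, since the question does not show its real print routines. Starting the counter at 1 and stopping after 100 is also an assumption. The counter is reloaded from memory before each div because div overwrites al and ah.

```nasm
; fizzbuzz_fixed.asm: sketch only.
; Build: yasm -f elf64 -o fizzbuzz_fixed.o fizzbuzz_fixed.asm && ld -o fizzbuzz_fixed fizzbuzz_fixed.o
SECTION .data
    fizz:    db "fizz"
    buzz:    db "buzz"
    newline: db 10

SECTION .bss
    counter: resb 1                 ; one byte is enough for 1..100
    digits:  resb 2                 ; scratch space for a 2-digit number

SECTION .text
global _start
_start:
    mov   byte [rel counter], 1

main_loop:
    cmp   byte [rel counter], 100
    ja    exit
    xor   r12d, r12d                ; hypothetical flag: 1 once "fizz" or "buzz" was printed

    movzx eax, byte [rel counter]   ; reload: div overwrites al and ah
    mov   bl, 3
    div   bl                        ; ax / 3: al = quotient, ah = remainder
    test  ah, ah
    jnz   .not_fizz
    call  print_fizz                ; call pushes the return address that ret needs
    mov   r12d, 1
.not_fizz:
    movzx eax, byte [rel counter]
    mov   bl, 5
    div   bl
    test  ah, ah
    jnz   .not_buzz
    call  print_buzz
    mov   r12d, 1
.not_buzz:
    test  r12d, r12d
    jnz   .line_done
    call  print_number              ; neither fizz nor buzz: print the count
.line_done:
    call  print_newline
    inc   byte [rel counter]        ; the counter lives in memory, out of div's way
    jmp   main_loop

write_stdout:                       ; write(1, rsi, rdx); clobbers rax, rcx, r11, rdi
    mov   eax, 1                    ; __NR_write
    mov   edi, 1                    ; STDOUT_FILENO
    syscall
    ret

print_fizz:
    lea   rsi, [rel fizz]
    mov   edx, 4
    call  write_stdout
    ret

print_buzz:
    lea   rsi, [rel buzz]
    mov   edx, 4
    call  write_stdout
    ret

print_newline:
    lea   rsi, [rel newline]
    mov   edx, 1
    call  write_stdout
    ret

print_number:                       ; only reached for counts below 100
    movzx eax, byte [rel counter]
    mov   bl, 10
    div   bl                        ; al = tens, ah = ones
    add   ax, 0x3030                ; convert both digits to ASCII
    lea   rsi, [rel digits]
    mov   [rsi], ax                 ; tens digit lands at the lower address
    mov   edx, 2
    cmp   al, '0'
    jne   .write
    inc   rsi                       ; skip a leading zero
    dec   edx
.write:
    call  write_stdout
    ret

exit:
    mov   eax, 60                   ; __NR_exit
    xor   edi, edi
    syscall
```

One write system call per word is slow but keeps the sketch close to the structure of the question's code.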

Answer 2

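This answer ended up a lot longer than I planned, and turned into something of a tutorial on writing efficient asm, i.e. how to make a simple problem complicated.

In case anyone is interested in a code review of the attempted implementation, plus a version with a lot of asm tricks: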

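There are many small ways this could be better, e.g. keeping 5 in bh and 3 in bl. You don't always have to use div bl. AMD64 has 20 single-byte registers: al/ah, bl/bh, cl/ch, dl/dh (no REX prefix), and sil, dil, ... r15b (REX prefix required).

Using a 16-bit counter is at the very least a waste of bytes (operand-size prefixes), and can cause slowdowns. Using mov reg,0 is bad; xor-zeroing (e.g. xor eax,eax) is shorter. Put the conditional branch at the bottom of the loop whenever possible.

mov rax, 1 is a waste of instruction bytes compared to mov eax, 1, and this question is tagged yasm, which doesn't optimize that for you at assemble time. (nasm itself does.) Setting up 64-bit registers and then using the int 0x80 32-bit compatibility ABI is even sillier.

Storing the 16-bit counter to memory in the first place is silly, but storing it to an address where you have only reserved one byte is asking for problems.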

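Besides the small stuff, FizzBuzz(3,5) is small enough to unroll and avoid some divs entirely. With assembler macros, you could easily produce a fully unrolled loop with LCM(fizz, buzz) outputs per iteration (15 in this case), which is enough for the pattern to repeat, so you don't need any conditionals.

You can avoid div without unrolling by using down-counters to detect count%5==0 and count%3==0. @anatolyg's 16bit DOS code-golf FizzBuzz does that. This is a really common technique for doing something every N times something else happens; performance counter events, for example, work this way.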
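
As a rough illustration of the down-counter idea (this is not anatolyg's code), the sketch below classifies counts 1..15 with two down-counters and no div. To stay short it writes one character per count instead of the real output: '.' for a plain number, F for fizz, B for buzz, X for fizzbuzz. The marks table, the buffer and the register assignments are assumptions made for this example.

```nasm
; downcount.asm: sketch only.  Prints "..F.BF..FB.F..X" and a newline.
SECTION .rodata
    marks: db ".FBX"                ; indexed by (fizz ? 1 : 0) | (buzz ? 2 : 0)

SECTION .bss
    buf: resb 16                    ; 15 marks plus a newline

SECTION .text
global _start
_start:
    lea   r8, [rel marks]
    lea   rdi, [rel buf]            ; output position
    mov   ecx, 15                   ; one full LCM(3,5) period
    mov   ebx, 3                    ; hits zero when count%3 == 0
    mov   ebp, 5                    ; hits zero when count%5 == 0
.next:
    xor   eax, eax                  ; classification bits for this count
    dec   ebx
    jnz   .no_fizz
    mov   ebx, 3                    ; reset the fizz down-counter
    or    eax, 1
.no_fizz:
    dec   ebp
    jnz   .no_buzz
    mov   ebp, 5                    ; reset the buzz down-counter
    or    eax, 2
.no_buzz:
    mov   al, [r8 + rax]            ; look up the mark character
    mov   [rdi], al
    inc   rdi
    dec   ecx
    jnz   .next

    mov   byte [rdi], 10            ; trailing newline
    mov   eax, 1                    ; __NR_write
    mov   edi, 1                    ; STDOUT_FILENO
    lea   rsi, [rel buf]
    mov   edx, 16
    syscall

    mov   eax, 60                   ; __NR_exit
    xor   edi, edi
    syscall
```

Each counter costs one dec and one well-predicted branch per iteration, plus a reset when it fires, which is far cheaper than a div.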

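Here's my attempt at an efficient FizzBuzz (for AMD64 Linux), using no libraries, only write(2) and exit_group(2).

There's no compiler, so if you want good code, you have to write it yourself. You can't just hope the compiler will do something good with i%3 in a loop (which it doesn't anyway, for most compilers).

The code evolved a lot while I was writing it. As usual, starting to implement one approach gives you better ideas once you see that your first idea takes more, or slower, instructions than you'd hoped.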

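I unrolled by 3 (Fizz) to remove all checks of counter%3. I handled the counter%5 checks with a countdown from 5 instead of division. This still takes a fair bit of logic that would go away with a full unroll to the point where the pattern repeats (LCM(3,5)). The integer-to-ASCII-digits code could be in a function, or inlined into the unrolled loop for very bloated code.

I keep everything in registers (including the constants fizz\n and buzz\n). There are no loads, and the only stores are into the buffer. Many of the registers are set once outside the loop, instead of with a mov-immediate right before use. This requires good comments to keep track of what you put where.

I append characters to a buffer that we write(2) after every fizzbuzz\n line. This is the longest cycle that occurs naturally in the logic of the program, and it means we only need the syscall code in one place.

In a real program that might write to a file or a pipe, it would be better to use C stdio's strategy of a much larger buffer. (Many ~100-byte writes are much worse than fewer 4096B writes.) Still, I thought this was an interesting middle ground between the traditional printf every iteration and accumulating the entire string into one big buffer. I used a static buffer instead of reserving stack space because I'm writing a whole program, not a function that should avoid wasting memory after returning. It also let me use 32-bit operand size for pointer increments, to save code bytes (REX prefixes).

It would be pretty easy to accumulate multiple blocks until you reach a point where the next group might run past the end of the buffer, i.e. compare the current position against buffer_end - BUZZMOD*FIZZMOD*9. Optimizing I/O system calls is obviously a broad topic, and this version is sufficient to demonstrate accumulating a string in a buffer.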

;  for (count=1..100):
;  if(count%3 == 0) { print_fizz(); }
;  if(count%5 == 0) { print_buzz(); } else {
;       if(count%3 && count%5) print(count);
;; }
;  print(newline)

; We don't need pointers to these strings at all;  The strings are immediate data for a couple mov instructions
;SECTION .rodata        ; put constants in .rodata.
;    fizz: db "fizz"    ; No idea what the trailing  4  was for
;    buzz: db "buzz"

FIZZMOD equ 3                   ; only 3 works, but it would be easy to use a loop
BUZZMOD equ 5                   ; any value works
LASTCOUNT equ 100    ; max 100: we only handle two decimal digits.
; TODO: cleanup that can handle LASTCOUNT%FIZZMOD != 1 and LASTCOUNT%BUZZMOD != 0


SECTION .bss
;;; generate a string in this buffer.  (flush it with write(2) on "fizzbuzz" lines)
;    buf: resb    4096
buf: resb    FIZZMOD * BUZZMOD * 9     ; (worst case: every line is "fizzbuzz\n")

SECTION .text
global _start
_start:

    ; args for write(2).  (syscall clobbers rcx/r11,  and rax with the return value)
    mov   edi, 1                ; STDOUT_FILENO.  also happens to be __NR_write in the AMD64 Linux ABI
    mov   esi, buf              ; static data lives in the low 2G of address space, so we don't need a 64bit mov
    ;; edx = count.             ; calculated each iteration
    ;; mov eax, edi             ; also needed every time.   saves 3B vs  mov eax, imm32

    ; 'fizz' is only used once, so we could just store with an immediate there.  That wouldn't micro-fuse, and we'd have to do the newline separately
    mov   r10b, 10      ; base 10
    ;;mov   r14d, BUZZMOD  ; not needed, we don't div for this
    mov   r12, 'fizz' | 10<<32      ; `fizz\n`, but YASM doesn't support NASM's backquotes for \-escapes
    mov   r13, 'buzz' | 10<<32      ; `buzz\n`.  When buzz appears, it's always the end of a line


;;;;;;;; Set up for first iteration
    mov   ebp, BUZZMOD          ; detect count%BUZZMOD == 0 with a down-counter instead of dividing
    mov   ebx, 1                ; counter starts at 1
    mov   edx, esi              ; current output position = front of buf
ALIGN 16
main_loop:

    ;; TODO: loop FIZZMOD-1 times inside buzz_or_number, or here
    ;; It doesn't make much sense to unroll this loop but not inline buzz_or_number :/
    call  buzz_or_number
    inc   ebx

    call  buzz_or_number
    add   ebx, 2                ; counter is never printed on Fizz iterations, so just set up for next main_loop

    ;; Fizz, and maybe also Buzz
    mov   qword [rdx], r12      ; Fizz with a newline
    add   edx, 5                ; TODO: move this after the branch; adjust the offsets in .fizzbuzz

    dec   ebp
    jz   .fizzbuzz

;;.done_buzz:   ; .fizzbuzz duplicates the main_loop branch instead of jumping back here
    cmp   ebx, LASTCOUNT-FIZZMOD
    jbe   main_loop
;;;;;;;;;; END OF main_loop


.cleanup:
;;;;;;;;;;;;;;;;;;;;;  Cleanup after the loop
    ; hard-code the fact that 100 % FIZZMOD = 1 more line to print,
    ; and that 100 % BUZZMOD = 0, so the line is "buzz\n"

    mov   eax, edi              ; __NR_write
    mov   [rdx], r13            ; the final "buzz\n".
    sub   edx, buf - 5          ; write_count = current_pos+5 - buf.
    syscall                     ; write(1, buf, p - buf).
    ;; if buf isn't static, then use  add   edx, 5 / sub   edx, esi

    xor edi, edi
    mov eax, 231    ;  exit_group(0).  same as eax=60: exit() for a single-threaded program
    syscall


;;;;; The fizzbuzz case from the loop
.fizzbuzz:
;; count%BUZZMOD == 0:   rdx points after the \n at the end of fizz\n, which we need to overwrite

;; this is a macro so we can use it in buzz_or_number, too, where we don't need to back up and overwrite a \n
%macro  BUZZ_HIT 1
    mov   [rdx - %1], r13       ; buzz\n.  Next line will overwrite the last 3 bytes of the 64b store.
    add   edx, 5 - %1
    mov   ebp, BUZZMOD          ; reset the count%BUZZMOD down-counter
%endmacro

    BUZZ_HIT 1                  ; arg=1 to back up and overwrite the \n from "fizz\n"

    sub   edx, esi              ; write_count = current_pos - buf
    mov   eax, edi              ; __NR_write
    syscall                     ; write(1, buf, p - buf).  clobbers only rax (return value), and rcx,r11
    mov   edx, esi              ; restart at the front of the buffer

;;; tail-duplication of the main loop, instead of jmp back to the cmp/jbe
;;; could just be a jmp main_loop, if we check at assemble time that  LASTCOUNT % FIZZMOD != 0 || LASTCOUNT % BUZZMOD != 0
    cmp   ebx, LASTCOUNT-FIZZMOD
    jbe   main_loop
    jmp   .cleanup

;;;;;;;;;;;;;;;;;;;;;;; buzz_or_number: called for non-fizz cases
; special calling convention: uses (without clobbering) the same regs as the loop
;; modifies: BUZZMOD down-counter, output position pointer
;; clobbers: rax, rcx
ALIGN 32
buzz_or_number:
    dec   ebp
    jnz  .no_buzz              ; could make this part of the macro, but flow-control inside macros is probably worse than duplication

;; count%BUZZMOD == 0:  append "buzz\n" to the buffer and reset the down-counter
    BUZZ_HIT  0                 ; back up 0 bytes before appending
    ret

.no_buzz:             ;; get count as a 1 or 2-digit ASCII number
    ;; assert(ebx < 100);  We don't handle 3-digit numbers

    mov   eax, ebx
    div   r10b                  ; al = count/10 (first (high) decimal digit), ah = count%10 (second (low) decimal digit).
    ;; x86 is little-endian, so this is in printing-order already for storing eax

    ;movzx eax, ax            ; avoid partial-reg stalls on pre-Haswell
    ;; convert integer digits to ASCII by adding '0' to al and ah at the same time, and set the 3rd byte to `\n`.
    cmp   ebx, 9                ; compare against the original counter instead of the div result, for more ILP and earlier detection of branch misprediction
    jbe   .1digit               ; most numbers from 1..100 are 2-digit, so make this the not-taken case
    add   eax, 0x0a3030   ;;  `00\n`: converts 2 integer digits -> ASCII
    ;; eax now holds the number + newline as a 3-byte ASCII string
    mov   [rdx], eax
    add   edx, 3
    ret

.1digit:
;; Could use a 16bit operand-size here to avoid partial-reg stalls, but an imm16 would LCP-stall on Intel.
    shr   eax, 8                ; Shift out the leading 0
    add   eax, 0x000a30   ;; 1-digit numbers
    ;; eax now holds the number + newline as a 2-byte ASCII string
    mov   [rdx], ax
    add   edx, 2
    ret

This is how it runs:

$ strace ./fizzbuzz > /dev/null
execve("./fizzbuzz", ["./fizzbuzz"], [/* 69 vars */]) = 0
write(1, "1\n2\nfizz\n4\nbuzz\nfizz\n7\n8\nfizz\nbu"..., 58) = 58
write(1, "16\n17\nfizz\n19\nbuzz\nfizz\n22\n23\nfi"..., 63) = 63
write(1, "31\n32\nfizz\n34\nbuzz\nfizz\n37\n38\nfi"..., 63) = 63
write(1, "46\n47\nfizz\n49\nbuzz\nfizz\n52\n53\nfi"..., 63) = 63
write(1, "61\n62\nfizz\n64\nbuzz\nfizz\n67\n68\nfi"..., 63) = 63
write(1, "76\n77\nfizz\n79\nbuzz\nfizz\n82\n83\nfi"..., 63) = 63
write(1, "91\n92\nfizz\n94\nbuzz\nfizz\n97\n98\nfi"..., 40) = 40
exit_group(0)                           = ?

Correctness check:

./fizzbuzz | diff - <(perl -E'say((fizz)[$_%3].(buzz)[$_%5]or$_)for+1..100')
# no output = no difference

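Unrolling over Buzz (5) and using a down-counter for Fizz would probably be worse. My version has a 64-bit store of fizz\n\0\0\0, and then a branch to decide whether to store buzz\n\0\0\0 overlapping it to produce fizzbuzz\n. The other way would have a branch to decide whether to store fizz (no newline needed, so it could be a 32-bit store). Then it would unconditionally store buzz\n\0\0\0. However, since FIZZMOD is smaller than BUZZMOD, that means more frequent resets of the down-counter, and more checks to see whether we need to print a number instead of a string this iteration. Having every third line known to be fizz\n or fizzbuzz\n means the simpler code path runs more often.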
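
The overlapping-store idea can be seen in isolation in the sketch below, which is cut down from the technique the program above uses. The 16-byte buffer and the fixed offset of 4 are assumptions made for this stand-alone example.

```nasm
; overlap.asm: sketch only.  Prints "fizzbuzz" and a newline using two overlapping 8-byte stores.
SECTION .bss
    buf: resb 16                    ; the second store touches bytes 4..11

SECTION .text
global _start
_start:
    mov   r12, 'fizz' | 10<<32      ; bytes in memory order: f i z z \n 0 0 0
    mov   r13, 'buzz' | 10<<32      ; bytes in memory order: b u z z \n 0 0 0
    lea   rsi, [rel buf]

    mov   [rsi], r12                ; buf = "fizz\n" followed by zeros
    mov   [rsi + 4], r13            ; overwrite the \n: buf = "fizzbuzz\n" followed by zeros

    mov   eax, 1                    ; __NR_write
    mov   edi, 1                    ; STDOUT_FILENO
    mov   edx, 9                    ; only the 9 meaningful bytes
    syscall

    mov   eax, 60                   ; __NR_exit
    xor   edi, edi
    syscall
```

The extra zero bytes written by each 8-byte store are harmless because the next store, or the write length, covers them.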

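If overlapping stores are a problem, this whole algorithm is screwed, and this is just one of many. Also, we could just branch before storing fizz\n and adding 5. Then in the fizzbuzz\n case, we do two stores and add 9. This also separates the dec/jcc from the cmp/jcc at the bottom of main_loop, so they can hopefully both macro-fuse on pre-Haswell CPUs. IIRC, some CPUs have branch predictors that really don't like multiple branches very close to each other.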

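Further improvements, left as an exercise for the reader:

- Inline buzz_or_number, and maybe turn it into a loop (FIZZMOD-1 iterations). Besides that, it could probably branch less, and there is room for other minor improvements. This is sort of a version 1.1: working, tested, with some comments and observations added while writing up this answer, but without much improvement over what I initially decided was good enough to see if it worked.
- Make it more flexible by writing a cleanup loop (or assembler macros) for the last LASTCOUNT % FIZZMOD lines, instead of assuming that it's 1 line. Cleanup code is the downside of unrolling.
- I used a div by 10 to convert the counter to a string. A better implementation would use a multiplicative inverse, like compilers generate for small constant divisors (implemented in this case with LEA).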
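
A sketch of the multiplicative-inverse idea from the last item, under stated assumptions: the constant 205 with a right shift by 11 is my own choice and is only argued here for inputs 0..99 (the error of 205/2048 against 1/10 is x/10240, below 0.01 in that range). Compilers pick their own constants and instruction sequences, and this sketch uses imul rather than a pure LEA chain. The example value 42 is arbitrary.

```nasm
; div10.asm: sketch only.  Prints "42" and a newline without using div.
SECTION .bss
    buf: resb 4

SECTION .text
global _start
_start:
    mov   ebx, 42                   ; example count, must be in 0..99 for this constant pair
    imul  eax, ebx, 205             ; 205/2048 is slightly more than 1/10
    shr   eax, 11                   ; eax = count / 10 (tens digit)
    lea   ecx, [rax + rax*4]        ; ecx = 5 * tens
    add   ecx, ecx                  ; ecx = 10 * tens
    mov   edx, ebx
    sub   edx, ecx                  ; edx = count % 10 (ones digit)
    shl   edx, 8                    ; ones digit goes in the second byte
    lea   eax, [rax + rdx + 0x0a3030] ; add '0' to both digits, '\n' in the third byte

    lea   rsi, [rel buf]
    mov   [rsi], eax                ; stores "42\n" plus one zero byte
    mov   eax, 1                    ; __NR_write
    mov   edi, 1                    ; STDOUT_FILENO
    mov   edx, 3
    syscall

    mov   eax, 60                   ; __NR_exit
    xor   edi, edi
    syscall
```

The digits come out in the same byte layout that div r10b produces in the program above (tens in the low byte), so the same 0x0a3030 constant applies.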

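Another strategy worth considering is strength reduction to incrementing a sequence of ASCII digits (stored in a register). This technique would be harder to extend to numbers with more digits. Storing them in printing order (most-significant digit in the low byte) makes carry between digits work against us instead of for us. (For example, if they were in natural order, you could add eax, 256-10 to correct the low digit and increment the high digit via carry.) It might be worth keeping them that way and using BSWAP to store. Keeping the \n embedded in the register so it only takes one store might not be worth it. Detecting and handling a 1-digit number becoming a 2-digit number is bad enough.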
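
A small sketch of the natural-order variant described above, assuming two ASCII digits kept in ax with the ones digit in al and the tens digit in ah. It prints the counts 08 through 11 so that the carry from 09 to 10 is visible. Swapping the two bytes with xchg dl, dh before the store is my substitution for the BSWAP mentioned in the text, and the leading zero is deliberately left in place to keep the example short.

```nasm
; asciicount.asm: sketch only.  Prints 08, 09, 10, 11, one per line.
SECTION .bss
    buf: resb 16                    ; four 3-byte lines, last 4-byte store ends at byte 12

SECTION .text
global _start
_start:
    mov   eax, ('0' << 8) | '8'     ; ah = tens digit '0', al = ones digit '8'
    lea   rdi, [rel buf]            ; output position
    mov   ecx, 4                    ; number of lines
.next:
    mov   edx, eax
    xchg  dl, dh                    ; printing order: tens digit at the lower address
    or    edx, 10 << 16             ; newline in the third byte
    mov   [rdi], edx
    add   rdi, 3

    inc   eax                       ; next count
    cmp   al, '9'
    jbe   .no_carry
    add   eax, 256 - 10             ; ones digit back to '0', carry bumps the tens digit
.no_carry:
    dec   ecx
    jnz   .next

    mov   eax, 1                    ; __NR_write
    mov   edi, 1                    ; STDOUT_FILENO
    lea   rsi, [rel buf]
    mov   edx, 12
    syscall

    mov   eax, 60                   ; __NR_exit
    xor   edi, edi
    syscall
```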

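In 32-bit mode, we could use the AAA instruction to do decimal carry after incrementing. However, despite the mnemonic, it works on BCD (0-9), not ASCII ('0'-'9'), and doesn't appear to make it easy to propagate the carry to the 3rd digit. No wonder AMD removed it for AMD64. It checks the AF flag to detect carry-out of the low 4 bits, but this is only useful for DAA, where you have two BCD digits packed into one byte, and when you're adding unknown values, not incrementing. In this case, you just check al >= 10.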

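My first version of this almost worked the first time (after fixing a couple of syntax errors so it would assemble, and a silly crash that took a couple of minutes to debug, IIRC): it printed fizz\nbuzz\n in the fizzbuzz\n case, and it reversed the digits. I keep forgetting that digit strings need to be stored with the most-significant digit first, not like the bytes in a little-endian binary integer.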

Alternate ways to do things

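I decided not to use a branchless version of the 1-digit vs. 2-digit ASCII conversion code, since it took a lot of instructions. Also, the branch should predict very well.

;; Untested
buzz_or_number:
    ...
.no_buzz:
    ...
    div   r10b
 DECIMAL_TO_ASCII_NEWLINE_2DIGIT equ 0x0a3030   ; add '0' to two unpacked decimal digits, and a newline
 DECIMAL_TO_ASCII_NEWLINE_1DIGIT equ 0x000a30

    ;; hoist this out of the loop: mov  r15d, DECIMAL_TO_ASCII_NEWLINE_2DIGIT - DECIMAL_TO_ASCII_NEWLINE_1DIGIT
    ;; Caveat: after div r10b the leading zero of a 1-digit number is in al, not ah, and the
    ;; 1-digit case still needs the shr eax, 8 from .1digit, so this sketch is incomplete as written.
    xor   ecx,ecx
    cmp   ah, 1            ; set CF if ah=0 (1 digit number), otherwise clear it.  This allows sbb for a conditional add, instead of setcc
    cmovae ecx, r15d       ; 0 or the difference from 1digit to 2digit

    lea   eax, [rax+rcx + DECIMAL_TO_ASCII_NEWLINE_1DIGIT]  ; rax+=0x0a3030 or 0x000a30, without clobbering flags
    mov   [rdx], eax
    sbb   edx, -3          ; add 2 (-(-3) - 1) or 3.
    ret

In 32-bit (and 16-bit) mode, there is a div instruction that takes an immediate operand, and uses AL as the dividend, not AX. It's called AAM, and was removed for AMD64 along with the other BCD/ASCII instructions. It makes it convenient to test for divisibility by 5 without tying up a register for the divisor or wasting an instruction inside the loop. It is slightly faster than div r/m8, and sets flags according to the remainder (in al: its outputs are reversed compared to div).

Anatolyg's golfed FizzBuzz uses AAM in a loop with shr ax, 8 to generate one digit at a time in reverse order, storing and decrementing a pointer.

A version that uses AAM to check count%5 and then turns that result into count%10, instead of doing a separate division to get the ASCII digits, is much more complicated.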

Answer 3

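Use a debugger to single-step through your code and see where it goes wrong.

From a quick glance, it's already obvious that you are destroying ax (maybe you don't know that ax is made up of ah and al?). You are also jumping to functions instead of calling them, which is probably the cause of the faults.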

FizzBuzz in assembly - segmentation fault
