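I have a program I'm working on for school whose purpose is to add two matrices and store the result in a third matrix. Currently, when it is run with the driver (a .o file), the instruction count is 1,003,034,420, but it needs to be under 1 billion. I don't know how to get there, though: I have gone over every instruction I use, and all of them seem necessary for the program to work.
Note that I am not allowed to use loop unrolling to reduce the instruction count at this point.
Here is the program: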
/* This function has 5 parameters, and the declaration in the
C-language would look like:
void matadd (int **C, int **A, int **B, int height, int width)
C, A, B, and height will be passed in r0-r3, respectively, and
width will be passed on the stack. */
.arch armv7-a
.text
.align 2
.global matadd
.syntax unified
.arm
matadd:
push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
ldr r4, [sp, #36] @ load width into r4
mov r5, #0 @ r5 is current row index
row_loop:
mov r6, #0 @ r6 is the col, reset it for each new row
cmp r5, r3 @ compare row with height
beq end_loops @ we have finished all of the rows
ldr r11, [r0, r5, lsl #2] @ r11 is the current row array of C
ldr r7, [r1, r5, lsl #2] @ r7 is the current row array of A
ldr r8, [r2, r5, lsl #2] @ r8 is the current row array of B
@ lsl #2 multiplies the index by 4 because each
@ row pointer (and each int) is 4 bytes; the shifted
@ operand does not modify r5 or r6
col_loop:
cmp r6, r4 @ compare col with width
beq end_col @ we have finished every column of this row
ldr r9, [r7, r6, lsl #2] @ r9 is cur_row[col] of A
ldr r10, [r8, r6, lsl #2] @ r10 is cur_row[col] of B
add r9, r9, r10 @ r9 is A[row][col] + B[row][col]
str r9, [r11, r6, lsl #2] @ store result of addition in C[row][col]
add r6, r6, #1 @ increment col
b col_loop @ get next entry
end_col:
add r5, r5, #1 @ increment row
b row_loop @ get next row
end_loops:
pop {r4, r5, r6, r7, r8, r9, r10, r11, pc}
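For reference, here is a plain C sketch of what this routine computes, written from the declaration in its header comment. It is not part of the assignment; it only makes the register comments easier to check. Each matrix is an array of row pointers, so A[row] corresponds to the ldr into r7 and A[row][col] to the ldr into r9.

```c
/* Plain C equivalent of the assembly matadd: C[i][j] = A[i][j] + B[i][j].
   Each matrix is an array of row pointers, matching the declaration
   void matadd(int **C, int **A, int **B, int height, int width). */
void matadd(int **C, int **A, int **B, int height, int width) {
    for (int row = 0; row < height; row++) {
        int *c_row = C[row];   /* ldr r11, [r0, r5, lsl #2] */
        int *a_row = A[row];   /* ldr r7,  [r1, r5, lsl #2] */
        int *b_row = B[row];   /* ldr r8,  [r2, r5, lsl #2] */
        for (int col = 0; col < width; col++) {
            /* two loads, one add, one store per element */
            c_row[col] = a_row[col] + b_row[col];
        }
    }
}
```

The C loops test with <, while the assembly tests for equality with beq; the two behave the same for non-negative height and width.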
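I think there must be some instruction that combines cmp and b, or something like that, but I can't seem to find it. Any pointers on how to reduce the instruction count?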
Answer (score: 3):
You want to remove the unconditional branch from the inner loop.
loop_start:
cmp x, y
beq loop_exit
blah blah blah
b loop_start
loop_exit:
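Notice that every time around the loop you execute an unconditional branch ("b loop_start"). You can avoid that branch by inlining its target: copy the instructions starting at the branch target up to the next conditional branch.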
loop_start:
cmp x, y
beq loop_exit
loop_middle:
blah blah blah
; was "b loop_start" but we just copy the instructions
; starting at "loop_start" up to the conditional branch
cmp x, y
beq loop_exit
; and then jump to the instruction after the inlined portion
b loop_middle
loop_exit:
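One way to see that this intermediate step does not save much on its own is to count instructions. The C model below is a hypothetical helper (not part of the assignment's driver): it walks each loop shape and adds one unit per instruction executed, for a body ("blah blah blah") of k instructions run n times. The original shape comes out at n(k+3)+2 and the inlined shape at n(k+3)+1, so the cost per iteration is unchanged.

```c
/* Hypothetical instruction-count model for the pseudocode loops.
   cmp/beq count as 2 instructions, each branch as 1, the body as k. */

/* Original shape: test at the top, "b loop_start" at the bottom. */
long count_top_tested(long n, long k) {
    long count = 0;
    for (long i = 0; ; i++) {
        count += 2;            /* cmp x, y ; beq loop_exit */
        if (i == n) break;     /* beq taken: leave the loop */
        count += k;            /* loop body */
        count += 1;            /* b loop_start */
    }
    return count;
}

/* Inlined shape: the test is copied to the bottom, then "b loop_middle". */
long count_inlined(long n, long k) {
    long count = 2;            /* cmp x, y ; beq loop_exit at loop_start */
    if (n == 0) return count;  /* beq taken before the first iteration */
    for (long i = 1; ; i++) {
        count += k;            /* loop body */
        count += 2;            /* copied cmp x, y ; beq loop_exit */
        if (i == n) break;     /* beq taken: leave the loop */
        count += 1;            /* b loop_middle */
    }
    return count;
}
```

With k = 5 (the body of the column loop in the question) and n = 4, this gives 34 and 33 instructions respectively.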
At this point the beq is just a branch over a branch, so the pair can be replaced by a single branch with the inverted condition.
loop_start:
cmp x, y
beq loop_exit
loop_middle:
blah blah blah
cmp x, y
; "beq loop_exit" followed by "b loop_middle" is equivalent to this
bne loop_middle
loop_exit:
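In C terms this transformation is usually called loop inversion: a one-time guard followed by a do/while loop whose condition is tested at the bottom. The sketch below applies it to both loops of matadd; matadd_inverted is a hypothetical name, and the comments show which assembly test each condition stands for, assuming the rotated layout from the pseudocode.

```c
/* matadd with both loops in "guard once, test at the bottom" form.
   Each if is the one-time cmp/beq; each do/while ends in cmp/bne. */
void matadd_inverted(int **C, int **A, int **B, int height, int width) {
    int row = 0;
    if (row == height) return;          /* cmp r5, r3 ; beq end_loops */
    do {
        int *c_row = C[row], *a_row = A[row], *b_row = B[row];
        int col = 0;
        if (col != width) {             /* cmp r6, r4 ; beq end_col */
            do {
                c_row[col] = a_row[col] + b_row[col];
                col++;
            } while (col != width);     /* cmp r6, r4 ; bne to body top */
        }
        row++;
    } while (row != height);            /* cmp r5, r3 ; bne to row body */
}
```

Like the assembly, this assumes height and width are non-negative; the equality tests would never terminate correctly for negative sizes.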
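There are two places in your code where you can apply this optimization: the column loop and the row loop.
(Don't forget to cite this web page when you submit your solution, to avoid accusations of academic dishonesty.)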
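To estimate what each change buys, the counts below were worked out by hand from the listing in the question for a height x width matrix; they cover only the instructions executed inside matadd, not the driver. The layout assumed for the fully rotated version (described in the comments) is one possible arrangement, not the only one.

```c
/* Hand-counted instructions executed inside matadd for an h x w matrix. */

/* Original: push, ldr, mov (3) + pop (1) + final mov/cmp/beq (3);
   per row: mov, cmp, beq, 3x ldr, add, b (8) + inner cmp/beq on exit (2);
   per element: cmp, beq, ldr, ldr, add, str, add, b (8). */
long long count_original(long long h, long long w) {
    return 7 + h * (10 + 8 * w);
}

/* Inner loop rotated: one cmp/beq guard per row (2), then
   ldr, ldr, add, str, add, cmp, bne per element (7). */
long long count_inner_rotated(long long h, long long w) {
    return 7 + h * (10 + 7 * w);
}

/* Both loops rotated, assuming this layout: push, ldr, mov r5 (3),
   guard cmp/beq (2), pop (1); per row: mov r6, 3x ldr, add r5,
   cmp, bne (7) plus the rotated inner loop (2 + 7 per element). */
long long count_both_rotated(long long h, long long w) {
    return 6 + h * (9 + 7 * w);
}
```

Rotating the column loop removes exactly one instruction per matrix element, one eighth of the per-element cost. Assuming most of the 1,003,034,420 instructions are spent inside matadd, that should be far more than the roughly 3 million you need to cut; rotating the row loop as well saves only h + 1 more.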