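I was trying to find out whether Go's loop performance is as good as C's, and was surprised to find that, for my simple test, the C version takes twice as long as the Go version.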
C version:
#include <stdio.h>
int main() {
    int i = 0, a = 0;
    while (i < 1e9) {
        a = (a + i) % 42;
        i = i + 1;
    }
    printf("%d\n", a);
}
$ gcc -o main main.c && time ./main # tried -O0 as well; the result is similar
36
./main 10.53s user 0.08s system 98% cpu 10.769 total
Go version:
package main
import "fmt"
func main() {
	a := int32(0)
	for i := int32(0); i < 1e9; i++ {
		a = (a + i) % 42
	}
	fmt.Println(a)
}
$ time go run main.go
36
colorgo run main.go 5.27s user 0.14s system 93% cpu 5.816 total
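(Tested on Darwin, amd64.)
For such a simple algorithm, shouldn't they produce nearly identical machine code? Is this due to compiler optimizations? Cache efficiency?
Please help me understand! Thanks!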
Answer 0 (score: 3)
It all comes down to the generated assembly.
go tool 6g -S (21 instructions):
MOVL $0,SI
MOVL SI,"".a+8(FP)
MOVL $0,CX
CMPL CX,$1000000000
JGE $0,58
ADDL CX,SI
MOVL $818089009,BP
MOVL SI,AX
IMULL BP,
MOVL DX,BX
SARL $3,BX
MOVL SI,BP
SARL $31,BP
SUBL BP,BX
IMULL $42,BX
SUBL BX,SI
MOVL SI,"".a+8(FP)
INCL ,CX #point A
NOP ,
CMPL CX,$1000000000
JLT $0,16
RET ,
gcc -O3 -march=native -S (17 instructions):
leal (%rsi,%rcx), %edi
addl $1, %ecx
vxorpd %xmm0, %xmm0, %xmm0
vcvtsi2sd %ecx, %xmm0, %xmm0
movl %edi, %eax
imull %r8d
movl %edi, %eax
sarl $31, %eax
sarl $3, %edx
movl %edx, %esi
subl %eax, %esi
imull $42, %esi, %esi
subl %esi, %edi
vucomisd %xmm0, %xmm1
movl %edi, %esi
ja .L2
subq $8, %rsp
gcc -O3 -march=native -S (14 instructions, after replacing 1e9 with 1000000000):
leal (%rdx,%rcx), %esi
addl $1, %ecx
movl %esi, %eax
imull %edi
movl %esi, %eax
sarl $31, %eax
sarl $3, %edx
subl %eax, %edx
imull $42, %edx, %edx
subl %edx, %esi
movl %esi, %edx
cmpl $1000000000, %ecx
jne .L2
subq $8, %rsp
Timings:
$ gcc -O3 -march=native loop.c; and time ./a.out
36
2.92user 0.00system 0:02.93elapsed 99%CPU
$ go build -o loop loop.go; and time ./loop
36
2.89user 0.00system 0:02.90elapsed 99%CPU
$ gcc -O3 -march=native loop_nofp.c; and time ./a.out
36
2.92user 0.00system 0:02.94elapsed 99%CPU (0avgtext+0avgdata 1312maxresident)
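I'm not sure why, so I'll leave it at this for now until I find a proper answer.
// EDIT
Changing the C code to match the Go version produced different assembly, but exactly the same timing.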
#include <stdint.h>
#include <stdio.h>
int main() {
    int32_t i = 0, a = 0;
    for (i = 0; i < 1e9; i++) {
        a = (a + i) % 42;
    }
    printf("%d\n", a);
    return 0;
}
Answer 1 (score: 1)
They are nearly identical when optimized. For example,
Go:
$ cat t.go
package main
import "fmt"
func main() {
	a := int32(0)
	for i := int32(0); i < 1e9; i++ {
		a = (a + i) % 42
	}
	fmt.Println(a)
}
$ go version
go version devel +e1a081e6ddf8 Sat Sep 27 11:56:54 2014 -0700 linux/amd64
$ go build t.go && time ./t
36
real 0m15.809s
user 0m15.815s
sys 0m0.061s
C:
$ cat t.c
#include <stdio.h>
int main() {
    int i = 0, a = 0;
    while (i < 1e9) {
        a = (a + i) % 42;
        i = i + 1;
    }
    printf("%d\n", a);
}
$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
$ gcc -O3 t.c && time ./a.out
36
real 0m16.538s
user 0m16.528s
sys 0m0.021s