F# breakable array iteration with bounds checks eliminated?

Date: 2018-11-10 17:28:55

Tags: arrays performance f# break cil

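I am interested in trying F# in a high-performance application. I do not want the bounds of a large array checked on every access during iteration, and the lack of break / return statements is concerning.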

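Here is a contrived example that breaks out of the loop when it finds a value. Can someone tell me whether the bounds checks are also eliminated?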

open System

let innerExists (item: Char) (items: Char array): bool =
    let mutable state = false
    let mutable i = 0
    while not state && i < items.Length do
        state <- item = items.[i]
        i <- i + 1

    state

let exists (input: Char array)(illegalChars: Char array): bool = 
    let mutable state = false
    let mutable i = 0
    while not state && i < input.Length do
        state <- innerExists input.[i] illegalChars
        i <- i + 1

    state

exists [|'A'..'z'|] [|'.';',';';'|]

Here is the relevant disassembly:

    while not state && i < input.Length do
000007FE6EB4237A  cmp         dword ptr [rbp-14h],0  
000007FE6EB4237E  jne         000007FE6EB42383  
000007FE6EB42380  nop  
000007FE6EB42381  jmp         000007FE6EB42386  
000007FE6EB42383  nop  
000007FE6EB42384  jmp         000007FE6EB423A9  
000007FE6EB42386  nop  
000007FE6EB42387  mov         r8d,dword ptr [rbp-18h]  
000007FE6EB4238B  mov         rdx,qword ptr [rbp+18h]  
000007FE6EB4238F  cmp         r8d,dword ptr [rdx+8]  
000007FE6EB42393  setl        r8b  
000007FE6EB42397  movzx       r8d,r8b  
000007FE6EB4239B  mov         dword ptr [rbp-24h],r8d  
000007FE6EB4239F  mov         r8d,dword ptr [rbp-24h]  
000007FE6EB423A3  mov         dword ptr [rbp-1Ch],r8d  
000007FE6EB423A7  jmp         000007FE6EB423B1  
000007FE6EB423A9  nop  
000007FE6EB423AA  xor         r8d,r8d  
000007FE6EB423AD  mov         dword ptr [rbp-1Ch],r8d  
000007FE6EB423B1  cmp         dword ptr [rbp-1Ch],0  
000007FE6EB423B5  je          000007FE6EB42409  
            state <- innerExists input.[i] illegalChars
000007FE6EB423B7  mov         r8d,dword ptr [rbp-18h]  
000007FE6EB423BB  mov         rdx,qword ptr [rbp+18h]  
000007FE6EB423BF  cmp         r8,qword ptr [rdx+8]  
000007FE6EB423C3  jb          000007FE6EB423CA  
000007FE6EB423C5  call        000007FECD796850  
000007FE6EB423CA  lea         rdx,[rdx+r8*2+10h]  
000007FE6EB423CF  movzx       r8d,word ptr [rdx]  
000007FE6EB423D3  mov         rdx,qword ptr [rbp+10h]  
000007FE6EB423D7  mov         rdx,qword ptr [rdx+8]  
000007FE6EB423DB  mov         r9,qword ptr [rbp+20h]  
000007FE6EB423DF  mov         rcx,7FE6EEE0640h  
000007FE6EB423E9  call        000007FE6EB41E40  
000007FE6EB423EE  mov         dword ptr [rbp-20h],eax  
000007FE6EB423F1  mov         eax,dword ptr [rbp-20h]  
000007FE6EB423F4  movzx       eax,al  
000007FE6EB423F7  mov         dword ptr [rbp-14h],eax  
            i <- i + 1
000007FE6EB423FA  mov         eax,dword ptr [rbp-18h]  

3 answers:

Answer 0 (score: 3)

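Bounds checks are eliminated by the JIT compiler, so this works the same for F# as for C#. You can expect elimination for code like your example, as well as for

for i = 0 to data.Length - 1 do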
    ...

and for tail-recursive functions, which compile down to loops.

The built-in Array.contains and Array.exists (source code) are written so that the JIT compiler can eliminate the bounds checks.
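
As a minimal sketch of the two other loop shapes this answer mentions, the question's innerExists could be written with a for loop or as a tail-recursive function. The names existsFor and existsRec are made up for illustration, and whether the JIT actually removes the bounds check for each shape depends on the runtime version, so it is worth inspecting the optimized disassembly before relying on it.

```fsharp
// Hypothetical variants of innerExists; the names are illustrative only.

// for-loop form: the upper bound is items.Length - 1, the shape this answer refers to.
// An F# for loop cannot exit early, so this version always scans the whole array.
let existsFor (item: char) (items: char array) : bool =
    let mutable found = false
    for i = 0 to items.Length - 1 do
        if items.[i] = item then found <- true
    found

// Tail-recursive form: the F# compiler turns the self tail call into a loop,
// and returning a value gives the early exit that `break` would provide.
let existsRec (item: char) (items: char array) : bool =
    let rec loop i =
        if i >= items.Length then false
        elif items.[i] = item then true
        else loop (i + 1)
    loop 0
```

The recursive form is probably the closer match to the question, since it stops at the first hit without needing a mutable flag in the loop condition.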

Answer 1 (score: 3)

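Others have pointed out that existing functions in FSharp.Core achieve the same result, but I think OP is asking whether the array bounds check is elided in loops like these (since the index is already checked in the loop condition).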

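For simple code like the above, the jitter should be able to elide the checks. Inspecting the assembly code is the right way to see this, but it is important not to run with the VS debugger attached, because the jitter then does not optimize the code. The reason is that the debugger might otherwise be unable to show correct values.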
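
A small guard along these lines can catch the debugger case before any measurement is taken. This is only a sketch: measure is a hypothetical helper, not part of any library, and it relies on the standard System.Diagnostics.Debugger.IsAttached and Stopwatch APIs.

```fsharp
open System.Diagnostics

// Hypothetical helper: warns when a debugger is attached, then times one call of f.
let measure (label: string) (f: unit -> 'a) : 'a =
    if Debugger.IsAttached then
        eprintfn "warning: debugger attached, the jitter may not optimize this code"
    f () |> ignore                     // warm-up call so f is JIT-compiled before timing
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    printfn "%s: %d ticks" label sw.ElapsedTicks
    result
```

It could be used as measure "exists" (fun () -> exists input illegalChars) in a Release build. On runtimes with tiered compilation, a single warm-up call may not be enough to reach the fully optimized code.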

First, let's look at the optimized x64 code for exists:

; not state?
00007ff9`1cd37551 85c0            test    eax,eax
; if state is true then exit the loop
00007ff9`1cd37553 7521            jne     00007ff9`1cd37576
; i < input.Length?
00007ff9`1cd37555 395e08          cmp     dword ptr [rsi+8],ebx
; Seems overly complex but perhaps this is as good as it gets?
00007ff9`1cd37558 0f9fc1          setg    cl
00007ff9`1cd3755b 0fb6c9          movzx   ecx,cl
00007ff9`1cd3755e 85c9            test    ecx,ecx
; if we have reached end of the array then exit
00007ff9`1cd37560 7414            je      00007ff9`1cd37576
; mov i in ebx to rcx, unnecessary but moves like these are very cheap
00007ff9`1cd37562 4863cb          movsxd  rcx,ebx
; input.[i] (note we don't check the boundary again)
00007ff9`1cd37565 0fb74c4e10      movzx   ecx,word ptr [rsi+rcx*2+10h]
; move illegalChars pointer to rdx
00007ff9`1cd3756a 488bd7          mov     rdx,rdi
; call innerExists
00007ff9`1cd3756d e8ee9affff      call    00007ff9`1cd31060
; i <- i + 1
00007ff9`1cd37572 ffc3            inc     ebx
; Jump top of loop
00007ff9`1cd37574 ebdb            jmp     00007ff9`1cd37551
; We are done!
00007ff9`1cd37576

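So the code looks more complex than it needs to be, but it appears to check the array condition only once.

Now let's look at the optimized x64 code for innerExists: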

; let mutable state = false
00007ff9`1cd375a0 33c0            xor     eax,eax
; let mutable i = 0
00007ff9`1cd375a2 4533c0          xor     r8d,r8d
; not state?
00007ff9`1cd375a5 85c0            test    eax,eax
; if state is true then exit the loop
00007ff9`1cd375a7 752b            jne     00007ff9`1cd375d4
; i < items.Length
00007ff9`1cd375a9 44394208        cmp     dword ptr [rdx+8],r8d
; Seems overly complex but perhaps this is as good as it gets?
00007ff9`1cd375ad 410f9fc1        setg    r9b
00007ff9`1cd375b1 450fb6c9        movzx   r9d,r9b
00007ff9`1cd375b5 4585c9          test    r9d,r9d
; if we have reached end of the array then exit
00007ff9`1cd375b8 741a            je      00007ff9`1cd375d4
; mov i in r8d to rax, unnecessary but moves like these are very cheap
00007ff9`1cd375ba 4963c0          movsxd  rax,r8d
; items.[i] (note we don't check the boundary again)
00007ff9`1cd375bd 0fb7444210      movzx   eax,word ptr [rdx+rax*2+10h]
; mov item in cx to r9d, unnecessary but moves like these are very cheap
00007ff9`1cd375c2 440fb7c9        movzx   r9d,cx
; item = items.[i]?
00007ff9`1cd375c6 413bc1          cmp     eax,r9d
00007ff9`1cd375c9 0f94c0          sete    al
; state <- ?
00007ff9`1cd375cc 0fb6c0          movzx   eax,al
; i <- i + 1
00007ff9`1cd375cf 41ffc0          inc     r8d
; Jump top of loop
00007ff9`1cd375d2 ebd1            jmp     00007ff9`1cd375a5
; We are done!
00007ff9`1cd375d4 c3              ret

Again this looks more complex than it should be, but at least it appears to check the array condition only once.

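So, finally, it looks like the jitter eliminates the array bounds checks because it can prove that the index has already been checked successfully in the loop condition, which I believe is what OP was wondering about.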

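The x64 code does not look as clean as it could, but in my experiments cleaned-up x64 code does not perform that much better. I suspect the CPU vendors optimize their CPUs for the crappy code that compilers produce.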

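An interesting exercise is to write an equivalent program in C++ and run it through https://godbolt.org/: choose x86-64 gcc (trunk) (gcc seems to do best right now), specify the options -O3 -march=native, and look at the resulting x64 code.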

Update

Here is the code rewritten for https://godbolt.org/ so that we can see the assembly code generated by a C++ compiler:

template<int N>
bool innerExists(char item, char const (&items)[N]) {
    for (auto i = 0; i < N; ++i) {
        if (item == items[i]) return true;
    }
    return false;
}

template<int N1, int N2>
bool exists(char const (&input)[N1], char const (&illegalCharacters)[N2]) {
    for (auto i = 0; i < N1; ++i) {
        if (innerExists(input[i], illegalCharacters)) return true;
    }
    return false;
}

char const separators[] = { '.', ',', ';' };
char const str[58] = {  };

bool test() {
  return exists(str, separators);
}

x86-64 gcc (trunk) (with options -O3 -march=native) generates the following code:

; Load the string to test into edx
mov edx, OFFSET FLAT:str+1
.L2:
; Have we reached the end?
cmp rdx, OFFSET FLAT:str+58
; If yes, then jump to the end
je .L7
; Load a character
movzx ecx, BYTE PTR [rdx]
; Comparing the 3 separators are encoded in the assembler
;  because the compiler detected the array is always the same
mov eax, ecx
and eax, -3
cmp al, 44
sete al
cmp cl, 59
sete cl
; increase outer i
inc rdx
; Did we find a match?
or al, cl
; If no then loop to .L2
je .L2
; We are done!
ret
.L7:
; No match found, clear result
xor eax, eax
; We are done!
ret

This looks pretty good, but what I am missing in the code above is the use of AVX to test multiple characters at once.
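
On the F# side, one way to leave the "several characters at once" part to the runtime is to call the span-based search in the base class library instead of hand-writing the loops. The sketch below assumes a runtime that provides System.MemoryExtensions.IndexOfAny (.NET Core 2.1 or later, or the System.Memory package); existsSpan is a hypothetical name, and whether the call is actually vectorized depends on the runtime version and on the number of characters searched for, so it should be measured rather than assumed.

```fsharp
open System

// Hypothetical alternative to the question's exists: delegates the search to the
// BCL, which is free to use SIMD instructions internally (not guaranteed).
let existsSpan (input: char array) (illegalChars: char array) : bool =
    MemoryExtensions.IndexOfAny(ReadOnlySpan<char>(input), ReadOnlySpan<char>(illegalChars)) >= 0
```

IndexOfAny returns the index of the first match or -1 when there is none, so the comparison with zero yields the same boolean as exists.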

Answer 2 (score: 1)

What is wrong with the Array.contains and Array.exists functions?

let exists input illegalChars = 
    input |> Array.exists (fun c -> illegalChars |> Array.contains c)