I am seeing what looks like undefined behavior that depends on the size of a local array. For the following code:
int wbExecute_simple(char nInput, char add_pattern)
{
    char test_array[4] = { 0xa, 0xb, 0xc, 0xd };
    int i = 0;
    for (; i < 4; ++i)
    {
        test_array[i] ^= nInput;
    }
    return (test_array[nInput] + add_pattern);
}
The LLVM IR for the first line (the array initialization) is:
lbl_0_wb3954:
%local_0_wb3954 = alloca [4 x i8], align 1
%local_1_wb3954 = bitcast [4 x i8]* %local_0_wb3954 to i32*, !dbg !7
%local_2_wb3954 = bitcast [4 x i8]* @global_0_wb3954 to i32*
%local_3_wb3954 = load i32* %local_2_wb3954, align 1, !dbg !7
store i32 %local_3_wb3954, i32* %local_1_wb3954, align 1, !dbg !7
br label %lbl_1_wb3954, !dbg !8
An array size of 2 produces a similar result. However, changing the array size from 4 to 3, as shown below,
int wbExecute_simple(char nInput, char add_pattern)
{
    char test_array[3] = { 0xa, 0xb, 0xc };
    int i = 0;
    for (; i < 3; ++i)
    {
        test_array[i] ^= nInput;
    }
    return (test_array[nInput] + add_pattern);
}
yields
define i32 @wbExecute_simple(i8 signext %nInput, i8 signext %add_pattern) #0 {
lbl_0_wb3954:
%local_0_wb3954 = alloca [3 x i8], align 1
%local_1_wb3954 = getelementptr inbounds [3 x i8]* %local_0_wb3954, i32 0, i32 0, !dbg !7
%local_2_wb3954 = getelementptr [3 x i8]* @global_0_wb3954, i32 0, i32 0
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %local_1_wb3954, i8* %local_2_wb3954, i32 3, i32 1, i1 false), !dbg !7
br label %lbl_1_wb3954, !dbg !8
Answer 0 (score: 1)
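I'm not sure what you mean by undefined behavior. This looks like a legitimate compiler optimization.
When the array length is 4, the compiler replaces the array copy with a copy of a single integer, because a 32-bit integer is also 4 bytes in size and the copy can then be done in one operation. I assume that for size 2 it copies a 16-bit integer.
Your system most likely does not support any 24-bit integer type, so the compiler decides not to apply this optimization for size 3 and keeps the memcpy intrinsic. Since neither the processor nor the memory system supports an "int24" type, the optimization makes no sense for an array of size 3. The compiler backend may still optimize the remaining memcpy intrinsic further, depending on whether that makes sense on the target machine.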
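To make the size-4 case concrete, here is a minimal C sketch of the two copy strategies. The names `init4`, `copy_bytes`, `copy_word`, and `self_check` are hypothetical and exist only for illustration; `init4` stands in for the constant initializer that appears as `@global_0_wb3954` in the IR. This is a source-level analogy to what the compiler does, not the compiler's actual implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the constant initializer
   (@global_0_wb3954 in the IR from the question). */
const char init4[4] = { 0xa, 0xb, 0xc, 0xd };

/* Byte-by-byte copy: what the source code literally asks for. */
void copy_bytes(char dst[4])
{
    for (int i = 0; i < 4; ++i)
        dst[i] = init4[i];
}

/* Single 32-bit copy: roughly what the bitcast/load/store sequence does.
   memcpy through a uint32_t is used here to stay within C's aliasing rules. */
void copy_word(char dst[4])
{
    uint32_t word;
    memcpy(&word, init4, sizeof word); /* corresponds to "load i32"  */
    memcpy(dst, &word, sizeof word);   /* corresponds to "store i32" */
}

/* Small self-check: both strategies must fill the array identically. */
void self_check(void)
{
    char a[4], b[4];
    copy_bytes(a);
    copy_word(b);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Calling `self_check()` from a `main` function confirms that both versions leave the same four bytes in the destination. No standard 3-byte integer type exists that could play the role of `uint32_t` here, which is why the size-3 case has no equivalent single-integer form.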
Here is the generated IR with comments added to clarify what the code does:
lbl_0_wb3954:
; allocate the local array
%local_0_wb3954 = alloca [4 x i8], align 1
; cast the address of the local array to an i32 pointer
%local_1_wb3954 = bitcast [4 x i8]* %local_0_wb3954 to i32*, !dbg !7
; cast the address of the constant array { 0xa, 0xb, 0xc, 0xd } to an i32 pointer
%local_2_wb3954 = bitcast [4 x i8]* @global_0_wb3954 to i32*
; load the constant array as a single 32-bit integer
%local_3_wb3954 = load i32* %local_2_wb3954, align 1, !dbg !7
; store that value into the local array
store i32 %local_3_wb3954, i32* %local_1_wb3954, align 1, !dbg !7
br label %lbl_1_wb3954, !dbg !8
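I don't think this kind of optimization can easily be forced for size 3.
Passing -O2 causes the length-3 loop to be completely unrolled and the static initialization to be removed; the constants 0xa, 0xb, and 0xc are instead inlined as operands of the xor instructions. The code for size 4 looks similar.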
%3 = alloca [3 x i8], align 1
; compute the address of the first array element
%4 = getelementptr inbounds [3 x i8], [3 x i8]* %3, i64 0, i64 0, !dbg !22
; compute 0xa ^ nInput
%5 = xor i8 %0, 10, !dbg !25
; store the result
store i8 %5, i8* %4, align 1, !dbg !25, !tbaa !29
; do the same for the second and third elements
%6 = getelementptr inbounds [3 x i8], [3 x i8]* %3, i64 0, i64 1, !dbg !32
%7 = xor i8 %0, 11, !dbg !25
store i8 %7, i8* %6, align 1, !dbg !25, !tbaa !29
%8 = getelementptr inbounds [3 x i8], [3 x i8]* %3, i64 0, i64 2, !dbg !32
%9 = xor i8 %0, 12, !dbg !25
store i8 %9, i8* %8, align 1, !dbg !25, !tbaa !29
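As a rough source-level picture of the -O2 output above, the following C sketch compares the loop from the question with a hand-unrolled version in which the constants are folded into each xor. The function names `xor_loop` and `xor_unrolled` and the `out` parameter are hypothetical, introduced only so the two variants can be compared; the optimizer works on IR, not on C source like this.

```c
#include <assert.h>

/* Loop form, as in the size-3 variant from the question:
   static initializer followed by an xor over every element. */
void xor_loop(char out[3], char nInput)
{
    char test_array[3] = { 0xa, 0xb, 0xc };
    for (int i = 0; i < 3; ++i)
        test_array[i] ^= nInput;
    for (int i = 0; i < 3; ++i)
        out[i] = test_array[i];
}

/* Hand-unrolled form mirroring the -O2 IR: no initializer copy,
   each constant appears directly as an xor operand. */
void xor_unrolled(char out[3], char nInput)
{
    out[0] = (char)(nInput ^ 0xa); /* xor i8 %0, 10 */
    out[1] = (char)(nInput ^ 0xb); /* xor i8 %0, 11 */
    out[2] = (char)(nInput ^ 0xc); /* xor i8 %0, 12 */
}

/* Check that both forms agree for a given input. */
void check_equivalent(char nInput)
{
    char a[3], b[3];
    xor_loop(a, nInput);
    xor_unrolled(b, nInput);
    for (int i = 0; i < 3; ++i)
        assert(a[i] == b[i]);
}
```

Because the unrolled form never copies the initializer, the question of whether a 3-byte copy becomes an integer store or a memcpy no longer arises at this optimization level.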