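I am considering using the __sqrtf function in our production code. Before doing so, I want to evaluate its performance against the C math library's sqrtf function. I wrote the following code as a quick test.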
#include <arm_neon.h>
#include <iostream>
#include <cmath>
#include <cstdlib>  // srand, rand, RAND_MAX
#include <ctime>    // time
#include <chrono>
using namespace std;
using namespace std::chrono;
int main()
{
    // Seed the generator so each run uses different inputs.
    srand(static_cast<unsigned int>(time(NULL)));
    for (int i = 0; i < 100; i++)
    {
        // Random input in the range [0, 100].
        float f = static_cast<float>(rand() / (static_cast<float>(RAND_MAX / 100.0)));
        high_resolution_clock::time_point t1 = high_resolution_clock::now();
        float ans1 = sqrtf(f);
        high_resolution_clock::time_point t2 = high_resolution_clock::now();
        float ans2 = __sqrtf(f);
        high_resolution_clock::time_point t3 = high_resolution_clock::now();
        // Elapsed time of each single call, in microseconds.
        auto duration1 = duration_cast<microseconds>(t2 - t1).count();
        auto duration2 = duration_cast<microseconds>(t3 - t2).count();
        cout << duration1 << " " << duration2 << endl;
    }
    return 0;
}
When I try to compile the code on my Raspberry Pi with g++ -mfpu=neon-vfpv4 test.cpp, I get the following error:
pi@raspberrypi:~/temp $ g++ -mfpu=neon-vfpv4 test.cpp
/tmp/ccK08ITT.o: In function `main':
test.cpp:(.text+0x74): undefined reference to `__sqrtf'
collect2: error: ld returned 1 exit status
I believe my Raspberry Pi does have a VFP coprocessor:
pi@raspberrypi:~/temp $ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 1
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 2
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
processor : 3
model name : ARMv7 Processor rev 4 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xd03
CPU revision : 4
Do I need to include a particular header file in order to use __sqrtf? This page does not mention one: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491e/CJAIFJIF.html