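I need to repeatedly convert 1024+ consecutive 4-byte floats (range -1 to 1) to 2-byte shorts (range -32768 to 32767) and write them to disk.
Currently I do this with a loop: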
short v = 0;
for (unsigned int sample = 0; sample < length; sample++)
{
v = (short)(inbuffer[sample * 2] * 32767.0f);
fwrite(&v, 2, 1, file);
}
This works, but the floating-point calculation and the loop are expensive. Is there any way to optimize this?
Answer 0 (score: 6)
short v = 0;
for (unsigned int sample = 0; sample < length; sample++)
{
v = (short)(inbuffer[sample * 2] * 32767.0f);
// The problem is not here-------^^^^^^^^^^^
fwrite(&v, 2, 1, file);
// it is here ^^^^^^^
}
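A typical Mac (Objective-C tag, or are we talking about an iPhone here?) can do billions of floating-point operations per second. fwrite, however, is a library call that follows some indirections, writes its data into some buffer, and possibly flushes it. It is better to fill your own buffer and write it in one batch: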
short v[SZ] = {0};
// make sure SZ is always >= length, or allocate a working buffer on the heap.
for (unsigned int sample = 0; sample < length; sample++)
{
    v[sample] = (short)(inbuffer[sample * 2] * 32767.0f);
}
fwrite(v, sizeof v[0], length, file); // write only the length samples that were filled
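If length can exceed any fixed SZ, one option is to convert and write in fixed-size chunks, so the working buffer stays small and no heap allocation is needed. The following is only a sketch under some assumptions: the function name write_samples and the chunk size CHUNK are made up for illustration, the input is read with the same stride of 2 as in the question, and short is assumed to be 2 bytes.

```c
#include <stdio.h>
#include <stddef.h>

#define CHUNK 4096  /* hypothetical chunk size, in samples */

/* Converts `length` floats (read with a stride of 2, as in the question)
   to 16-bit samples and writes them in blocks of at most CHUNK samples.
   Returns the number of samples actually written. */
size_t write_samples(const float *inbuffer, size_t length, FILE *file)
{
    short chunk[CHUNK];
    size_t done = 0;

    while (done < length)
    {
        size_t n = length - done;
        if (n > CHUNK)
            n = CHUNK;

        /* Convert one block into the local buffer. */
        for (size_t i = 0; i < n; i++)
            chunk[i] = (short)(inbuffer[(done + i) * 2] * 32767.0f);

        /* One library call per block instead of one per sample. */
        size_t written = fwrite(chunk, sizeof chunk[0], n, file);
        done += written;
        if (written != n)
            break;  /* short write: stop and report what was written */
    }
    return done;
}
```

The caller can compare the return value with length to detect a failed write, which the per-sample loop never checked.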
Answer 1 (score: 2)
I would have thought the repeated calls to fwrite would be the expensive part. How about:
short outbuffer[length]; // note: you'll have to malloc this if length isn't constant and you're not using a version of C that supports dynamic arrays.
for (unsigned int sample = 0; sample < length; sample++)
{
outbuffer[sample] = (short)(inbuffer[sample * 2] * 32767.0f);
}
fwrite(outbuffer, sizeof *outbuffer, length, file);
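Answer 2 (score: 2)
I'd guess the bottleneck in your loop is not the float-to-short conversion but writing the output to the file. Try moving the file output outside the loop: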
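short *outbuffer = malloc(length * sizeof *outbuffer); // buffer of the required size (needs <stdlib.h>)
if (outbuffer != NULL)
{
    for (unsigned int sample = 0; sample < length; sample++)
    {
        outbuffer[sample] = (short)(inbuffer[sample * 2] * 32767.0f);
    }
    fwrite(outbuffer, sizeof *outbuffer, length, file); // element size first, then element count
    free(outbuffer);
}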
Answer 3 (score: 0)
You could try something like this:
out[i] = table[((uint32_t *)in)[i]>>16];
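where table is a lookup table that maps the upper 16 bits of an IEEE float to the desired int16_t value. This will lose some precision, however. You would need to keep and use 23 bits (1 sign bit, 8 exponent bits, and 14 mantissa bits) to get full precision, which means a 16 MB table (2^23 entries of 2 bytes each). A table that size will wreck cache locality and, with it, performance.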
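For the 16-bit variant, the table could be built roughly like this. This is a sketch, not a tuned implementation: it assumes float is a 32-bit IEEE 754 value with the same byte order as uint32_t, the names build_table and convert_with_table are invented here, and the clamping of out-of-range inputs and the mapping of NaN to 0 are my own choices. It uses memcpy instead of the pointer cast to stay clear of aliasing problems.

```c
#include <stdint.h>
#include <string.h>

static int16_t table[65536];  /* one entry per possible upper-16-bit pattern */

/* Fills the table: entry i holds the converted value of the float whose
   upper 16 bits are i and whose lower 16 bits are zero. */
void build_table(void)
{
    for (uint32_t i = 0; i < 65536; i++)
    {
        uint32_t bits = i << 16;
        float f;
        memcpy(&f, &bits, sizeof f);  /* reinterpret the bit pattern as a float */

        if (f != f)                   /* NaN: map to silence (assumption) */
            table[i] = 0;
        else if (f >= 1.0f)           /* clamp, also covers +infinity */
            table[i] = 32767;
        else if (f <= -1.0f)          /* clamp, also covers -infinity */
            table[i] = -32767;
        else
            table[i] = (int16_t)(f * 32767.0f);
    }
}

/* Converts n floats by table lookup on their upper 16 bits. */
void convert_with_table(const float *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        uint32_t bits;
        memcpy(&bits, &in[i], sizeof bits);
        out[i] = table[bits >> 16];
    }
}
```

The upper 16 bits hold the sign, the 8 exponent bits, and only 7 mantissa bits, so any input whose lower 16 bits are nonzero is truncated before the lookup. That is the precision loss mentioned above.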
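Are you sure the floating-point conversion is what's slow? As long as you use fwrite this way, you are spending 50 to 100 times as much time in fwrite as on the floating-point arithmetic. If you deal with that and the code is still too slow, you could use an approach that adds a magic bias and reads the mantissa bits to convert to int16_t instead of multiplying by 32767.0. This may or may not be faster.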