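I am trying to find the most common character in a vector of chars.
I was thinking of doing it like this:
This looks complicated, so I would like to know whether anyone can tell me if this approach is considered "acceptable" in terms of performance and good coding practice.
Can this be done better?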
Answer 0 (score: 6)
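If you only use regular ASCII characters, you can make the solution a bit faster. Instead of a map, use an array of size 256 and count the occurrences of the character with code `x` in the cell `count[x]`. This removes the log(256) factor from your solution, making it somewhat faster. I don't think much more can be done to optimize this algorithm.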
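A minimal sketch of that counting-array approach, assuming 8-bit `char` (checked with a `static_assert`). The name `mostFrequentChar` and the `std::pair` return type are illustrative choices, not taken from the question. Each `char` is cast to `unsigned char` before indexing because plain `char` is signed on many platforms, which would otherwise produce negative indices for non-ASCII bytes; with the cast, the table covers every byte value, not just ASCII.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the most frequent character and its count.
// Assumes CHAR_BIT == 8, so a 256-entry table covers every char value.
// For an empty input it returns {'\0', 0}.
std::pair<char, std::size_t> mostFrequentChar(const std::vector<char>& v)
{
    static_assert(CHAR_BIT == 8, "table size assumes 8-bit char");
    std::array<std::size_t, 256> count{};  // zero-initialised counters

    for (char c : v)
        ++count[static_cast<unsigned char>(c)];  // no negative index for signed char

    // Linear scan of the fixed-size table; ties go to the smallest unsigned value.
    std::size_t best = 0;
    for (std::size_t i = 1; i < count.size(); ++i)
        if (count[i] > count[best])
            best = i;

    return { static_cast<char>(best), count[best] };
}
```

The work is one pass over the input plus a fixed scan of 256 counters, so there is no per-element tree lookup at all.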
Answer 1 (score: 2)
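Sorting the vector of characters and then iterating over it to find the longest run of equal characters appears to be about 5 times faster than the map approach (measured on 16M characters with the rather unscientific test code below). On paper the map version should not be slower: it performs N lookups in a tree that never holds more than 256 keys (O(N log 256)), whereas the sort is O(N log N). However, the sort approach probably benefits from branch prediction and from move semantics when sorting the vector in place.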
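The sort-then-scan idea can also be packaged as a function that returns its result instead of printing it. This is only a sketch, and `mostFrequentBySort` is a hypothetical name. Because the range is sorted, `std::upper_bound` finds the end of each run of equal characters, and the final run is compared exactly like every other run, so it cannot be missed.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sort a copy, then find the longest run of equal values.
// Takes the vector by value because std::sort reorders it.
std::pair<char, std::size_t> mostFrequentBySort(std::vector<char> v)
{
    if (v.empty())
        return { '\0', 0 };

    std::sort(v.begin(), v.end());

    char bestChar = v.front();
    std::size_t bestCount = 0;

    // Each iteration jumps over one whole run of identical characters.
    for (auto it = v.begin(); it != v.end(); )
    {
        // In a sorted range, upper_bound returns one past the last element equal to *it.
        auto runEnd = std::upper_bound(it, v.end(), *it);
        auto runLength = static_cast<std::size_t>(runEnd - it);
        if (runLength > bestCount)  // strict '>' keeps the first (smallest) char on ties
        {
            bestChar = *it;
            bestCount = runLength;
        }
        it = runEnd;
    }
    return { bestChar, bestCount };
}
```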
The resulting output is:

```
Most freq char is '\334', appears 66288 times.
usingSort() took 938 milliseconds
Most freq char is '\334', appears 66288 times.
usingMap() took 5124 milliseconds
```
The code is:
```cpp
#include <algorithm>  // std::sort
#include <chrono>
#include <cstddef>
#include <cstdlib>    // random() is POSIX; std::rand() is the portable alternative
#include <iostream>
#include <map>
#include <vector>

// Count with a std::map, then scan its (at most 256) entries for the maximum.
void usingMap(std::vector<char> v)
{
    std::map<char, int> m;
    for (auto c : v)
        ++m[c];  // operator[] inserts a zero count for a new key

    char mostFreq = '\0';
    int count = 0;
    for (const auto& mi : m)
        if (mi.second > count)
        {
            mostFreq = mi.first;
            count = mi.second;
        }
    std::cout << "Most freq char is '" << mostFreq << "', appears " << count << " times.\n";
}

// Sort a copy (hence pass by value), then find the longest run of equal chars.
void usingSort(std::vector<char> v)
{
    if (v.empty())
        return;  // v[0] below would be undefined behaviour

    std::sort(v.begin(), v.end());
    char currentChar = v[0];
    char mostChar = v[0];
    int currentCount = 0;
    int mostCount = 0;
    for (auto c : v)
    {
        if (c == currentChar)
            currentCount++;
        else
        {
            if (currentCount > mostCount)
            {
                mostChar = currentChar;
                mostCount = currentCount;
            }
            currentChar = c;
            currentCount = 1;
        }
    }
    // The last run is never closed inside the loop, so check it here.
    if (currentCount > mostCount)
    {
        mostChar = currentChar;
        mostCount = currentCount;
    }
    std::cout << "Most freq char is '" << mostChar << "', appears " << mostCount << " times.\n";
}

int main()
{
    const std::size_t size = 1024 * 1024 * 16;
    std::vector<char> v(size);
    for (std::size_t i = 0; i < size; i++)
    {
        v[i] = static_cast<char>(random() % 256);
    }

    auto t1 = std::chrono::high_resolution_clock::now();
    usingSort(v);
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout
        << "usingSort() took "
        << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count()
        << " milliseconds\n";

    auto t3 = std::chrono::high_resolution_clock::now();
    usingMap(v);
    auto t4 = std::chrono::high_resolution_clock::now();
    std::cout
        << "usingMap() took "
        << std::chrono::duration_cast<std::chrono::milliseconds>(t4 - t3).count()
        << " milliseconds\n";
    return 0;
}
```
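On the "good coding" side of the question: the counting loop in `usingMap()` only needs `++m[c]`, because `std::map::operator[]` inserts a value-initialised (zero) count for a key that is not yet present. A sketch of a tidier map-based variant, with the hypothetical name `mostFrequentByMap`, that also replaces the hand-written maximum search with `std::max_element`:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical map-based counter that returns its result instead of printing it.
std::pair<char, std::size_t> mostFrequentByMap(const std::vector<char>& v)
{
    std::map<char, std::size_t> counts;
    for (char c : v)
        ++counts[c];  // a missing key starts at 0

    if (counts.empty())
        return { '\0', 0 };

    // Entry with the largest count; max_element returns the first of equal maxima.
    auto best = std::max_element(counts.begin(), counts.end(),
        [](const auto& a, const auto& b) { return a.second < b.second; });
    return { best->first, best->second };
}
```

Because the map iterates keys in ascending `char` order and `std::max_element` returns the first of several equal maxima, ties resolve the same way as the strict `>` comparison in the original loop.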