Question

I have an index of about 10k items that must be sorted lexicographically and case-insensitively.

This is my approach:

bool lowercomp(AbstractServiceProvider::AbstractItem* i, AbstractServiceProvider::AbstractItem* j)
{
    std::string a,b;

    // lower first string
    a.resize(i->title().length());
    std::transform(i->title().cbegin(), i->title().cend(), a.begin(),
                std::bind2nd(std::ptr_fun(&std::tolower<char>), std::locale("")));

    // lower 2nd string
    b.resize(j->title().length());
    std::transform(j->title().cbegin(), j->title().cend(), b.begin(),
                std::bind2nd(std::ptr_fun(&std::tolower<char>), std::locale("")));

    return 0 > a.compare(b);
}

Somewhere in my code:

t = new boost::timer::auto_cpu_timer;
std::sort(_index.begin(), _index.end(), lowercomp);
delete t;

But this takes about 4 seconds. Without the tolower part, it takes about 0.003 seconds. Is there a way to improve this?

Answer 1

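You can definitely make it faster. The solution is to avoid allocating memory and instead compare the strings case-insensitively by converting one character at a time with tolower() while doing the comparison. No class objects should be constructed inside the comparison function. Something like this: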

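#include <algorithm>  // std::min
#include <cctype>     // tolower

bool lowercomp(const AbstractItem* lhs, const AbstractItem* rhs)
{
    // Compare character by character; no temporary strings are created.
    size_t size = std::min(lhs->title().size(), rhs->title().size());
    for (size_t pos = 0; pos < size; ++pos) {
        // Cast to unsigned char: tolower() is undefined for negative char values.
        int l = tolower(static_cast<unsigned char>(lhs->title()[pos]));
        int r = tolower(static_cast<unsigned char>(rhs->title()[pos]));
        if (l < r) {
            return true;
        } else if (l > r) {
            return false;
        }
    }
    // All shared characters are equal: the shorter title sorts first.
    return lhs->title().size() < rhs->title().size();
}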

Let us know how fast this is. :)

Answer 2

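Until you have seen profiler output and know where the slowdown is, you can't be sure, but several points look likely to me to cause a slowdown. The two most important are:

Your function creates two new strings on every call. This can be very expensive.
You use the two-operand form of std::tolower; this function must extract the ctype facet every time it is called (and you construct a new temporary instance of the locale every time lowercomp is invoked).

My own preference is to use a function object for the comparison. With some compilers it is faster, but in this case it is also a lot cleaner: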

class CaseInsensitiveCompare
{
    std::locale myLocale;   //  To ensure lifetime of the facet.
    std::ctype<char> const& myCType;
public:
    CaseInsensitiveCompare( std::locale const& locale = std::locale( "" ) )
        : myLocale( locale )
        , myCType( std::use_facet<std::ctype<char>>( myLocale ) )
    {
    }
    bool operator()( AbstractItem const* lhs, AbstractItem const* rhs ) const
    {
        return (*this)( lhs->title(), rhs->title() );
    }
    bool operator()( std::string const& lhs, std::string const& rhs ) const
    {
        return std::lexicographical_compare(
            lhs.begin(), lhs.end(),
            rhs.begin(), rhs.end(),
            *this);
    }
    bool operator()( char lhs, char rhs ) const
    {
        return myCType.tolower(lhs) < myCType.tolower(rhs);
    }
};
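
For a quick experiment without the AbstractItem type, the same idea (look the facet up once, then compare character by character) can be sketched with a lambda over plain strings. The name sortCaseInsensitive is hypothetical, and the default of std::locale::classic() is an assumption made so the sketch does not depend on the environment; pass std::locale("") to get the user's locale as in the class above.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: sorts plain strings rather than AbstractItem pointers.
void sortCaseInsensitive(std::vector<std::string>& titles,
                         const std::locale& loc = std::locale::classic())
{
    // Look the facet up once; loc keeps it alive for the duration of the call.
    const std::ctype<char>& ct = std::use_facet<std::ctype<char>>(loc);

    std::sort(titles.begin(), titles.end(),
        [&ct](const std::string& a, const std::string& b) {
            // No temporary strings: characters are lowered as they are compared.
            return std::lexicographical_compare(
                a.begin(), a.end(), b.begin(), b.end(),
                [&ct](char x, char y) { return ct.tolower(x) < ct.tolower(y); });
        });
}
```

Titles that differ only in case compare as equivalent here, so their relative order after std::sort is unspecified.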

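Beyond this, there are several other points which might improve performance:

If you are sure of the lifetime of the locale you are using (and you usually can be), drop the myLocale member from the class; copying the locale is the most expensive part of copying instances of this class (and the call to lexicographical_compare will copy it at least once).
If you don't need the localization features, consider using the tolower function in <cctype> rather than the one in <locale>. This avoids the need for any data members in the comparison at all.
Finally, although I'm not sure it's worth it for something as small as 10K items, you might consider making a vector with the canonical forms of the strings (already lower-cased, etc.), sorting those using just < on the strings, and then reordering the original vector according to that.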
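
The last point can be sketched roughly as follows. This is only an illustration on plain strings, not on the asker's item type; toLowerCopy and sortByCanonicalForm are hypothetical names, and the lowering uses the <cctype> tolower, so it assumes the localization features are not needed.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: returns a lower-cased copy of s using <cctype>.
std::string toLowerCopy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns the titles reordered case-insensitively.
std::vector<std::string> sortByCanonicalForm(const std::vector<std::string>& titles)
{
    // 1. Build the canonical keys once: n conversions instead of two per comparison.
    std::vector<std::string> keys;
    keys.reserve(titles.size());
    for (const std::string& t : titles) keys.push_back(toLowerCopy(t));

    // 2. Sort a permutation of indices using plain < on the keys.
    std::vector<std::size_t> order(titles.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
        [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // 3. Reorder the originals according to that permutation.
    std::vector<std::string> sorted;
    sorted.reserve(titles.size());
    for (std::size_t i : order) sorted.push_back(titles[i]);
    return sorted;
}
```

The trade-off is extra memory for the keys and the index vector in exchange for a comparator that does no case conversion at all.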

Also, I'm very suspicious of the new boost::timer::auto_cpu_timer. Do you really need dynamic allocation here? Offhand, I suspect a local variable would be more appropriate.
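
To show the shape of the local-variable version without depending on Boost, here is a sketch with a stand-in class. ScopedTimer and timedSort are hypothetical names; the stand-in measures wall-clock time with std::chrono, which is not the same as the CPU times boost::timer::auto_cpu_timer reports. With Boost, the equivalent would be declaring a boost::timer::auto_cpu_timer object inside a braces-delimited block around the sort.

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>

// Hypothetical stand-in for boost::timer::auto_cpu_timer: reports wall-clock
// time (not CPU time) when it goes out of scope.
class ScopedTimer
{
    std::chrono::steady_clock::time_point start_ = std::chrono::steady_clock::now();
    double* out_;  // optional place to store the elapsed seconds
public:
    explicit ScopedTimer(double* out = nullptr) : out_(out) {}
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;
    ~ScopedTimer()
    {
        std::chrono::duration<double> d = std::chrono::steady_clock::now() - start_;
        if (out_) *out_ = d.count();
        std::clog << "elapsed: " << d.count() << " s\n";
    }
};

// Times one sort; the timer is a local variable, so no new/delete is needed.
double timedSort(std::vector<int>& v)
{
    double seconds = -1.0;
    {
        ScopedTimer t(&seconds);        // timing starts here
        std::sort(v.begin(), v.end());
    }                                   // destructor runs here and reports
    return seconds;
}
```

Because the destructor runs at the closing brace, the timer is also cleaned up if the sort throws, which the new/delete version does not guarantee.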

Answer 3

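Your implementation strikes me as extraordinarily inefficient. I see several problems.

You are performing a tolower on both strings inside the sort comparator. Since this comparator is called on the order of n log n times, you end up tolowering two strings about 40K times (?) each.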
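
The exact number of comparator calls depends on the standard library implementation, so it is easiest to measure. For scale, 10,000 × log2(10,000) is about 133,000. A minimal sketch that counts the calls, using random ints instead of the real items (countComparisons is a hypothetical name and the seed is arbitrary):

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Counts how many times std::sort calls its comparator for n random ints.
std::size_t countComparisons(std::size_t n)
{
    std::mt19937 rng(42);  // fixed seed, arbitrary choice
    std::vector<int> v(n);
    for (int& x : v) x = static_cast<int>(rng() % 1000000);

    std::size_t calls = 0;
    std::sort(v.begin(), v.end(), [&calls](int a, int b) {
        ++calls;           // one comparator call = two strings lowered in the question's code
        return a < b;
    });
    return calls;
}
```

In the question's comparator, every one of those calls allocates and lowers two full strings, so the count returned by countComparisons(10000) is the number to multiply by.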

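I wouldn't want to compare strings at all. Not only is string comparison orders of magnitude less efficient than other methods (such as integral comparison), it is also error-prone and requires you to scrub the data, which is another source of inefficiency.

If, however, you must compare strings, scrub them before you perform the sort. That includes tolowering them. Ideally, scrub the data when the element is constructed. Failing that, you could scrub it just before you call sort. Whatever you do, don't scrub it inside the comparator.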
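
A minimal sketch of scrubbing at construction time, under the assumption that the item type can carry an extra lower-cased key. Item, key() and sortItems are hypothetical and stand in for the asker's AbstractItem; the lowering uses the <cctype> tolower rather than a locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical item type: the lowered key is computed once, at construction.
class Item
{
    std::string title_;
    std::string key_;  // lower-cased copy of title_, used only for ordering
public:
    explicit Item(std::string title) : title_(std::move(title)), key_(title_)
    {
        std::transform(key_.begin(), key_.end(), key_.begin(),
            [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    }
    const std::string& title() const { return title_; }
    const std::string& key() const { return key_; }
};

// The comparator only compares the precomputed keys; nothing is scrubbed here.
void sortItems(std::vector<Item>& items)
{
    std::sort(items.begin(), items.end(),
        [](const Item& a, const Item& b) { return a.key() < b.key(); });
}
```

The original title is kept for display, while the key costs one extra string per item.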

Improving std::sort performance
