Question

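My class has millions of items, each with a label of type int. I need to partition the items by identical labels, and at the end I return a vector<MyClass>. First, I sort all the items by label. Then, in a for loop, I compare each label value with the previous one; if they are the same, I store the item in myclass_temp until label != previous_label. When label != previous_label, I add myclass_temp to the vector<MyClass> and erase myclass_temp. I think the code is self-explanatory. The program works fine, but it is slow. Is there a better way to speed it up? Since I sort the items at the start, I believe there should be a faster way to simply partition the items with similar labels.

The second question is how to calculate the big-O complexity of this algorithm and of any suggested faster solution. Please feel free to correct my code.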

 vector <MyClass> PartitionByLabels(MyClass &myclass){

    /// sort MyClass items based on label number
    printf ("Sorting items by label number... \n");
    std::sort(myclass.begin(), myclass.end(), compare_labels);

    vector <MyClass> myClasses_vec;
    MyClass myclass_temp;

    int previous_label = 0;
    size_t total_items = 0;   /// must start at zero before accumulating

    /// partition myclass items based on similar labels
    for (size_t i = 0; i < myclass.size(); i++){

        int label = myclass[i].label;

        /// a new label starts: store the finished group
        /// (there is no finished group before the first item)
        if (i > 0 && label != previous_label){
            myClasses_vec.push_back(myclass_temp);
            total_items += myclass_temp.size();
            myclass_temp.EraseItems();
        }

        myclass_temp.push_back(myclass[i]);
        previous_label = label;
    }

    /// add the last group of similar items
    if (myclass_temp.size() > 0){
        myClasses_vec.push_back(myclass_temp);
        total_items += myclass_temp.size();
    }

    printf("Total number of items: %zu \n", total_items);
    return myClasses_vec;
}

Answer 1

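Why not create a map from int to vector and iterate over the original vector once, adding each MyClass object to TheMap[myclass[i].label]? With a hash map such as std::unordered_map, this takes your average running time from O(n + n*log(n)) down to O(n). (An ordered std::map would instead cost O(n*log(k)) for k distinct labels.)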
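A minimal sketch of that idea, assuming a hash map is what is meant by "map". The `Item` struct, its `payload` field, and the function name `GroupByLabel` are hypothetical; the question only states that each item has an int label.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical item type: the question only tells us each item has an int label.
struct Item {
    int label;
    int payload;  // placeholder for whatever else an item carries
};

// Group items by label in a single pass over the input.
// Average cost is O(n) because each hash-map lookup is O(1) on average.
std::unordered_map<int, std::vector<Item>> GroupByLabel(const std::vector<Item>& items) {
    std::unordered_map<int, std::vector<Item>> groups;
    for (const Item& item : items) {
        // operator[] creates an empty vector the first time a label is seen
        groups[item.label].push_back(item);
    }
    return groups;
}
```

No sort is needed here, but the groups come out in no particular order. Items within one group keep their original relative order.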

Answer 2

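This algorithm should do it. I have removed the templates to make it easier to check on godbolt.

They should be easy to put back in.

The complexity of this method is dominated by std::sort: O(N log N). The partitioning pass that follows the sort is linear.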

#include <vector>
#include <algorithm>
#include <string>
#include <iterator>

struct thing
{
    std::string label;
    std::string value;
};

using MyClass = std::vector<thing>;
using Partitions = std::vector<MyClass>;

auto compare_labels = [](thing const& l, thing const& r) {
    return l.label < r.label;
};

// pass by value - we need a copy anyway and we might get copy elision
Partitions PartitionByLabels(MyClass myclass){

    /// sort MyClass items based on label number
    std::sort(myclass.begin(), myclass.end(), compare_labels);

    Partitions result;

    auto first = myclass.begin();
    auto last = myclass.end();

    // because the range is sorted, we can partition it in linear time.
    // choosing the correct algorithm is always the best optimisation
    while (first != last) 
    {
        auto next = std::find_if(first, last, [&first](auto const& x) { return x.label != first->label; });

        // let's move the items - that should speed things up a little
        // this is safe because we took a copy
        result.push_back(MyClass(std::make_move_iterator(first), 
                                 std::make_move_iterator(next)));
        first = next;
    }

    return result;
}
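One way the templates might be put back in is sketched below. This is an assumption about what a generic version could look like, not the answer's original code: the name `PartitionBy` and the `get_label` callable are hypothetical, and the label type only needs to support `<` and `!=`.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical generic version: Item is any movable type, and get_label is any
// callable returning a key that supports < and !=.
template <class Item, class GetLabel>
std::vector<std::vector<Item>> PartitionBy(std::vector<Item> items, GetLabel get_label) {
    // O(N log N): sort so that equal labels become adjacent
    std::sort(items.begin(), items.end(), [&](Item const& l, Item const& r) {
        return get_label(l) < get_label(r);
    });

    std::vector<std::vector<Item>> result;
    auto first = items.begin();
    auto const last = items.end();
    while (first != last) {
        // find the first element whose label differs from the current group's label
        auto next = std::find_if(first, last, [&](Item const& x) {
            return get_label(x) != get_label(*first);
        });
        // move the group out of our private copy of the input
        result.emplace_back(std::make_move_iterator(first), std::make_move_iterator(next));
        first = next;
    }
    return result;
}
```

With the struct above, a call might look like `PartitionBy(myclass, [](thing const& t) { return t.label; })`. The same function would also work for the question's int labels.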

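Of course, we can do better by using an unordered map, if:

the label is hashable and equality-comparable
we don't need the output to be ordered (if we did, we would use a multimap instead)

The complexity of this method is linear time on average, O(N)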

#include <vector>
#include <algorithm>
#include <string>
#include <iterator>
#include <unordered_map>
#include <utility>   // std::declval, std::move

struct thing
{
    std::string label;
    std::string value;
};

using MyClass = std::vector<thing>;
using Partitions = std::vector<MyClass>;

// pass by const reference - the items are copied into the intermediate map, so no extra copy is needed
Partitions PartitionByLabels(MyClass const& myclass){

    using object_type = MyClass::value_type;
    using label_type = decltype(std::declval<object_type>().label);
    using value_type = decltype(std::declval<object_type>().value);

    std::unordered_multimap<label_type, value_type> inter;
    for(auto&& x : myclass) {
        inter.emplace(x.label, x.value);
    }

    Partitions result;

    auto first = inter.begin();
    auto last = inter.end();

    while (first != last) 
    {
        auto range = inter.equal_range(first->first);
        MyClass tmp;
        tmp.reserve(std::distance(range.first, range.second));
        for (auto i = range.first ; i != range.second ; ++i) {
            tmp.push_back(object_type{i->first, std::move(i->second)});
        }
        result.push_back(std::move(tmp));
        first = range.second;
    }

    return result;
}
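For the case where the output does need to be ordered by label, a possible variation is sketched below. It uses a `std::map` from label to group rather than the multimap mentioned above, which is a substitution made here for brevity; the function name `PartitionByLabelsOrdered` is hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct thing {
    std::string label;
    std::string value;
};

using MyClass = std::vector<thing>;
using Partitions = std::vector<MyClass>;

// Ordered variant (an assumption, not the answer's code): std::map keeps its keys
// sorted, so the groups come out in ascending label order.
// Cost is O(N log K) for N items and K distinct labels.
Partitions PartitionByLabelsOrdered(MyClass const& myclass) {
    std::map<std::string, MyClass> groups;
    for (auto const& x : myclass) {
        groups[x.label].push_back(x);  // copies the item into its label's group
    }

    Partitions result;
    result.reserve(groups.size());
    for (auto& entry : groups) {
        result.push_back(std::move(entry.second));  // steal each group's storage
    }
    return result;
}
```

This trades the average O(N) of the hash-based version for a deterministic group order, which may or may not matter for the original use case.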

An optimized way to partition a class of objects based on an attribute in C++
