Question

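Suppose vec is a sorted vector of movable and copyable objects. What is the most efficient way to remove all elements that match value?

Is this correct, and is it the most efficient way?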

auto lb = std::lower_bound(vec.begin(), vec.end(), value);
vec.erase(lb, std::upper_bound(std::next(lb), vec.end(), value));

What is the complexity? (Taking into account any moves required after the erase.)

Answer 1

A solution that leaves the vector unsorted after the erase.

// get the range in 2*log2(N), N=vec.size()
auto bounds = std::equal_range(vec.begin(), vec.end(), value);

// calculate the position of the first element to be deleted, O(1)
auto last = vec.end() - std::distance(bounds.first, bounds.second);

// swap the 2 ranges, O(equals), equals = std::distance(bounds.first, bounds.second)
// (std::swap_ranges requires that the two ranges do not overlap)
std::swap_ranges(bounds.first, bounds.second, last);

// erase the victims, O(equals)
vec.erase(last, vec.end());

std::remove is O(N); this solution also performs the fewest writes. If equals is close to N, this may not be a good idea :)
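One caveat with the swap-based approach: std::swap_ranges requires non-overlapping ranges, and the matched run can overlap the tail when fewer than equals elements follow it (for example {1, 2, 2, 2, 3} with value 2). The following is a minimal sketch of an overlap-safe variant, not the answer author's code; the name erase_unordered and the variables tail and k are hypothetical. It moves only min(equals, tail) elements from the back into the hole.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper (illustrative name): removes every element equal to
// `value` from a sorted vector, without preserving sorted order.
template <typename T>
void erase_unordered(std::vector<T>& vec, const T& value)
{
    // Locate the run of matching elements, O(log N).
    auto bounds = std::equal_range(vec.begin(), vec.end(), value);
    auto equals = std::distance(bounds.first, bounds.second);
    auto tail = std::distance(bounds.second, vec.end());

    // Only min(equals, tail) elements have to move, so the source
    // [end - k, end) and the destination [first, first + k) never overlap.
    auto k = std::min(equals, tail);
    std::move(vec.end() - k, vec.end(), bounds.first);

    // The last `equals` slots now hold matches or moved-from values.
    vec.erase(vec.end() - equals, vec.end());
}
```

The number of element writes stays at most equals, as in the original idea, and the result is still unsorted in general.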

Answer 2

I ran some brief tests of four different methods for erasing from a sorted container.

#include <algorithm>
#include <iterator>
#include <vector>

void erase_v1(std::vector<int> &vec, int value)
{
    vec.erase(std::remove(std::begin(vec), std::end(vec), value), std::end(vec));
}

void erase_v2(std::vector<int> &vec, int value)
{
    auto lb = std::lower_bound(std::begin(vec), std::end(vec), value);
    if (lb != std::end(vec) && *lb == value) {
        auto ub = std::upper_bound(lb, std::end(vec), value);
        vec.erase(lb, ub);
    }
}

void erase_v3(std::vector<int> &vec, int value)
{
    auto pr = std::equal_range(std::begin(vec), std::end(vec), value);
    vec.erase(pr.first, pr.second);
}

// Surt's code, doesn't preserve sorted order
void erase_v4(std::vector<int> &vec, int value)
{
    // get the range in 2*log2(N), N=vec.size()
    auto bounds = std::equal_range(vec.begin(), vec.end(), value);

    // calculate the index of the first to be deleted O(1)
    auto last = vec.end() - std::distance(bounds.first, bounds.second);

    // swap the 2 ranges O(equals), equals = std::distance(bounds.first, bounds.second)
    std::swap_ranges(bounds.first, bounds.second, last);

    // erase the victims O(equals)
    vec.erase(last, vec.end());
}

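Tested with a std::vector of 10,000,000 elements, filled with random numbers in the range [0..9] and then sorted (MS Visual C++ 2013).

Erasing the value 0 (front of the container), representative times are as follows: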

time=14.3894 size=8999147 // v1, milliseconds and updated container size
time=11.9486 size=8999147 // v2
time=11.5548 size=8999147 // v3
time=1.78913 size=8999147 // v4 (Surt)

Erasing 5 (middle of the container):

time=12.8223 size=9000844
time=4.89388 size=9000844
time=4.87589 size=9000844
time=1.77284 size=9000844

Erasing 9 (end of the container):

time=12.64 size=9000820
time=0.00373372 size=9000820
time=0.00339429 size=9000820
time=1.29899 size=9000820

Erasing 13 (value not in the container):

time=11.8641 size=10000000
time=0.002376 size=10000000
time=0.00203657 size=10000000
time=0.00220628 size=10000000

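The erase/remove approach always iterates over the whole container and is slower; the lower_bound/upper_bound and equal_range approaches are nearly identical across multiple runs. I prefer the equal_range version (v3) because it is correct and the code is simpler, with less typing.

Edit: timed Surt's code, as requested. It is consistently fast, at the cost of not preserving sorted order.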
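The test setup described above could be reproduced roughly as follows. This is a sketch under assumptions, not the harness actually used for the numbers in this answer: the helper names make_sorted_data and time_erase, the seed parameter, and the choice of std::chrono::steady_clock are all illustrative.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: n random ints in [0, 9], then sorted.
std::vector<int> make_sorted_data(std::size_t n, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 9);
    std::vector<int> vec(n);
    std::generate(vec.begin(), vec.end(), [&] { return dist(gen); });
    std::sort(vec.begin(), vec.end());
    return vec;
}

// Hypothetical helper: runs erase_fn once on a private copy of `data`,
// returns the elapsed time in milliseconds and reports the new size.
template <typename EraseFn>
double time_erase(std::vector<int> data, int value, EraseFn erase_fn,
                  std::size_t& size_after)
{
    auto start = std::chrono::steady_clock::now();
    erase_fn(data, value);
    auto stop = std::chrono::steady_clock::now();
    size_after = data.size();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as time_erase(data, 5, erase_v3, size) would time one of the functions listed earlier; because data is passed by value, every method starts from the same input.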

Answer 3

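That is not correct if value does not actually occur in vec: lower_bound then returns either vec.end(), for which std::next(lb) is undefined, or an iterator to a larger element, which would be erased by mistake. So at the very least you would have to do this: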

auto lb = std::lower_bound(vec.begin(), vec.end(), value);
if (lb != vec.end() && *lb == value) {
    vec.erase(lb, std::upper_bound(std::next(lb), vec.end(), value));
}

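As for whether it is the most efficient: in the general case, knowing nothing about what is in vec, I believe yes. The complexity is still O(N), because erase() is O(N); there is no way to erase in less than linear time if you are erasing, say, the second element. But in terms of finding the bounds of the erase, O(log N) is as good as it gets, and you have that.

Whether upper_bound() or find_if() is better for the second part depends entirely on how many copies of value you are likely to have. If there are likely to be many, use upper_bound(); if value is likely to be unique, use find_if().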
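A find_if()-based version might look like the sketch below. The function name erase_linear_end is hypothetical, and the code assumes that == on the element type agrees with the ordering used to sort the vector.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical variant: binary search for the start of the run,
// linear scan for its end (cheap when the run is short).
void erase_linear_end(std::vector<int>& vec, int value)
{
    auto lb = std::lower_bound(vec.begin(), vec.end(), value);

    // In a sorted vector every element from lb onward is >= value, so the
    // run of matches ends at the first element that differs from value.
    auto ub = std::find_if(lb, vec.end(),
                           [&](int x) { return x != value; });

    // If value is absent, ub == lb and this erases an empty range.
    vec.erase(lb, ub);
}
```

Because the scan starts at lb itself rather than std::next(lb), no separate check for a missing value is needed here.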
