Question

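I'm working on this for a class. The goal of the function is to run a binary search on an array of structs and return the index of the first position where a last name is found (even if the last name occurs several times, only the first position is returned). My code works almost perfectly for what I want, but when I print the index, the output is off by one. For example, if I call my function with the string "Zulauf" as the last name: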

cout << binaryFindFirstByLastName("Zulauf", person, total) << endl;

I get 99811 instead of its actual position, 99812 (the data is obviously read from a large file). Any help or general advice is much appreciated, thanks!

int binaryFindFirstByLastName(const std::string& value, const Person* array, int size) {
    int low = 0;
    int high = size-1;
    int mid = (low + high) / 2;
    while (low + 1 != high) {
        mid = (low + high) / 2;
        if (array[mid].last < value) {
            low = mid;
        }
        else {
            high = mid;
        }
        mid = (low + high) / 2;
    }
    if (high > size || array[high].last != value) {
        return -1;
    }
    else return high;
}

Answer 1

For completeness: in the real world, we would use the ready-made library function template std::lower_bound:

C++11 version:

#include <algorithm>  // std::lower_bound
#include <iterator>   // std::distance
#include <string>     // std::string

struct Person
{
    std::string last;
};

struct last_is_less
{
    bool operator()(std::string const& l, Person const& r) const
    {
        return l < r.last;
    }

    bool operator()(Person const& l, std::string const& r) const
    {
        return l.last < r;
    }
};

int binaryFindFirstByLastName(const std::string& value, const Person* array, int size) {
    auto first = array;
    auto last = array + size;
    auto i = std::lower_bound(first, last, value, last_is_less());
    if (i == last || i->last != value)
        return -1;
    return int(std::distance(first, i));
}

C++14 version, using free functions:

bool last_name_is_less(std::string const& l, Person const& r)
{
    return l < r.last;
}

bool last_name_is_less(Person const& l, std::string const& r)
{
    return l.last < r;
}

// using lambda to aid in expressing semantic intent
//
int binaryFindFirstByLastName2(const std::string& value, const Person* array, int size) {

    auto first = array;
    auto last = array + size;

    auto to_index = [&](auto iter) 
    {
        if (iter == last || iter->last != value)
            return -1;
        return int(std::distance(first, iter));
    };

    return to_index(std::lower_bound(first, last, 
                                     value, 
                                     [](auto&& l, auto&& r) 
                                     { 
                                         return last_name_is_less(l, r); 
                                     }));
}
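
As a side note, std::lower_bound only ever invokes the comparator as comp(element, value), so a single lambda taking (Person, string) is enough when lower_bound is the only algorithm involved. Here is a minimal sketch of that, assuming the records sit in a std::vector sorted by last name. The name firstIndexByLastName and the extra `first` field are made up for illustration and are not part of the question's code.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's Person; only `last` matters here.
struct Person
{
    std::string first;
    std::string last;
};

// Returns the zero-based index of the first Person whose last name equals
// `value`, or -1 if there is none. Assumes `people` is sorted by last name.
int firstIndexByLastName(const std::vector<Person>& people, const std::string& value)
{
    // Binary search is only meaningful on sorted input; verify in debug builds.
    assert(std::is_sorted(people.begin(), people.end(),
                          [](const Person& a, const Person& b) { return a.last < b.last; }));

    // lower_bound finds the first element that is NOT less than `value`.
    auto it = std::lower_bound(people.begin(), people.end(), value,
                               [](const Person& p, const std::string& v) { return p.last < v; });

    // That element may still be greater than `value`, so check for equality.
    if (it == people.end() || it->last != value)
        return -1;
    return static_cast<int>(std::distance(people.begin(), it));
}
```

With several people sharing a last name, the call returns the index of the first of them. The two-overload functor shown earlier is still what you would want if the same comparator also had to serve std::upper_bound or std::equal_range, which call it with the arguments in the other order.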

Answer 2

Let's go through the algorithm step by step. For simplicity, I'll use int here.

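First, the loop condition. We should use low < high, because we want to narrow the search range down to a single element, which we can then check against the target outside the loop. The other possible outcome is that low ends up at high + 1, which means the search missed.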

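Now look at the three cases that can occur in the loop body. If the target value is greater than the current element, we need low = mid + 1, because the leftmost position the target could occupy is the next element to the right. If the target value is less than the current element, the same reasoning applies in the other direction, giving high = mid - 1. If the target value equals the current element, we need high = mid, because this element may be the one we are looking for, or the first occurrence may lie further to the left.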

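Once the loop exits, there are two cases to handle. If low > high, the search has clearly missed. Otherwise, we need to check the one remaining element to see whether it equals the target.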

Putting it all together:

int binarySearch(const int &value, const int *a, const int &size)
{
    int low = 0, high = size - 1;
    //Note that these do have to be signed types, since high could be -1
    while(low < high)
    {
        int mid = low + (high - low) / 2;
        //a way to find the average without any chance of overflow
        if(value == a[mid])
        {
            high = mid;
        }
        else if(value < a[mid])
        {
            high = mid - 1;
        }
        else
        {
            low = mid + 1;
        }
    }
    return (low > high || a[low] != value) ? -1 : low;
}

There are other ways to implement this, but I find this one the simplest.
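
To tie this back to the question, here is a sketch of the same loop applied to the Person array, keeping the question's function signature. The Person struct below is an assumed minimal version with only a `last` field; the real one presumably has more members. Only operator< and operator!= on std::string are used, so the three cases map one-to-one onto the int version.

```cpp
#include <string>

// Assumed minimal version of the question's struct.
struct Person
{
    std::string last;
};

// Returns the zero-based index of the first entry whose last name equals
// `value`, or -1 if there is none. `array` must be sorted by last name.
int binaryFindFirstByLastName(const std::string& value, const Person* array, int size)
{
    int low = 0, high = size - 1;   // signed on purpose: high is -1 when size is 0
    while (low < high)
    {
        int mid = low + (high - low) / 2;   // rounds down, so mid < high
        if (array[mid].last < value)
            low = mid + 1;    // target, if present, lies strictly right of mid
        else if (value < array[mid].last)
            high = mid - 1;   // target, if present, lies strictly left of mid
        else
            high = mid;       // match: keep mid and keep looking to the left
    }
    // Either the range is empty (miss) or one candidate is left to verify.
    return (low > high || array[low].last != value) ? -1 : low;
}
```

Because mid is rounded down, it is always strictly less than high inside the loop, so the high = mid branch still shrinks the range and the loop terminates. Note that the returned index is zero-based.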

How do I use binary search to return the first index/occurrence of a last name (a given string)?
