Binary search with an unknown number of items

Time: 2011-04-08 05:44:35

Tags: arrays algorithm binary-search

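Suppose you don't know the number of elements you are searching, and you are given an API that accepts an index and returns null if the index is out of bounds (implemented here by the getWordFromDictionary method). How would you perform a binary search and implement the isWordInDictionary() method for a client program?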

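This solution works, but I ended up doing a sequential search above the level where I found the initial high index value. The search through the lower range of values was inspired by this answer. I also peeked at BinarySearch in Reflector (a C# decompiler), but it works with a known list length, so I am still hoping to fill in the gaps.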

private static string[] dictionary;

static void Main(string[] args)
{
    dictionary = System.IO.File.ReadAllLines(@"C:\tmp\dictionary.txt");

    Console.WriteLine(isWordInDictionary("aardvark", 0));
    Console.WriteLine(isWordInDictionary("bee", 0));
    Console.WriteLine(isWordInDictionary("zebra", 0));
    Console.WriteLine(isWordInDictionaryBinary("aardvark"));
    Console.WriteLine(isWordInDictionaryBinary("bee"));
    Console.WriteLine(isWordInDictionaryBinary("zebra"));
    Console.ReadLine();
}

static bool isWordInDictionaryBinary(string word)
{
    // assume the size of the dictionary is unknown

    // quick check for empty dictionary
    string w = getWordFromDictionary(0);
    if (w == null)
        return false;

    // assume that the length is very big.
    int low = 0;
    int hi = int.MaxValue;

    while (low <= hi)
    {
        int mid = (low + ((hi - low) >> 1));
        w = getWordFromDictionary(mid);

        // If the middle element m you select at each step is outside 
        // the array bounds (you need a way to tell this), then limit
        // the search to those elements with indexes smaller than m.
        if (w == null)
        {
            hi = mid;
            continue;
        }

        int compare = String.Compare(w, word);
        if (compare == 0)
            return true;

        if (compare < 0)
            low = mid + 1;
        else
            hi = mid - 1;
    }

    // punting on the search above the current value of hi 
    // to the (still unknown) upper limit
    return isWordInDictionary(word, hi);
}


// serial search, works good for small number of items
static bool isWordInDictionary(string word, int startIndex) 
{
    // assume the size of the dictionary is unknown
    int i = startIndex;
    while (getWordFromDictionary(i) != null)
    {
        if (getWordFromDictionary(i).Equals(word, StringComparison.OrdinalIgnoreCase))
            return true;
        i++;
    }

    return false;
}

private static string getWordFromDictionary(int index)
{
    try
    {
        return dictionary[index];
    }
    catch (IndexOutOfRangeException)
    {
        return null;
    }
}

Final code after the answers

static bool isWordInDictionaryBinary(string word)
{
    // assume the size of the dictionary is unknown

    // quick check for empty dictionary
    string w = getWordFromDictionary(0);
    if (w == null)
        return false;

    // assume that the number of elements is very big
    int low = 0;
    int hi = int.MaxValue;

    while (low <= hi)
    {
        int mid = (low + ((hi - low) >> 1));
        w = getWordFromDictionary(mid);

        // treat null the same as finding a string that comes 
        // after the string you are looking for
        if (w == null)
        {
            hi = mid - 1;
            continue;
        }

        int compare = String.Compare(w, word);
        if (compare == 0)
            return true;

        if (compare < 0)
            low = mid + 1;
        else
            hi = mid - 1;
    }

    return false;
}

3 answers:

Answer 0: (score: 4)

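You can implement the binary search in two phases. In the first phase, you grow the size of the interval you are searching. Once you detect that you have gone out of bounds, you can do a normal binary search in the most recent interval you found. Something like this: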

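bool isPresentPhase1(string word)
{
  int l = 0, d = 1;
  while( true ) // you should eventually reach an index out of bounds
  {
    string w = getWord(l + d);
    if( w == null )
      return isPresentPhase2(word, l, l + d - 1);
    int c = String.Compare(w, word);
    if( c == 0 )
      return true;
    else if( c > 0 ) // w sorts after word, so word can only be in [l, l + d - 1]
      return isPresentPhase2(word, l, l + d - 1);
    else
    {
      // w sorts before word: skip past it and double the interval size
      l = l + d + 1;
      d *= 2;
    } 
  }
}

bool isPresentPhase2(string word, int lo, int hi)
{
    // normal binary search in the interval [lo, hi];
    // hi may still be out of bounds, so null counts as "after word"
    while( lo <= hi )
    {
        int mid = lo + (hi - lo) / 2;
        string w = getWord(mid);
        int c = (w == null) ? 1 : String.Compare(w, word);
        if( c == 0 )
            return true;
        if( c < 0 )
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}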

Answer 1: (score: 2)

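Sure you can. Start at index 1 and keep doubling the query index until you hit a word that is lexicographically greater than the query word (Edit: or null). Then you can narrow the search space down again until you find the index, or return false.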

Edit: note that this does not add to the asymptotic running time; it is still O(log N), where N is the number of items in the series.
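The doubling approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name UnboundedSearch, the GetWord stand-in for the question's API, the sample word list, and the Probes counter are all hypothetical, and the sketch uses ordinal comparison, so the underlying list is assumed to be sorted ordinally.

```csharp
using System;

static class UnboundedSearch
{
    // Hypothetical stand-in for the question's API: a sorted list whose
    // length the search code never reads directly.
    static readonly string[] words = { "aardvark", "bee", "cat", "dog", "emu", "zebra" };

    public static int Probes; // counts API calls, for illustration only

    static string GetWord(int index)
    {
        Probes++;
        return index >= 0 && index < words.Length ? words[index] : null;
    }

    public static bool Contains(string word)
    {
        // Phase 1: double hi until it is out of bounds (null) or lands on
        // a word that is equal to or sorts after the query word.
        int lo = 0, hi = 1;
        while (true)
        {
            string w = GetWord(hi);
            if (w == null || String.CompareOrdinal(w, word) >= 0)
                break;
            lo = hi + 1; // everything up to hi sorts before the query word
            if (hi > int.MaxValue / 2) { hi = int.MaxValue; break; } // avoid overflow
            hi *= 2;
        }

        // Phase 2: ordinary binary search in [lo, hi], treating null
        // (out of bounds) as sorting after every real word.
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            string w = GetWord(mid);
            int c = w == null ? 1 : String.CompareOrdinal(w, word);
            if (c == 0) return true;
            if (c < 0) lo = mid + 1; else hi = mid - 1;
        }
        return false;
    }

    static void Main()
    {
        foreach (string s in new[] { "aardvark", "zebra", "cow" })
        {
            Probes = 0;
            bool found = Contains(s);
            Console.WriteLine("{0}: {1} ({2} probes)", s, found, Probes);
        }
    }
}
```

Because the doubling phase stops after about log2(N) probes and the binary search phase covers an interval no larger than about N, the probe count grows with the size of the dictionary rather than with the size of the int range.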

Answer 2: (score: 0)

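So, I'm not sure I fully understand the problem from your description, but I'm assuming you are trying to search a sorted array of unknown length for a particular string. I'm also assuming there are no nulls in the actual array; the array only returns null when asked for an index that is out of bounds.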

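If those things are true, the solution should just be a standard binary search, albeit one that searches over the entire integer space, where you simply treat null as a string that comes after the string you are looking for. Essentially, just imagine that your sorted array of N strings is really a sorted array of INT_MAX strings, with nulls sorted at the end.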
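One way to picture the "nulls sorted at the end" idea is to fold the null check into the comparison itself, so the loop becomes a textbook binary search over [0, int.MaxValue]. The sketch below is an assumption-laden illustration: SentinelSearch, GetWord, CompareToTarget, the sample words, and the Probes counter are hypothetical names, and ordinal comparison is assumed to match the sort order of the list.

```csharp
using System;

static class SentinelSearch
{
    // Hypothetical stand-in for the question's dictionary API.
    static readonly string[] words = { "aardvark", "bee", "cat", "dog", "emu", "zebra" };

    public static int Probes; // counts API calls, for illustration only

    static string GetWord(int index)
    {
        Probes++;
        return index >= 0 && index < words.Length ? words[index] : null;
    }

    // null means "out of bounds" and compares as greater than any real
    // word, as if the array were padded with sentinels up to int.MaxValue.
    static int CompareToTarget(string w, string target)
    {
        return w == null ? 1 : String.CompareOrdinal(w, target);
    }

    public static bool Contains(string target)
    {
        int lo = 0, hi = int.MaxValue;
        while (lo <= hi)
        {
            int mid = lo + ((hi - lo) >> 1); // overflow-safe midpoint
            int c = CompareToTarget(GetWord(mid), target);
            if (c == 0) return true;
            if (c < 0) lo = mid + 1; else hi = mid - 1;
        }
        return false;
    }

    static void Main()
    {
        foreach (string s in new[] { "aardvark", "zebra", "cow" })
        {
            Probes = 0;
            bool found = Contains(s);
            Console.WriteLine("{0}: {1} ({2} probes)", s, found, Probes);
        }
    }
}
```

The trade-off of this design is that every lookup costs on the order of 31 or 32 probes, since the search always starts from the full int range, whereas a doubling phase keeps the probe count proportional to log N for small dictionaries. Both remain logarithmic.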

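What I don't quite understand is that you seem to have basically already done this (at least from a cursory look at the code), so I think I may not be understanding your problem at all.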