Binary search in a list with two flavors

Posted: 2017-09-27 07:12:38

Tags: javascript string data-structures binary-search

This happens to be in JavaScript, but the question applies to other languages too.

I have a long list of words, sorted alphabetically, for example:

var myList= [
    {word:"abstract", flavor:"old", extraData:...},
    {word:"aircraft", flavor:"old", extraData:...},
    {word:"airplane", flavor:"new", extraData:...},
    {word:"banana", flavor:"old", extraData:...},
    {word:"calories", flavor:"new", extraData:...},
    ...
];

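My goal is to use some search method (probably binary search) to find the number of words that start with a given substring. In the example above, given the substring "air", the result should be 2.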

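However, sometimes I need to search the whole list, and sometimes I only need to search the "old" items (following the example above, that should yield 1).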

An obvious solution is to duplicate the list, for example:

var wholeList= [
    {word:"abstract", flavor:"old", extraData:...},
    {word:"aircraft", flavor:"old", extraData:...},
    {word:"airplane", flavor:"new", extraData:...},
    {word:"banana", flavor:"old", extraData:...},
    {word:"calories", flavor:"new", extraData:...},
    ...
];

var oldList= [
    {word:"abstract", flavor:"old", extraData:...},
    {word:"aircraft", flavor:"old", extraData:...},
    {word:"banana", flavor:"old", extraData:...},
    ...
];

This is of course quite wasteful in terms of memory. Are there any other (or well-known) solutions to this kind of problem?
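How wasteful the second list is depends on how it is built. In JavaScript an array of objects holds references, so a list derived with `filter` costs roughly one reference per "old" entry rather than a second copy of every object. A minimal illustration, using made-up sample data without `extraData`:

```javascript
// Hypothetical sample data standing in for the question's wholeList.
const wholeList = [
  { word: "abstract", flavor: "old" },
  { word: "aircraft", flavor: "old" },
  { word: "airplane", flavor: "new" },
  { word: "banana", flavor: "old" },
  { word: "calories", flavor: "new" },
];

// filter() allocates a new array, but it stores references to the same
// objects; the entries themselves are not copied. Order is preserved,
// so oldList is still sorted by word and can be binary-searched too.
const oldList = wholeList.filter(entry => entry.flavor === "old");

console.log(oldList.length);              // 3
console.log(oldList[0] === wholeList[0]); // true: same object, not a copy
```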

5 answers:

Answer 0 (score: 0)

To filter by word:

const search = "air";
const result = myList.filter(entry => entry.word.startsWith(search));

To get only the old ones:

const result = myList.filter( word => word.flavor === "old");

Both at once:

const search = "air", flavor = "old";
const result = myList.filter(entry =>
   entry.flavor === flavor &&
   entry.word.startsWith(search)
);

To improve on this, you could use nested maps as a lookup tree, or group the entries in advance. However, that only pays off if you search more than once.
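A minimal sketch of the "group in advance" idea, combined with the binary search the question asks about: entries are grouped once by flavor into a `Map` (the groups hold references, not copies, and keep the sorted order), and a prefix is then counted with two binary searches. `buildFlavorIndex`, `prefixRange` and `countWords` are hypothetical names, and the sketch assumes the list is sorted by `word` in JavaScript's default string order (UTF-16 code units).

```javascript
// Hypothetical sample data (the question's list, without extraData).
const myList = [
  { word: "abstract", flavor: "old" },
  { word: "aircraft", flavor: "old" },
  { word: "airplane", flavor: "new" },
  { word: "banana", flavor: "old" },
  { word: "calories", flavor: "new" },
];

// Group once by flavor. Each group holds references to the original
// objects and keeps the sorted order of the input.
function buildFlavorIndex(list) {
  const byFlavor = new Map();
  for (const entry of list) {
    if (!byFlavor.has(entry.flavor)) byFlavor.set(entry.flavor, []);
    byFlavor.get(entry.flavor).push(entry);
  }
  return byFlavor;
}

// Half-open range [start, end) of entries whose word starts with prefix,
// found with two binary searches.
function prefixRange(list, prefix) {
  let lo = 0;
  let hi = list.length;
  // First index whose word is >= prefix.
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (list[mid].word < prefix) lo = mid + 1;
    else hi = mid;
  }
  const start = lo;
  // First index (at or after start) whose first prefix.length
  // characters sort after prefix.
  hi = list.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (list[mid].word.slice(0, prefix.length) <= prefix) lo = mid + 1;
    else hi = mid;
  }
  return [start, lo];
}

// Count matches in the whole list, or only within one flavor's group.
function countWords(list, byFlavor, prefix, flavor) {
  const target = flavor === undefined ? list : (byFlavor.get(flavor) || []);
  const [start, end] = prefixRange(target, prefix);
  return end - start;
}

const byFlavor = buildFlavorIndex(myList);
console.log(countWords(myList, byFlavor, "air"));        // 2
console.log(countWords(myList, byFlavor, "air", "old")); // 1
```

The grouping pass is linear, but it is done once; each query afterwards is logarithmic in the size of the searched group.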

Answer 1 (score: 0)

To find the number of words in the whole list that start with the given substring:

myList.filter(data => data.word.startsWith('air')).length

To count only the entries whose flavor is old:

myList.filter(data => data.word.startsWith('air') && data.flavor === "old").length

If you need more constraints on the search, just add more && conditions to the filter callback.
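Instead of hard-coding each extra `&&`, the constraints can also be passed in as data. A minimal sketch, assuming every constraint is a plain field/value equality check; `countMatching` is a hypothetical name, and this is still a linear scan like the `filter` calls above, just without building an intermediate array:

```javascript
// Hypothetical helper: count entries whose word starts with `prefix` and
// whose fields equal every value in `constraints` (e.g. { flavor: "old" }).
// reduce() counts in a single pass without building a filtered array.
function countMatching(list, prefix, constraints = {}) {
  const pairs = Object.entries(constraints);
  return list.reduce(
    (count, entry) =>
      entry.word.startsWith(prefix) &&
      pairs.every(([key, value]) => entry[key] === value)
        ? count + 1
        : count,
    0
  );
}

// Hypothetical sample data (the question's list, without extraData).
const myList = [
  { word: "abstract", flavor: "old" },
  { word: "aircraft", flavor: "old" },
  { word: "airplane", flavor: "new" },
  { word: "banana", flavor: "old" },
  { word: "calories", flavor: "new" },
];

console.log(countMatching(myList, "air"));                    // 2
console.log(countMatching(myList, "air", { flavor: "old" })); // 1
```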

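Answer 2 (score: 0)

I'd say avoid any algorithm that has to traverse the list twice. That said, whenever large lists are involved, I tend to drop any kind of abstraction and use a good old-fashioned loop. Just iterate over your list and count the matching words, for example: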

let count = 0;
const testValue = 'air';
const testFlavor = 'old';


for(var i = 0, len = wholeList.length; i < len; i += 1) {
  const current = wholeList[i];

  if (current.word.startsWith(testValue) && current.flavor === testFlavor) {
    count += 1;
  }
}

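Of course, you could formulate the test condition differently if that turns out to be faster; it depends on what you are trying to do. You can optimize this further by indexing the list alphabetically in advance. Say you do something like: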

const indices = {
  a: [0, 3], // half-open range [start, end) into wholeList
  b: [3, 4]
  // ...
}

Then you only need to iterate over the relevant segment instead of the whole list:

const index = indices[testValue[0]];
for(var i = index[0], len = index[1]; i < len; i += 1) {
  // ...
}
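The `indices` object does not have to be written by hand; it can be built in one pass over the sorted list. A minimal sketch, assuming half-open `[start, end)` ranges (matching the `i < len` loop above), non-empty words and a non-empty search string; `buildFirstLetterIndex` and `countInSegment` are hypothetical names. For the question's five-entry list it yields `a: [0, 3]`, since "abstract", "aircraft" and "airplane" occupy positions 0 to 2.

```javascript
// Hypothetical sample data, sorted by word (the question's list, without extraData).
const wholeList = [
  { word: "abstract", flavor: "old" },
  { word: "aircraft", flavor: "old" },
  { word: "airplane", flavor: "new" },
  { word: "banana", flavor: "old" },
  { word: "calories", flavor: "new" },
];

// Map each first character to a half-open range [start, end) of positions.
// Because the list is sorted, words sharing a first character are adjacent.
function buildFirstLetterIndex(list) {
  const indices = {};
  for (let i = 0; i < list.length; i += 1) {
    const first = list[i].word[0];
    if (indices[first] === undefined) indices[first] = [i, i + 1];
    else indices[first][1] = i + 1;
  }
  return indices;
}

// The answer's counting loop, restricted to one segment. Returns 0 when
// no word starts with the first character of testValue.
function countInSegment(list, indices, testValue, testFlavor) {
  const range = indices[testValue[0]];
  if (range === undefined) return 0;
  let count = 0;
  for (let i = range[0]; i < range[1]; i += 1) {
    const current = list[i];
    if (current.word.startsWith(testValue) && current.flavor === testFlavor) {
      count += 1;
    }
  }
  return count;
}

const indices = buildFirstLetterIndex(wholeList);
console.log(JSON.stringify(indices)); // {"a":[0,3],"b":[3,4],"c":[4,5]}
console.log(countInSegment(wholeList, indices, "air", "old")); // 1
```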

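Answer 3 (score: 0)

Here is an approach that uses binary search as the underlying algorithm to count the entries whose key starts with a given substring (the array must already be sorted by that key):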

function countEntries (array, key, prefix) {
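  var l = prefix.length
  // Start one step outside each end of the array so that the first and
  // last elements can also be part of the result (no sentinels needed).
  var i = -1
  var j = array.length
  var lower, upper, k

  // lower = first index whose key is >= prefix
  while (j - i > 1) {
    k = (i + j) >> 1

    if (prefix > array[k][key]) {
      i = k
    } else {
      j = k
    }
  }

  lower = j
  i = -1
  j = array.length

  // upper = first index whose first l characters sort after prefix
  while (j - i > 1) {
    k = (i + j) >> 1

    if (prefix < array[k][key].slice(0, l)) {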
      j = k
    } else {
      i = k
    }
  }
  
  upper = j
  
  return upper - lower // array.slice(lower, upper) to confirm
}

// usage

var myList= [
  {word:"aardvark", flavor:"old"},
  {word:"abstract", flavor:"old"},
  {word:"air", flavor:"old"},
  {word:"aircraft", flavor:"old"},
  {word:"airplane", flavor:"new"},
  {word:"banana", flavor:"old"},
  {word:"calories", flavor:"new"},
  {word:"danger", flavor:"old"}
];

console.log(countEntries(myList, 'word', 'air'))
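Both versions of `countEntries` compare strings with `<` and `>`, which order by UTF-16 code units (so "Airbus" sorts before "air"). The binary search is only valid if the array was sorted with that same ordering; a locale-aware sort, for example one based on `localeCompare`, can order mixed-case words differently, and the search could then miss matches. A minimal sketch of sorting with a matching comparator; `byWord` and the sample data are hypothetical:

```javascript
// Comparator using the same code-unit ordering as the < and > operators
// that countEntries relies on.
function byWord(a, b) {
  return a.word < b.word ? -1 : a.word > b.word ? 1 : 0;
}

// Hypothetical mixed-case sample data.
const entries = [
  { word: "banana", flavor: "old" },
  { word: "air", flavor: "old" },
  { word: "Airbus", flavor: "new" },
];

entries.sort(byWord);
console.log(entries.map(e => e.word).join(", ")); // Airbus, air, banana
```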

If we extend it with an optional filter, we can do a linear scan over the range that matches prefix and check each element:

function countEntries (array, key, prefix, filter) {
  filter = Array.isArray(filter) && filter || []

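  var l = prefix.length
  // Start one step outside each end of the array so that the first and
  // last elements can also be part of the result (no sentinels needed).
  var i = -1
  var j = array.length
  var lower, upper, k

  // lower = first index whose key is >= prefix
  while (j - i > 1) {
    k = (i + j) >> 1

    if (prefix > array[k][key]) {
      i = k
    } else {
      j = k
    }
  }

  lower = j
  i = -1
  j = array.length

  // upper = first index whose first l characters sort after prefix
  while (j - i > 1) {
    k = (i + j) >> 1

    if (prefix < array[k][key].slice(0, l)) {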
      j = k
    } else {
      i = k
    }
  }
  
  upper = j
  
  if (filter.length === 0) {
    return upper - lower
  }

  k = 0
  
  outer: for (i = lower; i < upper; i++) {
    for (j = 0; j < filter.length; j++) {
      if (array[i][filter[j][0]] !== filter[j][1]) {
        continue outer
      }
    }
    
    k++
  }
  
  return k
}

// usage

var myList= [
  {word:"aardvark", flavor:"old"},
  {word:"abstract", flavor:"old"},
  {word:"air", flavor:"old"},
  {word:"aircraft", flavor:"old", other:"test"},
  {word:"airflow", flavor:"old", other:"test"},
  {word:"airplane", flavor:"new"},
  {word:"banana", flavor:"old"},
  {word:"calories", flavor:"new"},
  {word:"danger", flavor:"old"}
];

// basic usage still works
console.log(countEntries(myList, 'word', 'air'))

// filters accept multiple key/value pairs
console.log(countEntries(myList, 'word', 'air', [['flavor','old']]))
console.log(countEntries(myList, 'word', 'air', [['flavor','old'],['other','test']]))

Answer 4 (score: -1)

Here is the code in C#; porting it to another language shouldn't be a big problem:

    using System;

    public class Item {
        public string Word { get; set; }
        public string Flavour { get; set; }
    }

    public class WordSearch
    {
        // Counts items whose Word starts with `start` (and whose Flavour equals
        // `flavor`, when a flavor is provided). `ary` must be sorted by Word
        // in ordinal (char-code) order.
        public int BinarySearch(Item[] ary, string start, string flavor)
        {
            int lowerBound = 0, upperBound = ary.Length, count = 0;

            // Binary search for the first item whose Word is >= start.
            while (lowerBound < upperBound)
            {
                int mid = lowerBound + (upperBound - lowerBound) / 2;
                if (String.CompareOrdinal(ary[mid].Word, start) < 0)
                {
                    lowerBound = mid + 1;
                }
                else
                {
                    upperBound = mid;
                }
            }

            // Matching words are contiguous from there: scan while the prefix matches.
            for (int i = lowerBound; i < ary.Length && ary[i].Word.StartsWith(start, StringComparison.Ordinal); i++)
            {
                // if a flavor is provided, count only items that also have that flavor
                if (String.IsNullOrEmpty(flavor) || ary[i].Flavour == flavor)
                {
                    count += 1;
                }
            }

            // a return value of 0 means no item starts with the specified value
            return count;
        }
    }