This happens to be in JavaScript, but the question applies to other languages as well.
I have a long list of words, sorted alphabetically, for example:
var myList = [
  {word:"abstract", flavor:"old", extraData:...},
  {word:"aircraft", flavor:"old", extraData:...},
  {word:"airplane", flavor:"new", extraData:...},
  {word:"banana", flavor:"old", extraData:...},
  {word:"calories", flavor:"new", extraData:...},
  ...
];
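My goal is to use some search method (probably binary search) to find the number of words that start with a given substring. In the example above, given the substring "air", the result should be 2.
However, sometimes I need to search the whole list, and sometimes only the "old" items (in the example above, that should yield 1).
One obvious solution is to duplicate the list, for example: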
var wholeList = [
  {word:"abstract", flavor:"old", extraData:...},
  {word:"aircraft", flavor:"old", extraData:...},
  {word:"airplane", flavor:"new", extraData:...},
  {word:"banana", flavor:"old", extraData:...},
  {word:"calories", flavor:"new", extraData:...},
  ...
];
var oldList = [
  {word:"abstract", flavor:"old", extraData:...},
  {word:"aircraft", flavor:"old", extraData:...},
  {word:"banana", flavor:"old", extraData:...},
  ...
];
This is of course very wasteful in terms of memory. Are there any other or well-known solutions to this kind of problem?
Answer 0: (score: 0)
To filter by word prefix:
const search ="air";
const result = myList.filter(word => word.word.substr(0,search.length) === search);
To get only the old ones:
const result = myList.filter( word => word.flavor === "old");
To do both:
const search ="air", flavor = "old";
const result = myList.filter(word =>
  word.flavor === flavor &&
  word.word.substr(0, search.length) === search
);
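To improve on this, you could use nested Maps as a lookup tree, or group the items in advance. However, that is only worth it if you search more than once.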
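The pre-grouping idea could be sketched roughly as follows. This is only an illustration: `groupByFlavor` and `countPrefix` are hypothetical helper names, and the sample data leaves out `extraData`. Each bucket holds references to the same objects as the original list, so the item data itself is not copied, and each bucket keeps the original sort order.

```javascript
// Sample data mirroring the question (extraData omitted).
const myList = [
  { word: "abstract", flavor: "old" },
  { word: "aircraft", flavor: "old" },
  { word: "airplane", flavor: "new" },
  { word: "banana", flavor: "old" },
  { word: "calories", flavor: "new" },
];

// Hypothetical helper: bucket the items by flavor.
// Buckets store references to the existing objects, not copies.
function groupByFlavor(list) {
  const groups = new Map();
  for (const item of list) {
    if (!groups.has(item.flavor)) groups.set(item.flavor, []);
    groups.get(item.flavor).push(item);
  }
  return groups;
}

// Hypothetical helper: count the items whose word starts with `prefix`.
const countPrefix = (list, prefix) =>
  list.filter(item => item.word.startsWith(prefix)).length;

const byFlavor = groupByFlavor(myList); // built once, reused for every search

console.log(countPrefix(myList, "air"));              // 2
console.log(countPrefix(byFlavor.get("old"), "air")); // 1
```

The grouping costs one pass over the list up front, which is why it pays off only when several searches follow.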
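Answer 1: (score: 0)
To find the number of words in the whole list that start with a given substring (note that startsWith is needed here; includes would also match "air" in the middle of a word):
myList.filter(data => data.word.startsWith('air')).length
To count only the entries whose flavor is old:
myList.filter(data => data.word.startsWith('air') && data.flavor === "old").length
If you need to add more constraints to the search, just chain more && conditions into the filter callback.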
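If the set of constraints varies from call to call, one option is to pass them in as data rather than hard-coding each condition. The following is a minimal sketch under that assumption; `countMatches` and its `constraints` parameter are hypothetical names, not part of the answer above.

```javascript
// Hypothetical helper: `constraints` is a plain object of property/value
// pairs that must all match, for example { flavor: "old" }.
function countMatches(list, prefix, constraints = {}) {
  const entries = Object.entries(constraints);
  return list.filter(item =>
    item.word.startsWith(prefix) &&
    entries.every(([key, value]) => item[key] === value)
  ).length;
}

// Sample data mirroring the question (extraData omitted).
const sampleList = [
  { word: "abstract", flavor: "old" },
  { word: "aircraft", flavor: "old" },
  { word: "airplane", flavor: "new" },
  { word: "banana", flavor: "old" },
  { word: "calories", flavor: "new" },
];

console.log(countMatches(sampleList, "air"));                    // 2
console.log(countMatches(sampleList, "air", { flavor: "old" })); // 1
```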
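Answer 2: (score: 0)
I would say avoid any algorithm that needs to walk the list twice. That said, whenever large lists are involved, I tend to drop any kind of abstraction and use a good old-fashioned loop. Just iterate over your list and count the matching words, for example: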
let count = 0;
const testValue = 'air';
const testFlavor = 'old';
for (var i = 0, len = wholeList.length; i < len; i += 1) {
  const current = wholeList[i];
  if (current.word.startsWith(testValue) && current.flavor === testFlavor) {
    count += 1;
  }
}
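Of course, you can formulate the test condition differently if that turns out to be faster; it depends on what you are trying to do. You can optimize this further by indexing the list alphabetically in advance. Let's say you build something like this: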
const indices = {
  // each pair is [start, end): the end index is exclusive, matching the loop's i < len
  a: [0, 3],
  b: [3, 4]
  // ...
}
Then you can loop over just the relevant segment instead of the whole list:
const index = indices[testValue[0]];
for (var i = index[0], len = index[1]; i < len; i += 1) {
  // ...
}
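As an illustration of how such an index might be built and used, here is a rough sketch. It assumes the list is sorted by `word`, that no word is empty, and that ranges are half-open (`[start, end)`); `buildLetterIndex` and `countInRange` are hypothetical names.

```javascript
// Hypothetical helper: maps each first letter to the half-open range
// [start, end) of positions it occupies in the sorted list.
function buildLetterIndex(list) {
  const indices = {};
  list.forEach((item, i) => {
    const letter = item.word[0];
    if (!(letter in indices)) indices[letter] = [i, i + 1];
    else indices[letter][1] = i + 1;
  });
  return indices;
}

// Hypothetical helper: counts matches, scanning only the segment for the
// prefix's first letter. `flavor` is optional.
function countInRange(list, indices, prefix, flavor) {
  const range = indices[prefix[0]];
  if (!range) return 0; // no word starts with that letter
  let count = 0;
  for (let i = range[0]; i < range[1]; i += 1) {
    const current = list[i];
    if (current.word.startsWith(prefix) &&
        (flavor === undefined || current.flavor === flavor)) {
      count += 1;
    }
  }
  return count;
}

// Sample data mirroring the question (extraData omitted).
const sampleList = [
  { word: "abstract", flavor: "old" },
  { word: "aircraft", flavor: "old" },
  { word: "airplane", flavor: "new" },
  { word: "banana", flavor: "old" },
  { word: "calories", flavor: "new" },
];

const letterIndex = buildLetterIndex(sampleList);
console.log(letterIndex.a);                                         // [ 0, 3 ]
console.log(countInRange(sampleList, letterIndex, "air"));          // 2
console.log(countInRange(sampleList, letterIndex, "air", "old"));   // 1
```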
Answer 3: (score: 0)
Here is an approach that uses binary search as the base algorithm to count the entries that start with a given substring:
function countEntries (array, key, prefix) {
  var l = prefix.length
  // i and j start just outside the array so the first and last entries can match too
  var i = -1
  var j = array.length
  var lower, upper, k
  // find the first entry that is >= prefix
  while (j - i > 1) {
    k = (i + j) >> 1
    if (prefix > array[k][key]) {
      i = k
    } else {
      j = k
    }
  }
  lower = j
  i = -1
  j = array.length
  // find the first entry whose leading l characters sort after prefix
  while (j - i > 1) {
    k = (i + j) >> 1
    if (prefix < array[k][key].slice(0, l)) {
      j = k
    } else {
      i = k
    }
  }
  upper = j
  return upper - lower // array.slice(lower, upper) returns the matching entries
}
// usage
var myList = [
  {word:"aardvark", flavor:"old"},
  {word:"abstract", flavor:"old"},
  {word:"air", flavor:"old"},
  {word:"aircraft", flavor:"old"},
  {word:"airplane", flavor:"new"},
  {word:"banana", flavor:"old"},
  {word:"calories", flavor:"new"},
  {word:"danger", flavor:"old"}
];
console.log(countEntries(myList, 'word', 'air'))
If we modify it to take an optional filter, we can do a linear scan over the target range for prefix and check each element:
function countEntries (array, key, prefix, filter) {
  filter = Array.isArray(filter) && filter || []
  var l = prefix.length
  // i and j start just outside the array so the first and last entries can match too
  var i = -1
  var j = array.length
  var lower, upper, k
  // find the first entry that is >= prefix
  while (j - i > 1) {
    k = (i + j) >> 1
    if (prefix > array[k][key]) {
      i = k
    } else {
      j = k
    }
  }
  lower = j
  i = -1
  j = array.length
  // find the first entry whose leading l characters sort after prefix
  while (j - i > 1) {
    k = (i + j) >> 1
    if (prefix < array[k][key].slice(0, l)) {
      j = k
    } else {
      i = k
    }
  }
  upper = j
  if (filter.length === 0) {
    return upper - lower
  }
  // scan the matching range and count entries that satisfy every [key, value] pair
  k = 0
  outer: for (i = lower; i < upper; i++) {
    for (j = 0; j < filter.length; j++) {
      if (array[i][filter[j][0]] !== filter[j][1]) {
        continue outer
      }
    }
    k++
  }
  return k
}
// usage
var myList = [
  {word:"aardvark", flavor:"old"},
  {word:"abstract", flavor:"old"},
  {word:"air", flavor:"old"},
  {word:"aircraft", flavor:"old", other:"test"},
  {word:"airflow", flavor:"old", other:"test"},
  {word:"airplane", flavor:"new"},
  {word:"banana", flavor:"old"},
  {word:"calories", flavor:"new"},
  {word:"danger", flavor:"old"}
];
// basic usage still works
console.log(countEntries(myList, 'word', 'air'))
// filters accept multiple key/value pairs
console.log(countEntries(myList, 'word', 'air', [['flavor','old']]))
console.log(countEntries(myList, 'word', 'air', [['flavor','old'],['other','test']]))
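If the linear scan over the matching range ever becomes a bottleneck, one possible variation (a different technique from the function above, not something this answer proposes) is to precompute running totals per flavor, so a filtered count becomes two binary searches plus one subtraction. The sketch below assumes the list is sorted in the order JavaScript's `<` uses for strings and that there are only a few distinct flavors; `lowerBound`, `buildFlavorCounts` and `countPrefix` are hypothetical names.

```javascript
// Returns the first index in [0, list.length] for which isBefore(item) is false.
function lowerBound(list, isBefore) {
  let lo = 0, hi = list.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (isBefore(list[mid])) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// For each flavor, totals[i] is the number of items with that flavor
// among the first i entries of the list (so totals has length + 1 slots).
function buildFlavorCounts(list) {
  const counts = new Map();
  for (const { flavor } of list) {
    if (!counts.has(flavor)) counts.set(flavor, new Uint32Array(list.length + 1));
  }
  list.forEach((item, i) => {
    for (const [flavor, totals] of counts) {
      totals[i + 1] = totals[i] + (item.flavor === flavor ? 1 : 0);
    }
  });
  return counts;
}

// Counts entries starting with `prefix`; `flavor` is optional.
function countPrefix(list, counts, prefix, flavor) {
  const lower = lowerBound(list, item => item.word < prefix);
  const upper = lowerBound(list, item => item.word.slice(0, prefix.length) <= prefix);
  if (flavor === undefined) return upper - lower;
  const totals = counts.get(flavor);
  return totals ? totals[upper] - totals[lower] : 0;
}

const words = [
  { word: "aardvark", flavor: "old" },
  { word: "abstract", flavor: "old" },
  { word: "air", flavor: "old" },
  { word: "aircraft", flavor: "old" },
  { word: "airflow", flavor: "old" },
  { word: "airplane", flavor: "new" },
  { word: "banana", flavor: "old" },
  { word: "calories", flavor: "new" },
  { word: "danger", flavor: "old" },
];
const flavorCounts = buildFlavorCounts(words);

console.log(countPrefix(words, flavorCounts, "air"));        // 4
console.log(countPrefix(words, flavorCounts, "air", "old")); // 3
console.log(countPrefix(words, flavorCounts, "air", "new")); // 1
```

The trade-off is one integer array per flavor instead of a second copy of the objects; unlike the filter list above, this sketch handles only a single flavor constraint.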
Answer 4: (score: -1)
Here is the code in C#; porting it to any other language should not be a big problem:
public class Item {
    public string Word { get; set; }
    public string Flavour { get; set; }
}
// Assumes ary is sorted by Word in ordinal order.
public int BinarySearch(Item[] ary, string start, string flavor)
{
    int lowerBound = 0, upperBound = ary.Length, mid, count = 0;
    // Binary search for the first index whose Word is >= start.
    while (lowerBound < upperBound)
    {
        mid = lowerBound + (upperBound - lowerBound) / 2;
        if (String.CompareOrdinal(ary[mid].Word, start) < 0)
        {
            lowerBound = mid + 1;
        }
        else
        {
            upperBound = mid;
        }
    }
    // Matches are contiguous from lowerBound, so scan until the prefix stops matching.
    for (int i = lowerBound; i < ary.Length && ary[i].Word.StartsWith(start, StringComparison.Ordinal); i++)
    {
        // If no flavor is provided, count every match; otherwise count only matching flavors.
        if (String.IsNullOrEmpty(flavor) || ary[i].Flavour == flavor)
        {
            count += 1;
        }
    }
    // A result of 0 means no item starts with the specified value.
    return count;
}