I have a collection A and an array B, structured as follows:
A:
{
"_id" : ObjectId("5160757496cc6207a37ff778"),
"name" : "Pomegranate Yogurt Bowl",
"description" : "A simple breakfast bowl made with Greek yogurt, fresh pomegranate juice, puffed quinoa cereal, toasted sunflower seeds, and honey."
},
{
"_id": ObjectId("5160757596cc62079cc2db18"),
"name": "Krispy Easter Eggs",
"description": "Imagine the Easter Bunny laying an egg. Wait. That’s not anatomically possible. And anyway, the Easter Bunny is a b..."
}
B:
var names = ["egg", "garlic", "cucumber", "kale", "pomegranate", "sunflower", "fish", "pork", "apple", "sunflower", "strawberry", "banana"]
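My goal is to return the document from A that contains the most words from array B. In this case it should return the first one, "_id" : ObjectId("5160757496cc6207a37ff778").
I'm not sure how to approach this. This doesn't work: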
db.A.find({
"description": {
"$in": names
}
}, function(err, data) {
if (err) console.log(err);
console.log(data);
});
Answer 0 (score: 1)
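It depends on what kind of "words" you want to throw at it, and on whether "stop words" such as "a", "the", "with" and so on need to be counted, or whether the count of those really doesn't matter.
If they don't matter, consider a text index and a $text search.
First, create the index: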
db.A.createIndex({ "name": "text", "description": "text" })
Then build the search:
var words = [
"egg", "garlic", "cucumber", "kale", "pomegranate",
"sunflower", "fish", "pork", "apple", "sunflower",
"strawberry", "banana"
];
var search = words.join(" ")
db.A.find(
{ "$text": { "$search": search } },
{ "score": { "$meta": "textScore" } }
).sort({ "score": { "$meta": "textScore" }}).limit(1)
This returns the first document:
{
"_id" : ObjectId("5160757496cc6207a37ff778"),
"name" : "Pomegranate Yogurt Bowl",
"description" : "A simple breakfast bowl made with Greek yogurt, fresh pomegranate juice, puffed quinoa cereal, toasted sunflower seeds, and honey.",
"score" : 1.7291666666666665
}
On the other hand, if you do need to count "stop words", then mapReduce can find the result for you:
db.A.mapReduce(
function() {
var words = [
"egg", "garlic", "cucumber", "kale", "pomegranate",
"sunflower", "fish", "pork", "apple", "sunflower",
"strawberry", "banana"
];
var count = 0;
var fulltext = this.name.toLowerCase() + " " + this.description.toLowerCase();
// Increment count by number of matches
words.forEach(function(word) {
count += ( fulltext.match(new RegExp(word,"ig")) || [] ).length;
});
emit(null,{ count: count, doc: this });
},
function(key,values) {
// Sort largest first, return first
return values.sort(function(a,b) {
return b.count - a.count;
})[0];
},
{ "out": { "inline": 1 } }
)
Result:
{
"_id" : null,
"value" : {
"count" : 4,
"doc" : {
"_id" : ObjectId("5160757496cc6207a37ff778"),
"name" : "Pomegranate Yogurt Bowl",
"description" : "A simple breakfast bowl made with Greek yogurt, fresh pomegranate juice, puffed quinoa cereal, toasted sunflower seeds, and honey."
}
}
}
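The counting done in the map step can also be tried outside the database. The following is a minimal plain JavaScript sketch, not MongoDB code: countMatches and escapeRegExp are hypothetical helper names, and escaping each word before building the RegExp is an added assumption that the listed words should be matched literally. As in the mapReduce above, a word listed twice in the array (here "sunflower") is counted twice, and matching is by substring, so "egg" also matches "Eggs".

```javascript
// Hypothetical helper: escape regex metacharacters so a word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: total number of case-insensitive substring matches
// of every entry in `words` inside `fulltext`.
function countMatches(fulltext, words) {
  return words.reduce(function (count, word) {
    var matches = fulltext.match(new RegExp(escapeRegExp(word), "ig")) || [];
    return count + matches.length;
  }, 0);
}

var words = [
  "egg", "garlic", "cucumber", "kale", "pomegranate",
  "sunflower", "fish", "pork", "apple", "sunflower",
  "strawberry", "banana"
];

// The first document from the question.
var doc = {
  name: "Pomegranate Yogurt Bowl",
  description: "A simple breakfast bowl made with Greek yogurt, fresh pomegranate juice, puffed quinoa cereal, toasted sunflower seeds, and honey."
};

// Same concatenation of name and description as in the map function.
var score = countMatches(doc.name + " " + doc.description, words);
```

Here score matches the "count" of 4 shown in the result: two hits for "pomegranate" plus one hit for "sunflower" counted once for each time it appears in the array.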
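So the "text" index approach "weights" documents by the number of matches and then simply returns the match with the highest weight.
The mapReduce operation walks through every document and computes a score. The "reducer" then sorts the results and keeps only the one with the highest score.
Note that the "reducer" can be called many times, so this does "not" try to sort all of the documents in the collection at once. But it is still really "brute force".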
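Because the reducer can be called repeatedly, with its own earlier output fed back in as one of the values, it has to return the same shape it receives and give the same answer however the values are grouped. The following plain JavaScript sketch illustrates that property outside the database. The name reduceTop is hypothetical, the third value is invented for illustration, and the counts for the second and third values are illustrative only.

```javascript
// Hypothetical reducer: keep only the value with the highest count.
// It returns one of its inputs unchanged, so its output is valid input again.
function reduceTop(key, values) {
  return values.reduce(function (best, current) {
    return current.count > best.count ? current : best;
  });
}

var a = { count: 4, doc: { name: "Pomegranate Yogurt Bowl" } };
var b = { count: 2, doc: { name: "Krispy Easter Eggs" } };      // illustrative count
var c = { count: 0, doc: { name: "Hypothetical third recipe" } }; // invented document

// Reducing everything in one call...
var allAtOnce = reduceTop(null, [a, b, c]);

// ...gives the same winner as reducing a partial batch first
// and then reducing that partial result together with the rest.
var inStages = reduceTop(null, [reduceTop(null, [b, c]), a]);
```

Both allAtOnce and inStages end up referring to the same top-scoring value, which is why only one candidate per batch ever needs to be kept.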