MongoDB as a search engine

Time: 2016-02-09 05:08:46

Tags: regex mongodb

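I am trying to write a search script in MongoDB but cannot figure out how to do it. What I want to do is the following.

Say I have an array of strings XD = {"the","new","world"}

Now I want to search the MongoDB documents for the strings in XD (using a regular expression) and get the matching documents. For example: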

{ _id: 1, _content: "there was a boy" }
{ _id: 2, _content: "there was a boy in a new world" }
{ _id: 3, _content: "a boy" }
{ _id: 4, _content: "there was a boy in world" }

Now I want the results ranked by how many of the strings in XD appear in _content:

{ _id: 2, _content: "there was a boy in a new world", times: 3 }
{ _id: 4, _content: "there was a boy in world", times: 2 }
{ _id: 1, _content: "there was a boy", times: 1 }

The first document (_id: 2) contains all three { "the" in there, "new" as new, "world" as world }, so it gets 3.

The second document (_id: 4) contains only two { "the" in there, "world" as world }, so it gets 2.

1 answer:

Answer 0: (score: 1)

Here is what you can do.

Create a regular expression to match against _content:
XD = ["the","new","world"];
regex = new RegExp(XD.join("|"), "g");
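Joining the terms with "|" works for plain words, but a term containing regex metacharacters (for example "c++") would change the meaning of the pattern. A minimal sketch of a safer way to build it, assuming two hypothetical helpers, escapeRegex and buildPattern, that are not part of the original answer:

```javascript
// Hypothetical helper: escape regex metacharacters so a term is matched literally.
function escapeRegex(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: join the escaped terms into one alternation pattern.
function buildPattern(terms) {
  return terms.map(escapeRegex).join("|");
}

const XD = ["the", "new", "world"];
const pattern = buildPattern(XD);        // "the|new|world"
const regex = new RegExp(pattern, "g");

// With the "g" flag, match() returns every occurrence instead of only the first.
const hits = "there was a boy in a new world".match(regex);
console.log(hits.length);                // 3
```

Without the "g" flag, match() stops at the first hit, so the count could never exceed 1.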

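Store a JS function on the server that matches _content against the regular expression and returns the number of matches:

db.system.js.save(
   {
     _id: "findMatchCount",
     value : function(str, regex) {
        // regex should carry the "g" flag so that match() returns every occurrence
        var matches = str.match(regex);
        // match() returns null when nothing matches
        return (matches !== null) ? matches.length : 0;
     }
   }
)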
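The counting logic can be checked outside MongoDB before it is stored on the server. A minimal sketch in plain JavaScript, assuming the four sample documents from the question are held in a local array (docs is a name chosen here for illustration):

```javascript
// Same logic as the stored function, runnable without a MongoDB server.
function findMatchCount(str, regex) {
  const matches = str.match(regex);
  return matches !== null ? matches.length : 0;
}

// The sample documents from the question, copied into a local array.
const docs = [
  { _id: 1, _content: "there was a boy" },
  { _id: 2, _content: "there was a boy in a new world" },
  { _id: 3, _content: "a boy" },
  { _id: 4, _content: "there was a boy in world" }
];

const regex = /the|new|world/g;

// One { _id, times } pair per document.
const counts = docs.map(function (doc) {
  return { _id: doc._id, times: findMatchCount(doc._content, regex) };
});

console.log(counts.map(function (c) { return c.times; })); // [ 1, 3, 0, 2 ]
```

Note that "the" is counted inside "there" because the pattern has no word boundaries, which is what the question asks for.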

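Use this function with mapReduce. The map function is called without arguments, so the pattern is passed in through the scope option (the variable name pattern is a choice made here) and the regular expression is built inside map:

db.test.mapReduce(
    function() {
       // pattern comes from the scope option; the "g" flag makes match() return every occurrence
       emit(this._id, findMatchCount(this._content, new RegExp(pattern, "g")));
    },
    function(key, values) {
        // each _id is emitted only once, so reduce is not invoked here;
        // summing keeps the return type the same as the emitted value
        return Array.sum(values);
    },
    { "out": { "inline": 1 }, "scope": { "pattern": XD.join("|") } }
);

This produces output like the following:

{
    "results" : [
        {
            "_id" : 1,
            "value" : 1
        },
        {
            "_id" : 2,
            "value" : 3
        },
        {
            "_id" : 3,
            "value" : 0
        },
        {
            "_id" : 4,
            "value" : 2
        }
    ],
    "timeMillis" : 1,
    "counts" : {
        "input" : 4,
        "emit" : 4,
        "reduce" : 0,
        "output" : 4
    },
    "ok" : 1
}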
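The question asks for only the matching documents, ordered by count. A minimal sketch of that last step in plain JavaScript, assuming the "results" array has been read into a local variable (the array below is hand-written sample data holding the counts the question expects, not captured server output):

```javascript
// Sample data shaped like the "results" array of an inline mapReduce.
const results = [
  { _id: 1, value: 1 },
  { _id: 2, value: 3 },
  { _id: 3, value: 0 },
  { _id: 4, value: 2 }
];

// Drop documents with no match, then sort by match count, highest first.
const ranked = results
  .filter(function (r) { return r.value > 0; })
  .sort(function (a, b) { return b.value - a.value; });

console.log(ranked.map(function (r) { return r._id; })); // [ 2, 4, 1 ]
```

This runs on the client, so it suits small result sets only.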

I am not sure how efficient this solution is, but it works.

Hope this helps.