mongo: text search using a regular expression

Asked: 2018-04-27 17:51:05

Tags: regex mongodb mongodb-query

I have a collection named test that contains the following data:

> db.test.find()
{ "_id" : ObjectId("5ae3494a5daab479a87f51fb"), "a" : "a6", "b" : "b6", "c" : "c6", "__key" : "default-domain:admin:vn1;c8" }
{ "_id" : ObjectId("5ae349645daab479a87f51fc"), "a" : "a7", "b" : "b7", "c" : "c7", "__key" : "default-domain:admin:vn2;c9" }
{ "_id" : ObjectId("5ae349af5daab479a87f51fd"), "a" : "a0", "b" : "b0", "c" : "c0", "__key" : "a0;b0;c0" }
{ "_id" : ObjectId("5ae349be5daab479a87f51fe"), "a" : "a1", "b" : "b1", "c" : "c1", "__key" : "a1;b1;c1" }
{ "_id" : ObjectId("5ae349cc5daab479a87f51ff"), "a" : "a2", "b" : "b1", "c" : "c2", "__key" : "a2;b2;c2" }
{ "_id" : ObjectId("5ae349d75daab479a87f5200"), "a" : "a3", "b" : "b2", "c" : "c3", "__key" : "a3;b3;c3" }
{ "_id" : ObjectId("5ae34b6c5daab479a87f5201"), "a" : "a8", "b" : "b8", "c" : "c9", "__key" : "default-domain:vn9;ch9" }
> 

I created the following index:

db.test.createIndex({__key: "text"})

Now I want to search for keys matching default-domain:*c8:

> db.test.find({$text: {$search: "/default-domain:*c8/"}})
{ "_id" : ObjectId("5ae3494a5daab479a87f51fb"), "a" : "a6", "b" : "b6", "c" : "c6", "__key" : "default-domain:admin:vn1;c8" }
{ "_id" : ObjectId("5ae34b6c5daab479a87f5201"), "a" : "a8", "b" : "b8", "c" : "c9", "__key" : "default-domain:vn9;ch9" }
{ "_id" : ObjectId("5ae349645daab479a87f51fc"), "a" : "a7", "b" : "b7", "c" : "c7", "__key" : "default-domain:admin:vn2;c9" }
> 

So it returns the wrong data. I expected only this document to be returned:

{ "_id" : ObjectId("5ae3494a5daab479a87f51fb"), "a" : "a6", "b" : "b6", "c" : "c6", "__key" : "default-domain:admin:vn1;c8" }

From explain() I can see:

    "winningPlan" : {
        "stage" : "TEXT",
        "indexPrefix" : {

        },
        "indexName" : "__key_text",
        "parsedTextQuery" : {
            "terms" : [
                "c8",
                "default",
                "domain"
            ],
            "negatedTerms" : [ ],
            "phrases" : [ ],
            "negatedPhrases" : [ ]
        },

So internally the search string is converted into three terms:

            "terms" : [
                "c8",
                "default",
                "domain"
            ],

I think this is why it returns the wrong data.

So how can I achieve this with the text index, i.e. what should go in db.test.find({$text: {$search: "??"}})? Is my search expression wrong?

Regards, -M-

1 answer:

Answer 0: (score: 0)

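The text index is behaving as expected, because it tokenizes and stems the terms it indexes. This explains why the search string was split into three separate words in the explain plan. A $text query then matches documents that contain any of those terms, so every key containing "default" or "domain" is returned. The surrounding slashes are treated as delimiters, not as regular-expression syntax.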

See https://docs.mongodb.com/manual/core/index-text/#tokenization-delimiters for tokenization, and https://docs.mongodb.com/manual/core/index-text/#index-entries for stemming and stop words.
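
To make the effect concrete, here is a rough sketch in plain JavaScript that imitates the behavior described above. The tokenize helper is hypothetical and only an approximation: it lower-cases and splits on non-alphanumeric characters, and it skips the stemming and stop-word handling that a real text index performs.

```javascript
// Hypothetical stand-in for text-index tokenization: lower-case the string
// and split on anything that is not a letter or digit. Real MongoDB
// tokenization also applies stemming and stop-word removal (skipped here).
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((term) => term.length > 0);
}

// __key values copied from the question's collection.
const keys = [
  "default-domain:admin:vn1;c8",
  "default-domain:admin:vn2;c9",
  "a0;b0;c0",
  "a1;b1;c1",
  "a2;b2;c2",
  "a3;b3;c3",
  "default-domain:vn9;ch9",
];

// The slashes, hyphen, colon and asterisk all act as delimiters.
const searchTerms = tokenize("/default-domain:*c8/");

// A key "matches" when it shares at least one term with the search string,
// which mirrors the OR semantics of a multi-term $search.
const hits = keys.filter((key) => {
  const docTerms = new Set(tokenize(key));
  return searchTerms.some((term) => docTerms.has(term));
});

console.log(searchTerms);
console.log(hits);
```

Under these assumptions the sketch selects the same three keys that the $text query in the question returned, because each of them contains "default" and "domain".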

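If "default-domain" is required when querying for "c8", you may want to consider a case-sensitive prefix expression (https://docs.mongodb.com/manual/reference/operator/query/regex/#index-use) and use "$" to anchor "c8" at the end of the regular expression (http://grainge.org/pages/authoring/regex/regular_expressions.htm).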
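
As a minimal sketch of that suggestion, the pattern below assumes the "*" in the question means "any characters"; that reading is my assumption, not something the question states. It is run here against the sample keys in plain JavaScript, with the corresponding mongo shell query shown in a comment.

```javascript
// __key values copied from the question's collection.
const keys = [
  "default-domain:admin:vn1;c8",
  "default-domain:admin:vn2;c9",
  "a0;b0;c0",
  "a1;b1;c1",
  "a2;b2;c2",
  "a3;b3;c3",
  "default-domain:vn9;ch9",
];

// "^" anchors the literal prefix, ".*" stands in for the "*" wildcard,
// and "$" requires the key to end in "c8". No "i" flag, so the match
// stays case-sensitive.
const pattern = /^default-domain:.*c8$/;

const matches = keys.filter((key) => pattern.test(key));
console.log(matches);

// Corresponding mongo shell query (uses $regex instead of $text):
//   db.test.find({ __key: { $regex: /^default-domain:.*c8$/ } })
```

Note that a $regex query does not go through the text index. If index support matters, an ordinary index on __key, for example db.test.createIndex({__key: 1}), is the kind of index the linked page describes for prefix expressions.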

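Alternatively, you could parse the value of the "__key" field, store the relevant parts separately, and query the values you need directly.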
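
One possible way to do that parsing is sketched below. The parseKey helper, the path/rest field names, and the keyParts field in the commented query are all hypothetical, and the sketch assumes that ";" separates the top-level segments and ":" separates the parts of the first segment, which is only inferred from the sample data.

```javascript
// Hypothetical helper: split a __key value into ";"-separated segments,
// then split the first segment into its ":"-separated parts.
function parseKey(key) {
  const segments = key.split(";");
  return {
    path: segments[0].split(":"), // e.g. ["default-domain", "admin", "vn1"]
    rest: segments.slice(1),      // e.g. ["c8"]
  };
}

const parsed = parseKey("default-domain:admin:vn1;c8");
console.log(parsed);

// If each document stored this result in a (hypothetical) keyParts field,
// the lookup could become a plain equality query in the mongo shell:
//   db.test.find({ "keyParts.path.0": "default-domain", "keyParts.rest": "c8" })
```

This trades some extra storage and a write-time parsing step for queries that no longer depend on tokenization or regular expressions.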