Question

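I have a Mongo collection containing 1,691,721 items, mostly location information. I'm trying to run a regex search on it and it is slow, but I don't understand why, because I thought I already had the appropriate index in place.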

Typical document

{
    "_id" : ObjectId("58c08029ef4468c8157455fa"),
    "ng" : [
        394235,
        806529
    ],
    "postcode" : "AB101AB"
}

Indexes

I created a text index on the postcode field, as you can see in the full list of indexes:

db.locations.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "Traders.locations"
        },
        {
                "v" : 2,
                "key" : {
                        "_fts" : "text",
                        "_ftsx" : 1
                },
                "name" : "postcode_text",
                "ns" : "Traders.locations",
                "weights" : {
                        "postcode" : 1
                },
                "default_language" : "english",
                "language_override" : "language",
                "textIndexVersion" : 3
        }
]

Query

At this point, all I care about is the postcode field. So I tried writing a query to fetch the last value:

db.locations.find({ postcode: { $regex: /^ZE29XN$/ } }, { postcode: 1, _id: 0 })

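This takes a while to run, about 700 ms, which is much longer than I expected. As far as I'm concerned this should be a covered query: I have a text index on the single field I care about. However, when I explain the query above, it shows a COLLSCAN, and I don't understand why: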

db.locations.find({ postcode: { $regex: /^ZE29XN$/ } }, { postcode: 1, _id: 0 }).explain("allPlansExecution")
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "Traders.locations",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "postcode" : {
                                "$regex" : "^ZE29XN$"
                        }
                },
                "winningPlan" : {
                        "stage" : "PROJECTION",
                        "transformBy" : {
                                "postcode" : 1,
                                "_id" : 0
                        },
                        "inputStage" : {
                                "stage" : "COLLSCAN",
                                "filter" : {
                                        "postcode" : {
                                                "$regex" : "^ZE29XN$"
                                        }
                                },
                                "direction" : "forward"
                        }
                },
                "rejectedPlans" : [ ]
        },
        "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 1,
                "executionTimeMillis" : 732,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 1691721,
                "executionStages" : {
                        "stage" : "PROJECTION",
                        "nReturned" : 1,
                        "executionTimeMillisEstimate" : 697,
                        "works" : 1691723,
                        "advanced" : 1,
                        "needTime" : 1691721,
                        "needYield" : 0,
                        "saveState" : 13223,
                        "restoreState" : 13223,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "transformBy" : {
                                "postcode" : 1,
                                "_id" : 0
                        },
                        "inputStage" : {
                                "stage" : "COLLSCAN",
                                "filter" : {
                                        "postcode" : {
                                                "$regex" : "^ZE29XN$"
                                        }
                                },
                                "nReturned" : 1,
                                "executionTimeMillisEstimate" : 676,
                                "works" : 1691723,
                                "advanced" : 1,
                                "needTime" : 1691721,
                                "needYield" : 0,
                                "saveState" : 13223,
                                "restoreState" : 13223,
                                "isEOF" : 1,
                                "invalidates" : 0,
                                "direction" : "forward",
                                "docsExamined" : 1691721
                        }
                },
                "allPlansExecution" : [ ]
        },
        "serverInfo" : {
                "host" : "DESKTOP",
                "port" : 27017,
                "version" : "3.4.2",
                "gitVersion" : "3f76e40c105fc223b3e5aac3e20dcd026b83b38b"
        },
        "ok" : 1
}
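Rather than reading the whole explain document by eye, a small helper can list the stages of the winning plan and make a COLLSCAN easy to spot. This is a minimal sketch; `planStages` is a hypothetical helper name, and it only assumes the `queryPlanner.winningPlan` / `inputStage` nesting visible in the output above (plus `inputStages`, which multi-child stages use).

```javascript
// Collect the stage names of the winning plan from explain() output,
// walking nested inputStage / inputStages fields depth-first.
function planStages(explainOutput) {
  const stages = [];
  const visit = (stage) => {
    if (!stage) return;
    stages.push(stage.stage);
    if (stage.inputStage) visit(stage.inputStage);
    // Stages with several children (e.g. OR) list them in inputStages.
    (stage.inputStages || []).forEach(visit);
  };
  visit(explainOutput.queryPlanner.winningPlan);
  return stages;
}

// Trimmed copy of the explain output shown above.
const explained = {
  queryPlanner: {
    winningPlan: {
      stage: "PROJECTION",
      inputStage: { stage: "COLLSCAN", direction: "forward" }
    }
  }
};

console.log(planStages(explained)); // [ 'PROJECTION', 'COLLSCAN' ]
console.log(planStages(explained).includes("COLLSCAN")); // true
```

Against a live server, the argument would be the object returned by `db.locations.find(...).explain()`; an index-backed plan should show IXSCAN instead of COLLSCAN.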

My question

Why isn't the text index I created being used, and how can I ultimately make my query faster?

I should note that I'm not tied to $regex, but I do need to be able to support "starts with" searches, so ZE.*, ZE2.*, and ZE29XN should all be fast to search.

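It's also worth noting that I wondered whether, once I get the index working, marking it unique: true might help speed things up. However, creating it that way produces a duplicate key error, although I can't find the duplicate when I run an aggregation. I can dig into that if needed, but I'm not sure it's relevant.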
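A duplicate check of the kind mentioned above could look like the sketch below. The pipeline is ordinary `$group` / `$match` usage; `duplicatePipeline` and `duplicatePostcodes` are hypothetical names, and the plain-JS function only mirrors the pipeline so the logic can be tried on a sample. One thing worth checking (an assumption about this case, not a diagnosis): a unique index stores a missing field as null, so two documents without `postcode` are already a duplicate, and `$group` reports those under `_id: null` rather than under a postcode.

```javascript
// Aggregation pipeline for the mongo shell: group by postcode and keep
// only the values that occur more than once.
const duplicatePipeline = [
  { $group: { _id: "$postcode", count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } }
];
// Shell usage: db.locations.aggregate(duplicatePipeline, { allowDiskUse: true })

// Plain-JavaScript equivalent of the same grouping, for a sample of documents.
function duplicatePostcodes(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Like $group on a missing field, treat an absent postcode as null.
    const key = doc.postcode === undefined ? null : doc.postcode;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .filter(([, n]) => n > 1)
    .map(([postcode, count]) => ({ _id: postcode, count }));
}

const sample = [
  { postcode: "AB101AB" }, { postcode: "AB101AB" },
  { postcode: "ZE29XN" }, {}, {}
];
console.log(duplicatePostcodes(sample));
```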

Answer 1

A quick summary of the operators MongoDB offers for searching text:

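$regex: provides regular expression capabilities for pattern matching strings in queries. The $regex operator does support partial matches, but it can only use an index efficiently (as a bounded range scan) when the search string is anchored, i.e. starts with a leading ^.
$text: performs a text search on the contents of fields indexed with a text index (using the $text operator is a prerequisite for MongoDB to use a text index at all). These searches are generally "fast" (a subjective term, but once you have one working you'll see what that means), but they do not support partial matches, so you won't be able to run a "text search" for a partial postcode.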
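The difference between partial and whole-term matching can be illustrated with a toy model. This is explicitly not MongoDB's implementation: real text indexes also apply stemming, stop words and diacritic folding, and `regexMatches` / `textMatches` are hypothetical helpers. The tokenizing rule (split on non-alphanumerics, compare case-insensitively) is a simplifying assumption.

```javascript
// Toy model of $regex: the pattern is tested against the whole stored string,
// so an anchored prefix such as /^ZE2/ can match part of it.
function regexMatches(value, pattern) {
  return pattern.test(value);
}

// Toy model of $text: the field is split into whole terms and the search
// term must equal one of them (case-insensitively); no partial terms.
function textMatches(value, searchTerm) {
  const terms = value.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return terms.includes(searchTerm.toLowerCase());
}

const postcode = "ZE29XN";
console.log(regexMatches(postcode, /^ZE2/));  // true  (partial, anchored match)
console.log(textMatches(postcode, "ZE2"));    // false (no partial terms)
console.log(textMatches(postcode, "ze29xn")); // true  (whole term, any case)
```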

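With that in mind, it looks like you are trying to do partial matching (via $regex) against a text index. That won't work, because a text index is only ever used by the $text operator.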

Your stated requirements are:

You want partial string matching
You want the query to be covered by an index

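You can meet these requirements by (1) using $regex and (2) putting a regular index (not a text index) on the postcode field. There is one (important!) caveat: your search strings must be anchored. So your requirement to "be able to allow 'starts with', so ZE.*, ZE2.*, or ZE29XN" should be fine. However, an unanchored search (one without a leading ^) cannot use the index efficiently: MongoDB has to examine every index key, or every document, instead of a narrow range.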
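The approach above can be sketched as follows, assuming a regular `{ postcode: 1 }` index has been created (the `createIndex` call is shown as a comment because it needs a live server). `prefixRegex` and `coveredPrefixQuery` are hypothetical helpers: the first escapes regex metacharacters so user input is taken literally and anchors it with `^`; the second builds the filter plus the `{ postcode: 1, _id: 0 }` projection from the question, which is what lets the index cover the query. The prefix is passed without a trailing `.*`, since the MongoDB `$regex` documentation notes that `/^a/` can stop scanning after the prefix while `/^a.*/` is slower.

```javascript
// Escape regex metacharacters (so "A.B" means a literal dot), then anchor
// with ^ so the pattern is a prefix expression a { postcode: 1 } index can bound.
function prefixRegex(prefix) {
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped);
}

// Filter and projection for a covered prefix query: only postcode is
// returned and _id is excluded, so the documents need not be fetched.
function coveredPrefixQuery(prefix) {
  return {
    filter: { postcode: { $regex: prefixRegex(prefix) } },
    projection: { postcode: 1, _id: 0 }
  };
}

// Mongo shell usage (requires a running server):
//   db.locations.createIndex({ postcode: 1 })
//   const q = coveredPrefixQuery("ZE2");
//   db.locations.find(q.filter, q.projection).explain("executionStats")

console.log(prefixRegex("ZE2"));                // /^ZE2/
console.log(prefixRegex("ZE2").test("ZE29XN")); // true
console.log(prefixRegex("E2").test("ZE29XN"));  // false
```

If the query is covered, the explain output should show an IXSCAN stage and `totalDocsExamined: 0`.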

Why is MongoDB not using my text index?
