Question

I have a collection that looks like this:

{
    "strings" : [
        "Hello world",
        "Error connecting to server",
        "Dbus error"
    ],
    "file" : "test.c"
}
{
    "strings" : [
        "Could not configure supporting library.",
        "Assertion failed",
        "Error in server response"
    ],
    "file" : "run.c"
}
...

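There are about 250,000 such documents, each with its own array of strings and a unique file name. At query time I have a string that can be a substring of one of the strings in a document, or can contain some of the strings from some documents (that is, it will never be an exact match for any string from any document). Examples: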

Hello world hi there             // Should match test.c
Error connecting                 // test.c
Error connecting to server: 107  // test.c
Assertion failed: Dbus error     // Should match both test.c and run.c

I need to retrieve the documents whose "strings" field contains the string that most closely matches my query string.

I tried an indexed text search:

db.testcoll.find( { $text: { $search: "Could not configure supporting library." } } , {"file": 1, "_id": 0, score: {$meta: "textScore"}}).sort({score: {$meta: "textScore"}})

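The problem with this, however, is that a document may contain several strings that each partially match my query string, so the document with the single closest matching string can still end up with a lower match score than other documents.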
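To illustrate the scoring issue, here is a minimal sketch. It assumes a simple token-overlap count as a stand-in score (this is not MongoDB's actual textScore formula), and the two documents and the helper names (tokenize, overlap, summedScore, bestStringScore) are hypothetical. It compares a score summed over the whole array with a score taken from the best single string:

```javascript
// Stand-in scoring: number of query tokens that also appear in a string.
// ASSUMPTION: illustrative only, not MongoDB's textScore formula.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function overlap(queryTokens, str) {
  const tokens = new Set(tokenize(str));
  return queryTokens.filter((t) => tokens.has(t)).length;
}

// Document-level score: matches are summed over every string in the array.
function summedScore(query, doc) {
  const q = tokenize(query);
  return doc.strings.reduce((sum, s) => sum + overlap(q, s), 0);
}

// Per-string score: only the best single string in the array counts.
function bestStringScore(query, doc) {
  const q = tokenize(query);
  return Math.max(0, ...doc.strings.map((s) => overlap(q, s)));
}

// Hypothetical documents (not from my real collection).
const exact = { file: "a.c", strings: ["Error connecting to server"] };
const partial = {
  file: "b.c",
  strings: ["Error in parser", "Connecting to peer", "Server stopped", "Error on server"],
};

const query = "Error connecting to server";
console.log(summedScore(query, exact), summedScore(query, partial));
console.log(bestStringScore(query, exact), bestStringScore(query, partial));
```

With the summed score, b.c ranks above a.c even though a.c holds the closest string; with the per-string score the order is reversed, which is the ranking I am after.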

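I tried unwinding the "strings" field in an aggregation so that each document contains only one string, hoping that the closest string would then get the highest score. However, MongoDB only allows a $match with $text as the first stage of the pipeline, so I cannot run $unwind before it.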

db.testcoll.aggregate([{$project: {file: 1, strings: 1}}, {$unwind: "$strings"}, {$match: { $text: { $search: "Could not configure supporting library." } }}])

Output:

2019-05-27T11:36:56.671+0530 E QUERY    [js] Error: command failed: {
    "ok" : 0,
    "errmsg" : "$match with $text is only allowed as the first pipeline stage",
    "code" : 17313,
    "codeName" : "Location17313"
} : aggregate failed
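For reference, a reordered pipeline that should avoid this error puts the $text $match first and unwinds afterwards. This is only a sketch: the textScore is still computed once per original document, so every unwound string inherits the same document-level score and the ranking problem above remains.

```javascript
// Sketch: $text must be in the first $match stage, so $unwind comes after it.
// The score below is the per-document textScore, not a per-string score.
const pipeline = [
  { $match: { $text: { $search: "Could not configure supporting library." } } },
  { $project: { _id: 0, file: 1, strings: 1, score: { $meta: "textScore" } } },
  { $unwind: "$strings" },
];

// In the mongo shell this would be run as:
//   db.testcoll.aggregate(pipeline)
```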

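Is there a way to run a text search on the documents returned by $unwind? If not, is there any other way to perform this kind of text search in MongoDB? Thanks in advance for your help.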

Running text search on unwound documents in MongoDB

0 Answers: