Question

I need to extract the part of a string that matches a regular expression and return it. I have a set of documents in MongoDB like these:

{"_id" :12121, "fileName" : "apple.doc"}, 
{"_id" :12125, "fileName" : "rap.txt"},
{"_id" :12126, "fileName" : "tap.pdf"}, 
{"_id" :12127, "fileName" : "cricket.txt"},

From these I need to extract all the file extensions and return {".doc", ".txt", ".pdf"}.

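I'm trying to use the $regex operator to find the substring and aggregate on the results, but I can't extract the part I need and pass it down the pipeline.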

Any help is greatly appreciated. Thanks!

I tried something like this:

aggregate([
    { $match: { "name": { $regex: '/\.[0-9a-z]+$/i', "$options": "i" } } },
    { $group: {
        _id: null,
        tot: { $push: "$name" }
    } }
])

Answer 1

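Doing this in the aggregation pipeline is next to impossible: you want to project the match and keep only the part after the period, but there is no operator (yet) to locate the position of the period. You need that position because $substr (https://docs.mongodb.com/manual/reference/operator/aggregation/substr/) requires a starting index. Also, $regex is only for matching; you can't use it in a projection to do a replacement.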

For now I think it's easier to do this in application code, where you can use a regex replace or whatever other string tools your language provides.
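As a rough illustration of the "do it in code" route, here is a minimal client-side sketch. It assumes the documents have already been fetched into a plain JavaScript array (for example from a find() cursor); the helper name extractExtensions is hypothetical. It reuses the question's own regex and a Set to drop duplicates.

```javascript
// Hypothetical helper: collects distinct file extensions client-side,
// assuming the documents were already fetched into a plain array.
function extractExtensions(docs) {
  const extensions = new Set();
  for (const doc of docs) {
    // Match a final "." followed by letters/digits, case-insensitively.
    const m = /\.[0-9a-z]+$/i.exec(doc.fileName ?? "");
    if (m) extensions.add(m[0]);
  }
  // A Set keeps insertion order, so extensions appear in first-seen order.
  return [...extensions];
}

const docs = [
  { _id: 12121, fileName: "apple.doc" },
  { _id: 12125, fileName: "rap.txt" },
  { _id: 12126, fileName: "tap.pdf" },
  { _id: 12127, fileName: "cricket.txt" },
];

console.log(extractExtensions(docs)); // [ '.doc', '.txt', '.pdf' ]
```

Names without a trailing extension simply contribute nothing, which matches the intent of the question's $match filter.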

Answer 2

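This becomes possible with the aggregation framework and the $indexOfCP operator in the upcoming MongoDB release (as of this writing). Until then, your best bet is MapReduce.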

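var mapper = function() {
    // Emit everything from the last "." to the end of the name (the extension).
    emit(this._id, this.fileName.substring(this.fileName.lastIndexOf(".")));
};

db.coll.mapReduce(mapper,
                  // Every _id is unique, so reduce is never invoked; a no-op is enough.
                  function(key, values) {},
                  { "out": { "inline": 1 } }
)["results"]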

Which yields:

[
    {
        "_id" : 12121,
        "value" : ".doc"
    },
    {
        "_id" : 12125,
        "value" : ".txt"
    },
    {
        "_id" : 12126,
        "value" : ".pdf"
    },
    {
        "_id" : 12127,
        "value" : ".txt"
    }
]
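The mapper's work is plain string slicing, so it can be checked outside MongoDB. The sketch below uses two hypothetical helper names and a hypothetical multi-dot file name to show how cutting from the first period (indexOf) differs from cutting from the last one (lastIndexOf). It also shows that a name with no period comes back unchanged, because substring treats the -1 index as 0.

```javascript
// Plain-JavaScript stand-ins for the mapper's string logic (no MongoDB needed).
function extFromFirstDot(name) {
  return name.substring(name.indexOf("."));      // from the first "." to the end
}

function extFromLastDot(name) {
  return name.substring(name.lastIndexOf("."));  // from the last "." to the end
}

// "report.v2.doc" is a hypothetical name with more than one period.
for (const name of ["apple.doc", "report.v2.doc", "README"]) {
  console.log(name, extFromFirstDot(name), extFromLastDot(name));
}
```

For the four sample documents both variants give the same result; they only diverge for names that contain several periods.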

For completeness, here is the solution using the aggregation framework*:

db.coll.aggregate(
    [
        { "$match": { "fileName": /\.[0-9a-z]+$/i } },
        { "$group": { 
            "_id": null,
            "extensions": {
                "$push": {
                    "$substr": [ 
                        "$fileName", 
                        { "$indexOfCP": [ "$fileName", "." ] }, 
                        -1 
                    ]
                }
            }
        }}
    ])

which produces:

{ 
    "_id" : null, 
    "extensions" : [ ".doc", ".txt", ".pdf", ".txt" ] 
}

* In the current development version of MongoDB (as of this writing).
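To make the stages concrete, here is a plain-JavaScript mirror of that pipeline. It is an illustration only, not MongoDB code, and the function name pipelineLike is hypothetical. The regex test plays the role of $match, and slicing from the first period to the end plays the role of $substr with $indexOfCP as the start and -1 as the length. Collecting into an array, like $push, keeps duplicates.

```javascript
// Hypothetical plain-JS mirror of the $match + $group/$push pipeline.
function pipelineLike(docs) {
  const extensions = [];
  for (const doc of docs) {
    // $match: keep names that end in "." followed by letters/digits.
    if (!/\.[0-9a-z]+$/i.test(doc.fileName)) continue;
    // $substr from the first "." ($indexOfCP) to the end of the string.
    extensions.push(doc.fileName.slice(doc.fileName.indexOf(".")));
  }
  // $group with _id: null collapses everything into one result document.
  return { _id: null, extensions };
}

const docs = [
  { _id: 12121, fileName: "apple.doc" },
  { _id: 12125, fileName: "rap.txt" },
  { _id: 12126, fileName: "tap.pdf" },
  { _id: 12127, fileName: "cricket.txt" },
];

console.log(pipelineLike(docs));
```

Swapping the array for a Set would give distinct values, which is what $addToSet does on the server side.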

Answer 3

Starting with Mongo 4.2, the $regexFind aggregation operator makes this easier:

// { _id : 12121, fileName: "apple.doc" }
// { _id : 12125, fileName: "rap.txt" }
// { _id : 12126, fileName: "tap.pdf" }
// { _id : 12127, fileName: "cricket.txt" }
// { _id : 12129, fileName: "oups" }
db.collection.aggregate([
  { $set: { ext: { $regexFind: { input: "$fileName", regex: /\.\w+$/ } } } },
  { $group: { _id: null, extensions: { $addToSet: "$ext.match" } } }
])
// { _id: null, extensions: [ ".doc", ".pdf", ".txt" ] }

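This uses:

- The $set stage, which adds a new field to each document.
- This new field (ext) holds the result of the $regexFind operator, which captures the result of matching the regex. If a match is found, it returns a document with information about the first match; if no match is found, it returns null. For example:
  - for { fileName: "tap.pdf" }, it produces { ext: { match: ".pdf", idx: 3, captures: [] } };
  - for { fileName: "oups" }, it produces { ext: null }.
- Finally, a $group stage combined with $addToSet on the "match" field ("$ext.match") produces the list of distinct extensions.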
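To see the shape of what $regexFind hands to the $group stage, here is a rough client-side emulation. The function names regexFindLike and distinctExtensions are hypothetical, and the emulation is only approximate: JavaScript's match index counts UTF-16 code units, which coincides with MongoDB's code-point idx for plain ASCII names like these.

```javascript
// Hypothetical, approximate emulation of the document $regexFind returns.
function regexFindLike(input, regex) {
  const m = regex.exec(input);           // no "g" flag, so exec is stateless
  if (m === null) return null;           // no match: the field becomes null
  return { match: m[0], idx: m.index, captures: m.slice(1) };
}

// Mirror of $group + $addToSet on "$ext.match": null results are skipped.
function distinctExtensions(docs) {
  const extensions = new Set();
  for (const doc of docs) {
    const ext = regexFindLike(doc.fileName, /\.\w+$/);
    if (ext !== null) extensions.add(ext.match);
  }
  return [...extensions];
}

const docs = [
  { _id: 12121, fileName: "apple.doc" },
  { _id: 12125, fileName: "rap.txt" },
  { _id: 12126, fileName: "tap.pdf" },
  { _id: 12127, fileName: "cricket.txt" },
  { _id: 12129, fileName: "oups" },
];

console.log(regexFindLike("tap.pdf", /\.\w+$/)); // { match: '.pdf', idx: 3, captures: [] }
console.log(regexFindLike("oups", /\.\w+$/));    // null
console.log(distinctExtensions(docs));           // [ '.doc', '.txt', '.pdf' ]
```

The Set here returns extensions in first-seen order; $addToSet makes no ordering guarantee, which is why the server output above lists them in a different order.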
