How to $pull elements from an array, $where elements' string length > a large number?

Asked: 2016-04-04 17:05:13

Tags: javascript mongodb

An old slash-escaping bug left us with some messed-up data, like so:

{
    suggestions: [
        "ok",
        "not ok /////////// ... 10s of KBs of this ... //////",
    ]
}

I would like to just pull those bad values out of the array. My first idea was to $pull based on a regex that matches four "/" characters, but it appears that regexes do not work on large strings:

db.notes.count({suggestions: /\/\/\/\//}) // returns 0
db.notes.count({suggestions: {$regex: "////"}}) // returns 0

My next idea was to use a $where query to find documents that have suggestion strings that are longer than 1000. That query works:

db.notes.count({
    suggestions: {$exists: true},
    $where: function() {
        return !!this.suggestions.filter(function (item) {
            return (item || "").length > 1000;
        }).length
    }
})
// returns a plausible number
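
The predicate inside that $where can also be checked outside the database as a plain function. This is a minimal sketch, assuming a hypothetical helper name `hasOversizedString`; it uses `Array.prototype.some` in place of `filter(...).length` and adds a guard for documents where `suggestions` is not an array, which the query above does not have.

```javascript
// Hypothetical helper (not part of MongoDB): returns true when any element
// of doc.suggestions is a string longer than `limit` characters.
function hasOversizedString(doc, limit) {
    if (!Array.isArray(doc.suggestions)) {
        return false; // assumption: non-array values are treated as clean
    }
    return doc.suggestions.some(function (item) {
        // null or missing items count as empty strings, as in the query above
        return (item || "").length > limit;
    });
}

// Sample documents for illustration only
var clean = { suggestions: ["ok", null] };
var dirty = { suggestions: ["ok", "not ok " + "/".repeat(2000)] };
```

Inside a $where function the same body would read `this.suggestions` instead of `doc.suggestions`.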

But a $where query can't be used as the condition in a $pull update.

db.notes.update({
    suggestions: {$exists: true},
}, {
    $pull: {
        suggestions: {
            $where: function() {
                return !!this.suggestions.filter(function (item) {
                    return (item || "").length > 1000;
                }).length
            }
        }
    }
})

throws

WriteResult({
    "nMatched" : 0,
    "nUpserted" : 0,
    "nModified" : 0,
    "writeError" : {
        "code" : 81,
        "errmsg" : "no context for parsing $where"
    }
})

I'm running out of ideas. Will I have to iterate over the entire collection, and $set: {suggestions: suggestions.filter(...)} for each document individually? Is there no better way to clean bad values out of an array of large strings in MongoDB?

(I'm only adding the "javascript" tag to get SO to format the code correctly)

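1 answer:

Answer 0 (score: 0)

The simple solution pointed out in the comments on the question should work. It works with a test case that recreates the original problem. Regexes can match large strings; there is no special limit.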

db.notes.updateOne({suggestions: /\/\//}, { "$pull": {suggestions: /\/\//}})
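
A recreation of the bad data along those lines might look like the sketch below. The 50 KB size and the helper names `makeBadSuggestion` and `pullMatching` are assumptions for illustration. The plain-JavaScript part only exercises the JavaScript regex engine, not the server's, so it is a local sanity check rather than proof of how the database behaves.

```javascript
// Hypothetical recreation of the bad data; the size is an assumption.
function makeBadSuggestion(kilobytes) {
    return "not ok " + "/".repeat(kilobytes * 1024);
}

var testDoc = {
    suggestions: ["ok", makeBadSuggestion(50)]
};

// Plain-JavaScript stand-in for a regex $pull: string elements that
// match the pattern are dropped, everything else is kept.
function pullMatching(values, pattern) {
    return values.filter(function (value) {
        return !(typeof value === "string" && pattern.test(value));
    });
}

var remaining = pullMatching(testDoc.suggestions, /\/\//);

// In the mongo shell, the equivalent check would be:
//   db.notes.insertOne(testDoc);
//   db.notes.updateOne({suggestions: /\/\//}, {$pull: {suggestions: /\/\//}});
```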

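Since the regex $pull did not work for me, I ended up doing what the question discusses: updating every document individually, filtering the array elements by string length: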

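db.notes.find({
    suggestions: {$exists: true}
}).forEach(function(doc) {
    // Keep only elements of at most 1000 characters (null counts as empty)
    doc.suggestions = doc.suggestions.filter(function(item) {
        return (item || "").length <= 1000;
    });
    // Write the whole cleaned document back
    db.notes.save(doc);
});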

It runs slowly, but that is not really a problem in this case.
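
If the speed ever did matter, the per-document writes could probably be batched and limited to documents that actually change. The following is a sketch only, assuming a hypothetical helper `buildCleanupOps` and a shell version that provides `db.collection.bulkWrite`; it was not part of the original fix.

```javascript
// Hypothetical helper: builds one updateOne operation for each document
// whose suggestions array contains at least one oversized element.
function buildCleanupOps(docs, maxLength) {
    var ops = [];
    docs.forEach(function (doc) {
        if (!Array.isArray(doc.suggestions)) {
            return; // nothing to clean in this document
        }
        var kept = doc.suggestions.filter(function (item) {
            return (item || "").length <= maxLength;
        });
        // Skip documents that are already clean to avoid needless writes
        if (kept.length !== doc.suggestions.length) {
            ops.push({
                updateOne: {
                    filter: { _id: doc._id },
                    update: { $set: { suggestions: kept } }
                }
            });
        }
    });
    return ops;
}

// Possible shell usage (an assumption, not run against a real deployment):
//   var docs = db.notes.find({suggestions: {$exists: true}}).toArray();
//   var ops = buildCleanupOps(docs, 1000);
//   if (ops.length > 0) { db.notes.bulkWrite(ops); }
```

Because `toArray()` loads every matching document into memory, a large collection would need to be processed in smaller chunks instead.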