An old slash-escaping bug left us with some messed-up data, like so:
{
suggestions: [
"ok",
"not ok /////////// ... 10s of KBs of this ... //////",
]
}
I would like to just pull those bad values out of the array. My first idea was to $pull
based on a regex that matches 4 "/" characters, but it appears that regexes do not work on large strings:
db.notes.count({suggestions: /\/\/\/\//}) // returns 0
db.notes.count({suggestions: {$regex: "////"}}) // returns 0
My next idea was to use a $where query to find documents that have suggestion strings longer than 1000 characters. That query works:
db.notes.count({
suggestions: {$exists: true},
$where: function() {
return !!this.suggestions.filter(function (item) {
return (item || "").length > 1000;
}).length
}
})
// returns a plausible number
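The predicate inside that $where function can also be checked on its own, outside the database. Below is a minimal sketch in plain JavaScript; `hasOversizedSuggestion` is a hypothetical helper name and the sample document is invented for illustration. Unlike the query above, it also guards against `suggestions` existing without being an array.

```javascript
// Hypothetical helper (not part of MongoDB): returns true when any entry
// of doc.suggestions has a length greater than `limit`.
function hasOversizedSuggestion(doc, limit) {
  if (!Array.isArray(doc.suggestions)) {
    return false; // field is missing or is not an array
  }
  return doc.suggestions.some(function (item) {
    // Same null-safe length check as in the $where function.
    return (item || "").length > limit;
  });
}

// Sample document invented for illustration (assumption, not real data).
var sample = {
  suggestions: ["ok", "not ok " + "/".repeat(20000)]
};

console.log(hasOversizedSuggestion(sample, 1000));
console.log(hasOversizedSuggestion({ suggestions: ["ok"] }, 1000));
```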
But a $where query can't be used as the condition in a $pull update.
db.notes.update({
suggestions: {$exists: true},
}, {
$pull: {
suggestions: {
$where: function() {
return !!this.suggestions.filter(function (item) {
return (item || "").length > 1000;
}).length
}
}
}
})
This fails with:
WriteResult({
"nMatched" : 0,
"nUpserted" : 0,
"nModified" : 0,
"writeError" : {
"code" : 81,
"errmsg" : "no context for parsing $where"
}
})
I'm running out of ideas. Will I have to iterate over the entire collection, and $set: {suggestions: suggestions.filter(...)}
for each document individually? Is there no better way to clean bad values out of an array of large strings in MongoDB?
(I'm only adding the "javascript" tag to get SO to format the code correctly)
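Answer 0 (score: 0)
The simple solution pointed out in the comments on the question should work. It works with a test case that recreates the original problem. Regular expressions can match large strings; there is no special limit.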
db.notes.updateOne({suggestions: /\/\//}, { "$pull": {suggestions: /\/\//}})
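To rule out the pattern itself, the effect of that $pull can be mimicked on an in-memory array. This is only a sketch in plain JavaScript: `pullMatching` is a hypothetical helper, the sample data is invented, and it exercises the JavaScript regex engine rather than the one used by the MongoDB server, so it says nothing definite about server-side behavior.

```javascript
// Hypothetical helper: returns a copy of `values` without the strings
// that match `pattern`, roughly what $pull with a regex condition does.
function pullMatching(values, pattern) {
  return values.filter(function (value) {
    return !(typeof value === "string" && pattern.test(value));
  });
}

// Invented sample: one good value and one string with ~50 KB of slashes.
var big = "not ok " + "/".repeat(50000);
var remaining = pullMatching(["ok", big], /\/\//);

console.log(remaining);
```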
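Since the regex-based $pull did not work on my data, I ended up doing what was discussed in the question: updating every document individually, filtering the array elements by string length: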
db.notes.find({
suggestions: {$exists: true}
}).forEach(function(doc) {
// Keep only the entries that are at most 1000 characters long.
doc.suggestions = doc.suggestions.filter(function(item) {
return (item || "").length <= 1000;
});
db.notes.save(doc);
});
It runs slowly, but that is not really a problem in this case.
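If the per-document loop ever becomes too slow, one option is to write back only the documents that actually change, in a single batch. The following is a sketch under stated assumptions: `cleanSuggestions` and `buildCleanupOps` are hypothetical helpers, the operations use the `updateOne` shape accepted by the shell's `db.collection.bulkWrite()`, and the sample documents are invented. It has not been run against the original data.

```javascript
// Hypothetical helper: keep only entries of at most `limit` characters.
function cleanSuggestions(suggestions, limit) {
  return suggestions.filter(function (item) {
    return (item || "").length <= limit;
  });
}

// Hypothetical helper: build one updateOne operation per document whose
// suggestions array would shrink; unchanged documents are skipped.
function buildCleanupOps(docs, limit) {
  var ops = [];
  docs.forEach(function (doc) {
    if (!Array.isArray(doc.suggestions)) {
      return; // nothing to clean
    }
    var cleaned = cleanSuggestions(doc.suggestions, limit);
    if (cleaned.length !== doc.suggestions.length) {
      ops.push({
        updateOne: {
          filter: { _id: doc._id },
          update: { $set: { suggestions: cleaned } }
        }
      });
    }
  });
  return ops;
}

// Invented sample documents (assumption, not real data).
var docs = [
  { _id: 1, suggestions: ["ok", "not ok " + "/".repeat(5000)] },
  { _id: 2, suggestions: ["fine"] }
];
var ops = buildCleanupOps(docs, 1000);

console.log(JSON.stringify(ops));
```

In the shell, the resulting array could be passed to db.notes.bulkWrite(ops), assuming the matching documents fit in memory when loaded with find(...).toArray(); the call should be skipped when the array is empty.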