Question

I have a large array in my scraping application:

{
    "_id" : ObjectId("5538ebbe265a286c54531d8c"),
    "word_frequency" : [
        [
            "words",
            NumberLong(5)
        ],
        [
            "sign",
            NumberLong(5)
        ],
        [
            "facebook",
            NumberLong(4)
        ],
        [
            "enter",
            NumberLong(3)
        ],

        ... etc., more than 100 sub-arrays in one _id
    ]
}

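The array is multidimensional and contains many more entries. From the perspective of querying and storage volume, what is the best schema design for storing such values? The one shown in my example, or splitting the array and creating a separate item for each sub-array: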

{
    "_id" : ObjectId("5538ebbe265a286c54531d8c"),
    "word_frequency" : [
        [
            "words",
            NumberLong(5)
        ]
    ]
}


{
    "_id" : ObjectId("5538ebbe265a286c54531d8c"),
    "word_frequency" : [
        [
            "sign",
            NumberLong(5)
        ]
    ]
}

Thanks

Answer 1

In my personal opinion, if you split the array and insert each sub-array as a separate item, the collection will end up holding far too many documents. A second problem: if you want to find every word_frequency that contains sign, your query has to be:

find({"word_frequency":{"$elemMatch":{"$elemMatch":{"$in":["sign"]}}}})

It returns a result like this:

"word_frequency" : [ [ "sign", NumberLong(5) ] ]

which is a nested array (an array of arrays).

In this case, if you want to use the returned result further, you need extra code logic to separate out the nested arrays, which adds array loops and increases processing time.
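The extra logic mentioned above might look roughly like the following plain JavaScript sketch. The `results` array, its `_id` values, and the `extractCounts` helper are hypothetical and only stand in for whatever the query returns in the split layout; counts are plain numbers here instead of NumberLong.

```javascript
// Hypothetical result set in the split layout: one sub-array per document.
const results = [
  { _id: "a", word_frequency: [["sign", 5]] },
  { _id: "b", word_frequency: [["enter", 3]] },
  { _id: "c", word_frequency: [["sign", 2]] },
];

// Outer loop over the documents, inner loop over each nested [word, count] pair.
function extractCounts(docs, word) {
  const counts = [];
  for (const doc of docs) {
    for (const [term, count] of doc.word_frequency) {
      if (term === word) {
        counts.push(count);
      }
    }
  }
  return counts;
}

const signCounts = extractCounts(results, "sign");
```

Every consumer of the data has to repeat this kind of double loop, which is the processing overhead the nested layout brings.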

My suggestion is to restructure your documents as follows:

{
    "_id": ObjectId("5538ebbe265a286c54531d8c"),
    "word_frequency": [
        { "type": "words", "count": NumberLong(5) },
        { "type": "sign", "count": NumberLong(5) },
        { "type": "facebook", "count": NumberLong(4) },
        { "type": "enter", "count": NumberLong(3) }
    ]
}

With this structure you can manage your documents easily, and the following mongo features cover the common operations:

1> If you want to group by type and calculate sum, avg, min, max, then mongo aggregation will help you.

2> If you want to add another type to word_frequency, then mongo $push will help you.

3> If you want to update word_frequency by removing an entry, then mongo $pull will help you.
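As a rough sketch of the three operations listed above, the pipeline and update documents could be written as below. The collection name `pages`, the new type "login", and the output field names are assumptions made for illustration; the counts are plain numbers here, whereas the shell would use NumberLong and ObjectId.

```javascript
// 1> Group by type and compute sum, avg, min and max of count.
const pipeline = [
  { $unwind: "$word_frequency" }, // one document per array entry
  {
    $group: {
      _id: "$word_frequency.type",
      sum: { $sum: "$word_frequency.count" },
      avg: { $avg: "$word_frequency.count" },
      min: { $min: "$word_frequency.count" },
      max: { $max: "$word_frequency.count" },
    },
  },
];

// Filter selecting one document (ObjectId("...") in the shell).
const filter = { _id: "5538ebbe265a286c54531d8c" };

// 2> Append a new type to word_frequency ("login" is a made-up example).
const addType = { $push: { word_frequency: { type: "login", count: 2 } } };

// 3> Remove every entry whose type is "enter".
const removeType = { $pull: { word_frequency: { type: "enter" } } };

// In the mongo shell, assuming a collection named `pages`:
//   db.pages.aggregate(pipeline)
//   db.pages.updateOne(filter, addType)
//   db.pages.updateOne(filter, removeType)
```

The objects are ordinary JavaScript values, so they can be built and inspected before being handed to the shell or a driver.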

You can create your document structure this way.
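For documents that already exist in the nested-array layout, the conversion to type/count subdocuments could be sketched in plain JavaScript like this. The helper name `toTypedEntries` is hypothetical, and the sample document uses a string `_id` and plain numbers in place of ObjectId and NumberLong.

```javascript
// Convert [[word, count], ...] into [{ type, count }, ...] without mutating the input.
function toTypedEntries(doc) {
  return {
    _id: doc._id,
    word_frequency: doc.word_frequency.map(([type, count]) => ({ type, count })),
  };
}

// Sample document in the original nested-array layout.
const oldDoc = {
  _id: "5538ebbe265a286c54531d8c",
  word_frequency: [
    ["words", 5],
    ["sign", 5],
    ["facebook", 4],
    ["enter", 3],
  ],
};

const newDoc = toTypedEntries(oldDoc);
```

Keeping everything in a single document per `_id` avoids the document-count growth of the split layout while giving each entry named fields to query on.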

What is the best schema solution for storing billions of large arrays
