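I have a Mongo database containing an imported flat-file CSV. In SQL, the file would no doubt be normalized: it contains one row per period, and the periods contain repeated information. I created an aggregation that uses the `$push` operator to (partially) aggregate the repeated information into a single sub-object within the row. This mimics normalization. What I want to do is restructure the output object so that the keys and values of the sub-object dictionary are used at the top level. In SQL this is called a pivot query or a crosstab query. In Excel it is called transpose. Whatever the name, what I am looking for is the ability to take key-value pairs and use them as a list of "columns" in Mongo.
Since the goal of Mongo and other NoSQL databases is denormalized implementations, I am surprised that this is so hard.
I am trying to take the following JSON objects in Mongo: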
[{ "_id": {"Date": "1/1/2018", "Type": "Green", "client_id": 1},
"Sub_data": [{"sub_id" : 1}, {"sub_value": 2}] },
{ "_id": {"Date": "1/1/2018", "Type": "Green", "client_id": 1},
"Sub_data": [{"sub_id" : 2}, {"sub_value": 5}] },
{ "_id": {"Date": "1/2/2018", "Type": "Green", "client_id": 1},
"Sub_data": [{"sub_id" : 2}, {"sub_value": 4}] },
{ "_id": {"Date": "1/1/2018", "Type": "Orange", "client_id": 1},
"Sub_data": [{"sub_id" : 6}, {"sub_value": 7}] }]
and get the following:
[{ "_id": {"Date": "1/1/2018", "Type": "Green", "client_id": 1},
"1" : 2, "2":5},
{ "_id": {"Date": "1/2/2018", "Type": "Green", "client_id": 1},
"2" : 4},
{ "_id": {"Date": "1/1/2018", "Type": "Orange", "client_id": 1},
"6" : 7}]
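Note that I want this result to have an arbitrary number of columns. I have looked at a few solutions that SEEM to solve the problem (Array to object, AddFields, ReplaceRoot, Something like a pivot using static columns), and I have read multiple versions of this 'do it afterwards' code. Is post-processing the only way?
Note: This is an attempt to mimic the SQL Server (and Excel, etc.) functionality described in this Stack Overflow question.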
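For comparison, the "do it afterwards" approach can be sketched in plain JavaScript. This is a minimal sketch and not code from any of the linked posts: it assumes documents shaped like the sample input above, and the function name `pivotSubData` is made up for illustration.

```javascript
// Hypothetical helper: pivots documents shaped like the sample input above
// after they have been fetched from MongoDB (client-side post-processing).
function pivotSubData(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Serialize the compound _id so that equal ids land in the same group.
    // Assumes the _id fields always appear in the same order.
    const groupKey = JSON.stringify(doc._id);
    if (!groups.has(groupKey)) {
      groups.set(groupKey, { _id: doc._id });
    }
    // Sub_data is [{sub_id: ...}, {sub_value: ...}]: the first element
    // names the column, the second element holds the cell value.
    const column = String(doc.Sub_data[0].sub_id);
    groups.get(groupKey)[column] = doc.Sub_data[1].sub_value;
  }
  return Array.from(groups.values());
}

const input = [
  { _id: { Date: "1/1/2018", Type: "Green", client_id: 1 },
    Sub_data: [{ sub_id: 1 }, { sub_value: 2 }] },
  { _id: { Date: "1/1/2018", Type: "Green", client_id: 1 },
    Sub_data: [{ sub_id: 2 }, { sub_value: 5 }] },
  { _id: { Date: "1/2/2018", Type: "Green", client_id: 1 },
    Sub_data: [{ sub_id: 2 }, { sub_value: 4 }] },
  { _id: { Date: "1/1/2018", Type: "Orange", client_id: 1 },
    Sub_data: [{ sub_id: 6 }, { sub_value: 7 }] },
];

const pivoted = pivotSubData(input);
console.log(JSON.stringify(pivoted, null, 2));
```

The number of columns is not fixed anywhere in this sketch; each distinct `sub_id` simply becomes a new property on the grouped object.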
Putting it together, the complete pipeline using the second option from the first answer looks like this:
db.rate_cards.aggregate(
{
"$group": {
"_id": {
"date": "$date",
"start_date": "$start_date",
"end_date": "$end_date"
},
"code_data": {
"$push": {
"code_str": {"$substr" : ["$code",0,-1]},
"cpm": "$cpm"
}
}
}
},
{
"$group":{
"_id":"$_id",
"data":{
"$mergeObjects":{
"$arrayToObject":[[
{
"k":{"$let":{"vars":{"sub_id_elem":{"$arrayElemAt":["$code_data",0]}},"in":"$$sub_id_elem.code_str"}},
"v":{"$let":{"vars":{"sub_value_elem":{"$arrayElemAt":["$code_data",1]}},"in":"$$sub_value_elem.cpm"}}
}
]]
}
}
}
},
{"$replaceRoot":{"newRoot":{"$mergeObjects":["$_id",{"$arrayToObject":"$data"}]}}}
)
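Note that this is more complicated and more performance-intensive than I had hoped. It seems to declare a local variable, use an `in` clause, and so on. When I try to run (working) implementations of both answers, NoSQL Booster chokes while trying to expand row 600 or so.
Below is a slightly edited version of the original dataset. Note that some extra fields that are not used in the original query have been omitted: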
{
"_id" : ObjectId("5a578d5c57d33b197004beed"),
"date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"start_date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"end_date" : ISODate("2017-10-01T03:00:00.000+03:00"),
"dp" : "M-Su 12m-6a",
"dsc" : "Daypart",
"net" : "val1",
"place" : "loc1",
"code" : 12,
"cost" : 16.8
},
{
"_id" : ObjectId("5a578d5c57d33b197004beee"),
"date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"start_date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"end_date" : ISODate("2017-10-01T03:00:00.000+03:00"),
"dp" : "M-Su 12m-6a",
"dsc" : "Daypart",
"net" : "val1",
"place" : "loc3",
"code" : 24,
"cost" : 55.6
},
{
"_id" : ObjectId("5a578d5c57d33b197004beef"),
"date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"start_date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"end_date" : ISODate("2017-10-01T03:00:00.000+03:00"),
"dp" : "M-Su 12n-6p",
"dsc" : "Daypart",
"net" : "val2",
"place" : "loc2",
"code" : 23,
"cost" : 65.5
},
{
"_id" : ObjectId("5a578d5c57d33b197004bef0"),
"date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"start_date" : ISODate("2017-09-25T03:00:00.000+03:00"),
"end_date" : ISODate("2017-10-01T03:00:00.000+03:00"),
"dp" : "M-Su 6p-12m",
"dsc" : "Daypart",
"net" : "val2",
"place" : "loc2",
"code" : 23,
"cost" : 101
}
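Answer 0 (score: 2)
OK, based on the information provided in the post and the comments, I created the following dataset.
Note: I made a couple of changes, both listed here.
Changed _id to my_id in the database, because the _id field name is reserved and is uniquely indexed.
Changed "sub_id" to store its value as a string, because $arrayToObject requires string keys.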
db.test.insert(
[
{ "my_id": {"Date": "1/1/2018", "Type": "Green", "client_id": 1},
"Sub_data": [{"sub_id" : "1"}, {"sub_value": 2}] },
{ "my_id": {"Date": "1/1/2018", "Type": "Green", "client_id": 1},
"Sub_data": [{"sub_id" : "2"}, {"sub_value": 5}] },
{ "my_id": {"Date": "1/2/2018", "Type": "Green", "client_id": 1},
"Sub_data": [{"sub_id" : "2"}, {"sub_value": 4}] },
{ "my_id": {"Date": "1/1/2018", "Type": "Orange", "client_id": 1},
"Sub_data": [{"sub_id" : "6"}, {"sub_value": 7}] }
])
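You need to use $group and $arrayToObject to output the expected format.
$group with $push collects all the values from Sub_data, mapping the first element to the key and the second element to the value; $arrayToObject then turns those pairs into named keys and values.
$mergeObjects merges the _id with the rest of the values. $replaceRoot promotes the merged document to the top level.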
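To make the data flow concrete, here is a plain JavaScript emulation of what each stage produces for a single group. It is only an illustration under the assumption that the operators behave as described above; it does not call MongoDB, and the variable names are made up.

```javascript
// Plain JavaScript emulation (not MongoDB code) of the stages for the
// group {Date: "1/1/2018", Type: "Green", client_id: 1}.

// After $group with $push: one {k, v} pair per input document.
const grouped = {
  _id: { Date: "1/1/2018", Type: "Green", client_id: 1 },
  data: [
    { k: "1", v: 2 },
    { k: "2", v: 5 },
  ],
};

// $arrayToObject: turns the [{k, v}, ...] array into a single object.
const asObject = Object.fromEntries(grouped.data.map(({ k, v }) => [k, v]));

// $mergeObjects: combines the _id fields with the pivoted fields.
// $replaceRoot then makes this merged object the whole output document.
const newRoot = { ...grouped._id, ...asObject };

console.log(newRoot);
```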
db.test.aggregate([
{"$group":{
"_id":"$my_id",
"data":{
"$push":{
"k":{"$let":{"vars":{"sub_id_elem":{"$arrayElemAt":["$Sub_data",0]}},"in":"$$sub_id_elem.sub_id"}},
"v":{"$let":{"vars":{"sub_value_elem":{"$arrayElemAt":["$Sub_data",1]}},"in":"$$sub_value_elem.sub_value"}}
}
}
}},
{"$replaceRoot":{"newRoot":{"$mergeObjects":["$_id",{"$arrayToObject":"$data"}]}}}
])
Output:
{"Date":"1/1/2018", "Type":"Orange", "client_id":1, "6":7}
{"Date":"1/2/2018", "Type":"Green", "client_id":1, "2":4}
{"Date":"1/1/2018", "Type":"Green", "client_id":1, "1":2, "2":5}
Alternatively, you can use $mergeObjects as an accumulator to merge the objects.
db.test.aggregate([
{"$group":{
"_id":"$my_id","data":{
"$mergeObjects":{
"$arrayToObject":[[
{
"k":{"$let":{"vars":{"sub_id_elem":{"$arrayElemAt":["$Sub_data",0]}},"in":"$$sub_id_elem.sub_id"}},
"v":{"$let":{"vars":{"sub_value_elem":{"$arrayElemAt":["$Sub_data",1]}},"in":"$$sub_value_elem.sub_value"}}
}
]]
}
}
}},
{"$replaceRoot":{"newRoot":{"$mergeObjects":["$_id","$data"]}}}
])
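Applied to the flat rate_cards documents from the question, the $mergeObjects variant might collapse into a single $group, since there is no Sub_data array to unpack there. This is a sketch under assumptions rather than a tested pipeline: the field names (`date`, `start_date`, `end_date`, `code`, `cpm`) are copied from the question's pipeline and may not match the real collection, and the `$substr` call is the question's own way of getting a string key from the numeric code.

```javascript
// Sketch only: field names are taken from the question's pipeline and are
// assumptions about the real rate_cards collection.
const pipeline = [
  {
    $group: {
      _id: { date: "$date", start_date: "$start_date", end_date: "$end_date" },
      data: {
        // Build a one-key object per document and merge them per group.
        $mergeObjects: {
          $arrayToObject: [[
            {
              // $arrayToObject needs string keys; the question uses $substr
              // with the intent of turning the numeric code into a string.
              k: { $substr: ["$code", 0, -1] },
              v: "$cpm",
            },
          ]],
        },
      },
    },
  },
  // Promote the group key fields and the pivoted fields to the top level.
  { $replaceRoot: { newRoot: { $mergeObjects: ["$_id", "$data"] } } },
];

// In the mongo shell: db.rate_cards.aggregate(pipeline)
```

One caveat with this shape: $mergeObjects overwrites a field when the same key appears again, so if several documents in one group share a code, only one of their values survives. Adding further fields to the group key may be needed to keep such rows apart.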