I have a few collections in my MongoDB, and I want to create a new collection by comparing collection 1 with collection 2 using pymongo.
Collection 1:
Object id      timestamp                             Prof_Name   SUBJECT
abc67478898k   ISODate("2018-01-03T09:26:37.541Z")   ABDC        "sub1, sub2, sub3"
jjjjjjjjjj     ISODate("2018-01-03T09:26:37.541Z")   XYZ         "sub2, sub4, sub8"
Collection 2:
Object id   timestamp                UUID   SUBJECT_ID          rating   score
3333333     ISODate("2018-01-03TZ")  7897   "sub1,sub4, sub7"   7        10
444444      ISODate("2018-01-03TZ")  4532   "sub2"              4        6
777777      ISODate("2018-01-03TZ")  7876   "sub1,sub2,sub3"    8        8
1111111     ISODate("2018-01-03TZ")  654    "sub1,sub3"         7        8
The JSON looks like this:
data1 :
{ "_id" : ObjectId("7a563a5a5560fd08da86dc44"), "Prof_Name" : "Jack", "timestamp" : ISODate("2018-01-10T16:08:26.613Z"), "SUBJECT" : ["Maths", "Chemistry", "Machinery1", "Ele1"] }
{ "_id" : ObjectId("7a563a5a5560fd08da86dc45"), "Prof_Name" : "Mac", "timestamp" : ISODate("2018-01-10T16:08:26.613Z"), "SUBJECT" : ["Chemistry", "CS", "German"] }
{ "_id" : ObjectId("7a563a5a5560fd08da86dc46"), "Prof_Name" : "Bill", "timestamp" : ISODate("2018-01-10T16:08:26.613Z"), "SUBJECT" : ["German"] }
data2 :
{ "_id" : ObjectId("7a563a5a5560fd08da86dc46"), "Rating" : 6, "UUID" : 8123, "timestamp" : ISODate("2018-01-10T16:08:26.613Z"), "SUBJECT_ID" : "Maths", "ID" : "OI-123" }
{ "_id" : ObjectId("7a563a5a5560fd08da86dc47"), "Rating" : 7, "UUID" : 8123, "timestamp" : ISODate("2018-01-10T16:08:26.613Z"), "SUBJECT_ID" : "Machinery1, Maths, French, German", "ID" : "OI-98" }
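I am trying to generate a third collection in which each subject of a Prof_Name is matched against the subjects in collection 2 within a given timestamp range, and the UUIDs and UUID_count are collected. My mongo query is as follows: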
db.data1.aggregate([
  {"$lookup":{
    "from":"data2",
    // SUBJECT is a comma-separated string here, so split it into an array
    "let":{"subject":{"$split":["$SUBJECT",", "]}},
    "pipeline":[
      // keep only data2 documents from the requested year and month
      {"$match":{"$expr":{"$and":[{"$eq":[{"$year":"$timestamp"}, 2016]}, {"$eq":[{"$month":"$timestamp"}, 1]}]}}},
      {"$addFields":{"SUBJECT_ID":{"$split":["$SUBJECT_ID",", "]},"SUBJECT":"$$subject"}},
      {"$unwind":"$SUBJECT"},
      // keep rows where the professor's subject appears in SUBJECT_ID
      {"$match":{"$expr":{"$in":["$SUBJECT","$SUBJECT_ID"]}}},
      {"$facet":{
        "UUID":[{"$group":{"_id":{"id":"$_id","UUID":"$UUID"}}},{"$count":"UUID_Count"}],
        "REST":[
          {"$group":{"_id":null,"subjects_list":{"$addToSet":"$SUBJECT"},"UUID_distinct_list":{"$addToSet":"$UUID"}}},
          {"$addFields":{"subject_count":{"$size":"$subjects_list"},"UUID_distinct_count":{"$size":"$UUID_distinct_list"}}},
          {"$project":{"_id":0}}
        ]
      }},
      {"$replaceRoot":{"newRoot":{"$mergeObjects":[{"$arrayElemAt":["$UUID",0]},{"$arrayElemAt":["$REST",0]}]}}}
    ],
    "as":"ref_data"
  }},
  {"$unwind":{"path":"$ref_data","preserveNullAndEmptyArrays":true}},
  {"$addFields":{"ref_data.Prof_Name":"$Prof_Name"}},
  {"$replaceRoot":{"newRoot":"$ref_data"}},
  {"$out":"data3"}
])
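To make the intent of the pipeline concrete, here is a rough plain-Python sketch of the same matching, run over trimmed-down copies of the sample data1/data2 documents with no database involved. The helper name summarize and the shortened _id values are only illustrative, and the timestamp filter is left out for brevity.

```python
# Illustrative only: mirrors the $lookup sub-pipeline in plain Python.
data1 = [
    {"Prof_Name": "Jack", "SUBJECT": ["Maths", "Chemistry", "Machinery1", "Ele1"]},
    {"Prof_Name": "Mac", "SUBJECT": ["Chemistry", "CS", "German"]},
    {"Prof_Name": "Bill", "SUBJECT": ["German"]},
]
data2 = [
    {"_id": "dc46", "UUID": 8123, "SUBJECT_ID": "Maths"},
    {"_id": "dc47", "UUID": 8123, "SUBJECT_ID": "Machinery1, Maths, French, German"},
]


def summarize(prof, ratings):
    """Hypothetical helper: per-professor summary similar to the $facet output."""
    matched_docs, subjects, uuids = set(), set(), set()
    for doc in ratings:
        subject_ids = doc["SUBJECT_ID"].split(", ")  # plays the role of $split
        for subject in prof["SUBJECT"]:              # plays the role of $unwind
            if subject in subject_ids:               # plays the role of $in
                matched_docs.add((doc["_id"], doc["UUID"]))
                subjects.add(subject)
                uuids.add(doc["UUID"])
    return {
        "Prof_Name": prof["Prof_Name"],
        "UUID_Count": len(matched_docs),
        "subjects_list": sorted(subjects),
        "subject_count": len(subjects),
        "UUID_distinct_list": sorted(uuids),
        "UUID_distinct_count": len(uuids),
    }


results = {prof["Prof_Name"]: summarize(prof, data2) for prof in data1}
```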
The above query works fine when SUBJECT is a string:
SUBJECT
"sub1, sub2, sub3"
"sub2, sub4, sub8"
My question is: how should the query change if the SUBJECT column holds an array of elements instead? For example:
subjects1
["sub1", "sub2", "sub3"]
["sub2", "sub4", "sub8"]
If I run the same query against the array data, I get an error along the lines of an array being found where a string was expected.
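One possible direction, offered as an untested sketch: $split only accepts a string, so when SUBJECT is already an array the $split in the let clause can probably just be dropped and the field passed through as-is. The pymongo-style builder below assumes SUBJECT_ID in data2 is still a ", "-separated string; the function name build_pipeline and its parameters are made up for illustration.

```python
def build_pipeline(subject_is_array, year, month):
    """Hypothetical helper: returns the aggregation pipeline as plain Python data."""
    # $split rejects arrays, so an array-valued SUBJECT is passed through untouched.
    subject = "$SUBJECT" if subject_is_array else {"$split": ["$SUBJECT", ", "]}
    sub_pipeline = [
        {"$match": {"$expr": {"$and": [
            {"$eq": [{"$year": "$timestamp"}, year]},
            {"$eq": [{"$month": "$timestamp"}, month]},
        ]}}},
        # Assumption: SUBJECT_ID in data2 is still a comma-separated string.
        {"$addFields": {"SUBJECT_ID": {"$split": ["$SUBJECT_ID", ", "]},
                        "SUBJECT": "$$subject"}},
        {"$unwind": "$SUBJECT"},
        {"$match": {"$expr": {"$in": ["$SUBJECT", "$SUBJECT_ID"]}}},
        {"$facet": {
            "UUID": [{"$group": {"_id": {"id": "$_id", "UUID": "$UUID"}}},
                     {"$count": "UUID_Count"}],
            "REST": [
                {"$group": {"_id": None,
                            "subjects_list": {"$addToSet": "$SUBJECT"},
                            "UUID_distinct_list": {"$addToSet": "$UUID"}}},
                {"$addFields": {"subject_count": {"$size": "$subjects_list"},
                                "UUID_distinct_count": {"$size": "$UUID_distinct_list"}}},
                {"$project": {"_id": 0}},
            ],
        }},
        {"$replaceRoot": {"newRoot": {"$mergeObjects": [
            {"$arrayElemAt": ["$UUID", 0]},
            {"$arrayElemAt": ["$REST", 0]},
        ]}}},
    ]
    return [
        {"$lookup": {"from": "data2", "let": {"subject": subject},
                     "pipeline": sub_pipeline, "as": "ref_data"}},
        {"$unwind": {"path": "$ref_data", "preserveNullAndEmptyArrays": True}},
        {"$addFields": {"ref_data.Prof_Name": "$Prof_Name"}},
        {"$replaceRoot": {"newRoot": "$ref_data"}},
        {"$out": "data3"},
    ]


pipeline = build_pipeline(subject_is_array=True, year=2018, month=1)
```

With a live connection, where db is a pymongo Database object, this would be run as db.data1.aggregate(pipeline). Everything after the let clause is the same as in the string version, since by that point SUBJECT is an array either way.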