I have a MongoDB collection containing documents like this:
doc = {
"_id": {
"$oid": "516622c9ce21150200000d87"
},
"SubmissionDate": {
"$date": "2013-04-11T02:41:13.162Z"
},
"isComplete": True,
"Rounds": [
{
"Photo": [
],
"A": {
"Complexity": 55,
"Colour": 85,
"Deep": 51,
"Effervescence": 44
},
"B": {
"QualityPIDs": [
],
"QualityScales": [
],
"Complexity": 43,
"Qualities": [
]
},
"C": {
"QualityPIDs": [
],
"QualityScales": [
],
"Complexity": 60,
"UHS": 46,
"Colour": 33,
"Qualities": [
]
},
"D": {
"Complexity": 73,
"Duration": 68,
"Quality": 65
}
}
],
"Item": {
"_id": {
"$oid": "51e6d678c06918db21156f92"
},
"Country": "Australia",
"Name": "King",
"PeopleId": {
"$oid": "51dddb69a9d9350200000"
},
"Style": "Apple",
"Type": "Flat",
"UserSubmitted": False
}
}
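I need to convert this collection to a pandas DataFrame.
The solution suggested here, How to import data from mongodb to pandas?, does most of the work. But I also have the Rounds column, which contains a dictionary of dictionaries.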
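I wrote a loop to access the sub-dictionaries of Rounds:
import pandas as pd
df = pd.json_normalize(doc)
A_data = pd.DataFrame(columns=df.Rounds[0][0]['A'].keys())
for i in range(len(df.Rounds)):
    # Read row i (indexing df.Rounds[0] here would repeat the first row every time).
    # Note: DataFrame.append was removed in pandas 2.0.
    A_data = A_data.append(pd.json_normalize(df.Rounds[i][0]['A']), ignore_index=True)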
Finally, I join A_data to the main DataFrame.
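For reference, that join step might look like the sketch below. The two small frames are hypothetical stand-ins for the main DataFrame and A_data (they are not taken from the real collection), and the "A." prefix is an assumption added only to avoid column name clashes.

```python
import pandas as pd

# Hypothetical stand-ins for the main frame and the per-round 'A' values.
main = pd.DataFrame({"isComplete": [True, False]})
A_data = pd.DataFrame({"Complexity": [55, 40], "Colour": [85, 70]})

# Prefix the 'A' columns so they cannot collide with existing names,
# then join on the shared row index.
joined = main.join(A_data.add_prefix("A."))
print(joined.columns.tolist())
```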
Is there a faster way? Right now the loop takes a lot of time. Thanks!
Answer 0 (score: 1)
Use the meta parameter of pd.json_normalize to specify each level of the dict, and use record_path for 'Rounds'.
import pandas as pd
meta = [['_id', '$oid'],
        ['Item', 'Country'],
        ['Item', 'Name'],
        ['Item', 'Style'],
        ['Item', 'Type'],
        ['Item', 'UserSubmitted'],
        ['Item', '_id', '$oid'],
        ['Item', 'PeopleId', '$oid'],
        ['SubmissionDate', '$date'],
        'isComplete']
df = pd.json_normalize(doc, record_path='Rounds', meta=meta)
# display(df)
   Photo  A.Complexity  A.Colour  A.Deep  A.Effervescence  B.QualityPIDs  B.QualityScales  B.Complexity  B.Qualities  C.QualityPIDs  C.QualityScales  C.Complexity  C.UHS  C.Colour  C.Qualities  D.Complexity  D.Duration  D.Quality                  _id.$oid  Item.Country  Item.Name  Item.Style  Item.Type  Item.UserSubmitted             Item._id.$oid     Item.PeopleId.$oid      SubmissionDate.$date  isComplete
0     []            55        85      51               44             []               []            43           []             []               []            60     46        33           []            73          68         65  516622c9ce21150200000d87     Australia       King       Apple       Flat               False  51e6d678c06918db21156f92  51dddb69a9d9350200000  2013-04-11T02:41:13.162Z        True
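The same call also accepts a list of documents, so the whole collection can be flattened at once instead of row by row. Below is a minimal sketch under some assumptions: the two documents are trimmed toy versions of the one in the question, the second document's values (including its id) are made up for illustration, and in practice the list would come from the collection itself.

```python
import pandas as pd

# Two toy documents shaped like the one in the question (trimmed for brevity).
# The second document's values are hypothetical.
docs = [
    {
        "_id": {"$oid": "516622c9ce21150200000d87"},
        "isComplete": True,
        "Rounds": [{"A": {"Complexity": 55, "Colour": 85}, "D": {"Quality": 65}}],
        "Item": {"Country": "Australia", "Name": "King"},
    },
    {
        "_id": {"$oid": "hypothetical-second-id"},
        "isComplete": False,
        "Rounds": [{"A": {"Complexity": 40, "Colour": 70}, "D": {"Quality": 50}}],
        "Item": {"Country": "Australia", "Name": "Queen"},
    },
]

meta = [["_id", "$oid"], ["Item", "Country"], ["Item", "Name"], "isComplete"]

# A single call covers every document: one output row per element
# of each document's 'Rounds' list, with the meta fields repeated alongside.
df = pd.json_normalize(docs, record_path="Rounds", meta=meta)

# Select only the columns that came from the 'A' sub-dictionary,
# which is what the loop in the question was building.
A_data = df.filter(regex=r"^A\.")
print(A_data)
```

Because the 'A' columns already sit in the same frame as the meta fields, no separate join step is needed afterwards.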