我具有以下数据结构
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EMP_ID ¦ ATTR1_OLD_VAL ¦ ATTR1_NEW_VAL ¦ ATTR2_OLD_VAL ¦ ATTR2_NEW_VAL ¦ ATTR3_OLD_VAL ¦ ATTR3_NEW_VAL ¦
¦-----------+------------+---------------+---------------+---------------+---------------+---------------+-------------------------------------------------------------------------------------------¦
¦ E001 ¦ xyz ¦ [{"codeTs":"12345567 ","goodsAttrName":"test1"},{"codeTs":"6402910000","goodsAttrName":"test2"}] ¦ mmm ¦ nnn ¦ zzz ¦ aaa ¦
E002 ¦ 1234 ¦ ¦ 123 ¦ jjj ¦ iii ¦ bb ¦
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
我想将其处理为以下数据
+---------------------------------------------------------------------------------------------------------------------------------------------+
EMP_ID ¦ ATTR1_OLD_VAL ¦ codeTs ¦ goodsAttrName ¦ATTR1_NEW_VAL ¦ ATTR2_OLD_VAL ¦ ATTR2_NEW_VAL ¦ ATTR3_OLD_VAL ¦ ATTR3_NEW_VAL ¦
¦-----------+------------+---------------+---------------+---------------+---------------+---------------+------------------------------------¦
¦ E001 ¦ xyz ¦ 12345567 ¦ test1 ¦ ¦ mmm ¦ nnn ¦ zzz ¦ aaa ¦
¦ E001 ¦ xyz ¦ 6402910000 ¦ test2 ¦ ¦ mmm ¦ nnn ¦ zzz ¦ aaa ¦
¦ E002 ¦ 1234 ¦ ¦ ¦ ¦ 123 ¦ jjj ¦ iii ¦ bb ¦
+---------------------------------------------------------------------------------------------------------------------------------------------+
我尝试了以下代码不起作用
import pymongo as pm
import pandas as pd
import numpy as np
import datetime as dt
from bson import json_util
from pandas.io.json import json_normalize
import json
client = pm.MongoClient('mongodb://user1:user1@127.0.0.1:27017')
db = client['my_db']
mongo_data = list(db['ATTR1_NEW_VAL'].find({}))
sanitized = json.loads(json_util.dumps(mongo_data))
normalized = json_normalize(sanitized)
df = pd.DataFrame(normalized)
print(df)
答案 0 :(得分:0)
尝试一下:
首先使用df.explode
将字典列表分解为行,然后通过apply(pd.Series)
将字典值转换为列
df = df.explode('ATTR1_NEW_VAL')
print(pd.concat([df.drop(['ATTR1_NEW_VAL'], axis=1), df['ATTR1_NEW_VAL'].apply(pd.Series)], axis=1))