I have a column in a pandas DataFrame that looks like this:
col1 list_of_dictionaries
1 [{'id': 1,'tid': 1,'measure': 'time','i_id': 0,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 2,'tid': 2,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 3,'tid': 3,'measure': 'time','i_id': 2,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 4,'tid': 4,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z','status': {'calendar': 0, 'business': 0}}]
How can I flatten the list of dictionaries within the same DataFrame so that it looks like this?
col1 id tid measure i_id type time status.calendar status.business
1 1 1 time 0 time 2000-06-19T05:08:11Z 0 0
1 2 2 time 1 time 2000-06-19T05:08:11Z 0 0
1 3 3 time 2 time 2000-06-19T05:08:11Z 0 0
1 4 4 time 1 time 2000-06-19T05:08:11Z 0 0
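I want to keep the original data and expand it in place, creating a new row each time the column names (the dictionary keys) repeat.
I tried json_normalize on the column, but got an error: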
AttributeError: 'str' object has no attribute 'values'
Edit:
According to Spyder, x is a tuple:
[{'id':
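One likely cause of that AttributeError is that json_normalize is receiving strings rather than dicts, for example if the column was read back from a CSV file and each cell now holds the text form of the list. The following is a minimal sketch under that assumption (the sample frame is hypothetical and trimmed to two of the four dicts): it parses each string back into a Python list with ast.literal_eval before any flattening. If the strings are valid JSON (double quotes), json.loads would work instead.

```python
import ast

import pandas as pd

# Hypothetical input: the cell holds the *string* form of the list, as it would after a CSV round-trip
df = pd.DataFrame({
    "col1": [1],
    "list_of_dictionaries": [
        "[{'id': 1, 'tid': 1, 'measure': 'time', 'i_id': 0, 'type': 'time', "
        "'time': '2000-06-19T05:08:11Z'}, "
        "{'id': 4, 'tid': 4, 'measure': 'time', 'i_id': 1, 'type': 'time', "
        "'time': '2000-06-19T05:08:11Z', 'status': {'calendar': 0, 'business': 0}}]"
    ],
})

# ast.literal_eval only evaluates Python literals (safer than eval); non-string cells are left alone
df["list_of_dictionaries"] = df["list_of_dictionaries"].apply(
    lambda v: ast.literal_eval(v) if isinstance(v, str) else v
)

print(type(df.loc[0, "list_of_dictionaries"]))
```

After this step the cells are real lists of dicts, which is what both answers below expect.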
Answer 0 (score: 1)
You can unnest the lists in pure Python, then use json_normalize:
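import pandas as pd
# Pair every dict with the col1 value of its row (pure-Python unnesting)
ids, x = zip(*[(id_, value) for id_, sub in zip(df['col1'], df['list_of_dictionaries'])
               for value in sub])
ndf = pd.json_normalize(list(x))  # pd.io.json.json_normalize in pandas < 1.0
ndf.insert(0, 'col1', ids)  # put the original col1 values back in front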
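A self-contained sketch of the same result, swapping the manual zip for DataFrame.explode. It assumes pandas >= 1.1 (for the ignore_index argument) and that the cells already hold lists, not strings; the sample data is trimmed to two of the question's four dicts.

```python
import pandas as pd

# Sample data shaped like the question (two of the four dicts, for brevity)
df = pd.DataFrame({
    "col1": [1],
    "list_of_dictionaries": [[
        {"id": 1, "tid": 1, "measure": "time", "i_id": 0, "type": "time",
         "time": "2000-06-19T05:08:11Z"},
        {"id": 4, "tid": 4, "measure": "time", "i_id": 1, "type": "time",
         "time": "2000-06-19T05:08:11Z", "status": {"calendar": 0, "business": 0}},
    ]],
})

# One row per dict; col1 is repeated, ignore_index gives a clean 0..n-1 index
exploded = df.explode("list_of_dictionaries", ignore_index=True)

# Flatten each dict; the nested 'status' dict becomes 'status.calendar' / 'status.business'
flat = pd.json_normalize(exploded["list_of_dictionaries"].tolist())

# Put the remaining original columns back next to the flattened ones (indexes align 0..n-1)
result = pd.concat([exploded.drop(columns="list_of_dictionaries"), flat], axis=1)
print(result)
```

Dicts without a status key get NaN in the status columns; the second answer shows one way to clean that up.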
Answer 1 (score: 1)
Here is one way to do it:
import pandas as pd

df = pd.DataFrame([{"tt":[{'id': 1,'tid': 1,'measure': 'time','i_id': 0,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 2,'tid': 2,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 3,'tid': 3,'measure': 'time','i_id': 2,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 4,'tid': 4,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z','status': {'calendar': 0, 'business': 0}}], "col1":0}, {"tt":[{'id': 5,'tid': 1,'measure': 'time','i_id': 0,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 6,'tid': 2,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 7,'tid': 3,'measure': 'time','i_id': 2,'type': 'time','time': '2000-06-19T05:08:11Z'},{'id': 8,'tid': 4,'measure': 'time','i_id': 1,'type': 'time','time': '2000-06-19T05:08:11Z','status': {'calendar': 0, 'business': 0}}], "col1":1}])
res = df["tt"].values
# Add the row's col1 value to each of its dicts (note: this mutates the dicts stored in df)
for i, elem in enumerate(res):
    for dic in elem:
        dic["col1"] = df["col1"].iloc[i]
# Concatenate all dicts into one flat list, so there is no need to append to a DataFrame (append is slow)
store = []
for x in res:
    store.extend(x)
# Now use json_normalize to expand the nested keys and create the DataFrame
# (pd.json_normalize since pandas 1.0; pd.io.json.json_normalize in older versions)
df2 = pd.json_normalize(store)
# Optional cleanup: dicts without 'status' produce NaN, so fill with 0 and cast back to int
df2.fillna(0, inplace=True)
for col in ["status.business", "status.calendar"]:
    df2[col] = df2[col].astype(int)
print(df2)
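Output (from an older pandas version, which sorted the columns alphabetically; newer versions keep the columns in order of first appearance, so the order may differ):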
col1 i_id id measure status.business status.calendar tid time type
0 0 0 1 time 0 0 1 2000-06-19T05:08:11Z time
1 0 1 2 time 0 0 2 2000-06-19T05:08:11Z time
2 0 2 3 time 0 0 3 2000-06-19T05:08:11Z time
3 0 1 4 time 0 0 4 2000-06-19T05:08:11Z time
4 1 0 5 time 0 0 1 2000-06-19T05:08:11Z time
5 1 1 6 time 0 0 2 2000-06-19T05:08:11Z time
6 1 2 7 time 0 0 3 2000-06-19T05:08:11Z time
7 1 1 8 time 0 0 4 2000-06-19T05:08:11Z time
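The fillna(0) cleanup above makes a dict that had no status indistinguishable from one whose status values really are 0. If that distinction matters, one option (a variation, not part of the answer above) is pandas' nullable integer dtype "Int64", which keeps integers while leaving the gaps as <NA>. The small flattened frame here is hypothetical:

```python
import pandas as pd

# Hypothetical flattened frame: only the second record carried a 'status' dict
df2 = pd.json_normalize([
    {"id": 1, "tid": 1},
    {"id": 4, "tid": 4, "status": {"calendar": 0, "business": 0}},
])

# Cast only the status columns to the nullable integer dtype instead of filling with 0
for col in ["status.business", "status.calendar"]:
    df2[col] = df2[col].astype("Int64")

print(df2)
print(df2.dtypes)
```

The trade-off is that downstream code must handle <NA> values, whereas the fillna(0) approach yields plain int64 columns.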