Here is the data frame I am working with:
Row |ID | List
----------------------------------------------------------------------------------------------------------------------------------------------------------------
1 |45 | [{u'value': u'0', u'label': u'Forum Thread Size'}, {u'value': u'0', u'label': u'Unique Commenters'}, {u'value': u'0', u'label': u'Likes and Votes'}]
2 |76 | [{u'value': u'1', u'label': u'Forum Thread Size'}, {u'value': u'1', u'label': u'Unique Commenters'}, {u'value': u'1', u'label': u'Engagement'}, {u'value': u'0', u'label': u'Likes and Votes'}]
3 |99 | []
4 |83 | [{u'value': u'0', u'label': u'Forum Thread Size'}, {u'value': u'0', u'label': u'Unique Commenters'}, {u'value': u'0', u'label': u'Likes and Votes'}]
5 |80 | []
After the transformation, I would like the data to look like this in a pandas data frame:
Row |ID |Forum Thread Size |Unique Commenters |Engagement |Likes and Votes
------------------------------------------------------------------------------------------------------------------------------------------------------
1 |45 |0 |0 | |0
2 |76 |1 |1 |1 |0
3 |99 | | | |
4 |83 |0 |0 | |0
5 |80 | | | |
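Answer 0 (score: 2)
You can use apply to loop through the List column and convert each list into a pandas.Series object with label as the index. This produces a data frame whose column headers are the label values, which you can then concat with the remaining columns of the original data frame to get the desired result: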
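import pandas as pd

df1 = pd.concat([
    # keep every column except List (the positional axis argument was removed in pandas 2.0)
    df.drop(columns='List'),
    # one Series per row: label -> value; empty lists become all-NaN rows
    df.List.apply(lambda lst: pd.Series({
        d['label']: d['value'] for d in lst
    }))
], axis=1)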
df1
# Row ID Engagement Forum Thread Size Likes and Votes Unique Commenters
#0 1 45 NaN 0 0 0
#1 2 76 1 1 0 1
#2 3 99 NaN NaN NaN NaN
#3 4 83 NaN 0 0 0
#4 5 80 NaN NaN NaN NaN
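For reference, here is a self-contained sketch of the same idea that can be run as-is. It is not the code above: it swaps apply for building the expanded frame directly from a list of dicts, and it reindexes the columns to match the order requested in the question. The sample frame (first three rows of the question) and the names wanted, expanded, and result are assumptions made for illustration.

```python
import pandas as pd

# Sample frame mirroring the first three rows of the question (illustrative data).
df = pd.DataFrame({
    "Row": [1, 2, 3],
    "ID": [45, 76, 99],
    "List": [
        [{"value": "0", "label": "Forum Thread Size"},
         {"value": "0", "label": "Unique Commenters"},
         {"value": "0", "label": "Likes and Votes"}],
        [{"value": "1", "label": "Forum Thread Size"},
         {"value": "1", "label": "Unique Commenters"},
         {"value": "1", "label": "Engagement"},
         {"value": "0", "label": "Likes and Votes"}],
        [],
    ],
})

# Column order requested in the question (hypothetical name, not from the answer).
wanted = ["Forum Thread Size", "Unique Commenters", "Engagement", "Likes and Votes"]

# Turn each list of {'label', 'value'} dicts into one flat dict per row.
# An empty list yields an empty dict, which becomes a row of NaN.
expanded = pd.DataFrame(
    [{d["label"]: d["value"] for d in lst} for lst in df["List"]],
    index=df.index,
).reindex(columns=wanted)

# Attach the expanded columns to the remaining columns of the original frame.
result = pd.concat([df.drop(columns="List"), expanded], axis=1)
print(result)
```

The values stay as strings because that is how they appear in the source dicts. If blank cells are preferred over NaN, result.fillna("") is one option.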
Answer 1 (score: 1)
IIUC
# assumes: import numpy as np; import pandas as pd
df1 = df.set_index(['Row', 'ID']).List.apply(pd.Series).stack().apply(pd.Series).reset_index()
(df1.pivot_table(index=['Row', 'ID'], columns='label', values='value', aggfunc=np.sum)
    .merge(df[['Row', 'ID']], left_index=True, right_on=['Row', 'ID'], how='right'))
Out[334]:
Engagement Forum Thread Size Likes and Votes Unique Commenters Row ID
0 None 0 0 0 1 1
1 1 1 0 1 2 2
2 NaN NaN NaN NaN 3 3
Data input:
df = pd.DataFrame({'Row':[1,2,3],'ID':[1,2,3], 'List':[[{u'value': u'0', u'label': u'Forum Thread Size'}, {u'value': u'0', u'label': u'Unique Commenters'}, {u'value': u'0', u'label': u'Likes and Votes'}], [{u'value': u'1', u'label': u'Forum Thread Size'}, {u'value': u'1', u'label': u'Unique Commenters'}, {u'value': u'1', u'label': u'Engagement'}, {u'value': u'0', u'label': u'Likes and Votes'}],[]]})