Python - How to extract labels & values from a list column and transpose them against unique IDs

Time: 2017-09-29 21:05:15

Tags: python-2.7 pandas etl

Below is the dataframe I'm working with:

Row  |ID   | List
----------------------------------------------------------------------------------------------------------------------------------------------------------------
1    |45   | [{u'value': u'0', u'label': u'Forum Thread Size'}, {u'value': u'0', u'label': u'Unique Commenters'}, {u'value': u'0', u'label': u'Likes and Votes'}]
2    |76   | [{u'value': u'1', u'label': u'Forum Thread Size'}, {u'value': u'1', u'label': u'Unique Commenters'}, {u'value': u'1', u'label': u'Engagement'}, {u'value': u'0', u'label': u'Likes and Votes'}]
3    |99   | []
4    |83   | [{u'value': u'0', u'label': u'Forum Thread Size'}, {u'value': u'0', u'label': u'Unique Commenters'}, {u'value': u'0', u'label': u'Likes and Votes'}]
5    |80   | []

After the transformation, I'd like the data in the pandas dataframe to look like this:

Row |ID  |Forum Thread Size |Unique Commenters |Engagement |Likes and Votes
-----------------------------------------------------------------------------
1   |45  |0                 |0                 |           |0
2   |76  |1                 |1                 |1          |0
3   |99  |                  |                  |           |
4   |83  |0                 |0                 |           |0
5   |80  |                  |                  |           |

2 Answers:

Answer 0 (score: 2)

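You can use apply to loop over the List column and convert each list into a pandas.Series object with the labels as its index. This produces a data frame with the labels as column headers, which you can then concat with the remaining columns of the original data frame to get what you want: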

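import pandas as pd

# Turn each list of {'label': ..., 'value': ...} dicts into a Series keyed by
# label; apply() assembles those Series into a DataFrame (empty list -> NaN row).
df1 = pd.concat([
    df.drop('List', axis=1),
    df.List.apply(lambda lst: pd.Series({
       d['label']: d['value'] for d in lst
    }))
], axis=1)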
df1
# Row   ID  Engagement   Forum Thread Size   Likes and Votes    Unique Commenters
#0  1   45        NaN                    0                 0                    0
#1  2   76          1                    1                 0                    1
#2  3   99        NaN                  NaN               NaN                  NaN
#3  4   83        NaN                    0                 0                    0
#4  5   80        NaN                  NaN               NaN                  NaN
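A variant of the same approach, sketched assuming Python 3 and a reasonably recent pandas: build one {label: value} dict per row in a plain list comprehension and pass the whole list to the DataFrame constructor, which aligns the labels into columns and fills missing ones with NaN. The five sample rows are copied from the question; the `wanted` column order and the `pd.to_numeric` conversion (the values arrive as strings such as u'0') are illustrative additions, not part of the answer above.

```python
import pandas as pd

# Sample data reconstructed from the question's table (Row, ID, List).
df = pd.DataFrame({
    'Row': [1, 2, 3, 4, 5],
    'ID': [45, 76, 99, 83, 80],
    'List': [
        [{'value': '0', 'label': 'Forum Thread Size'},
         {'value': '0', 'label': 'Unique Commenters'},
         {'value': '0', 'label': 'Likes and Votes'}],
        [{'value': '1', 'label': 'Forum Thread Size'},
         {'value': '1', 'label': 'Unique Commenters'},
         {'value': '1', 'label': 'Engagement'},
         {'value': '0', 'label': 'Likes and Votes'}],
        [],
        [{'value': '0', 'label': 'Forum Thread Size'},
         {'value': '0', 'label': 'Unique Commenters'},
         {'value': '0', 'label': 'Likes and Votes'}],
        [],
    ],
})

# One {label: value} dict per row; an empty list becomes an empty dict,
# which the constructor turns into a row of NaN.
wide = pd.DataFrame(
    [{d['label']: d['value'] for d in lst} for lst in df['List']],
    index=df.index,
)

# Assumed column order, taken from the desired output in the question;
# the string values '0'/'1' are converted to numbers (missing stays NaN).
wanted = ['Forum Thread Size', 'Unique Commenters', 'Engagement', 'Likes and Votes']
wide = wide.reindex(columns=wanted).apply(pd.to_numeric)

# Put Row and ID back in front of the expanded label columns.
result = pd.concat([df.drop(columns='List'), wide], axis=1)
print(result)
```

Building all the dicts first and calling the constructor once avoids creating an intermediate Series for every row, which tends to be faster on large frames.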

Answer 1 (score: 1)

IIUC

import numpy as np
import pandas as pd
df1=df.set_index(['Row','ID']).List.apply(pd.Series).stack().apply(pd.Series).reset_index()
df1.pivot_table(index=['Row','ID'], columns='label', values='value',aggfunc=np.sum).merge(df[['Row','ID']],left_index=True,right_on=['Row','ID'],how='right')

Out[334]: 
  Engagement Forum Thread Size Likes and Votes Unique Commenters  Row  ID
0       None                 0               0                 0    1   1
1          1                 1               0                 1    2   2
2        NaN               NaN             NaN               NaN    3   3

Data input:

df = pd.DataFrame({'Row':[1,2,3],'ID':[1,2,3], 'List':[[{u'value': u'0', u'label': u'Forum Thread Size'}, {u'value': u'0', u'label': u'Unique Commenters'}, {u'value': u'0', u'label': u'Likes and Votes'}], [{u'value': u'1', u'label': u'Forum Thread Size'}, {u'value': u'1', u'label': u'Unique Commenters'}, {u'value': u'1', u'label': u'Engagement'}, {u'value': u'0', u'label': u'Likes and Votes'}],[]]})
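The same long-to-wide idea can be sketched with DataFrame.explode instead of the nested apply(pd.Series) calls, assuming pandas 1.1 or later (explode needs 0.25+, and passing a list as pivot's index needs 1.1+). The block rebuilds the three-row data input above so it runs on its own. pivot is swapped in for pivot_table here, so a label repeated within one (Row, ID) pair raises an error instead of being silently aggregated; that is a design choice of this sketch, not of the answer.

```python
import pandas as pd

# Three-row data input from the answer above.
df = pd.DataFrame({
    'Row': [1, 2, 3],
    'ID': [1, 2, 3],
    'List': [
        [{'value': '0', 'label': 'Forum Thread Size'},
         {'value': '0', 'label': 'Unique Commenters'},
         {'value': '0', 'label': 'Likes and Votes'}],
        [{'value': '1', 'label': 'Forum Thread Size'},
         {'value': '1', 'label': 'Unique Commenters'},
         {'value': '1', 'label': 'Engagement'},
         {'value': '0', 'label': 'Likes and Votes'}],
        [],
    ],
})

# Long format: one row per dict. An empty list explodes to a single NaN,
# so those rows are dropped here and restored by the merge below.
long = df.explode('List').dropna(subset=['List']).reset_index(drop=True)
long = long.assign(
    label=long['List'].map(lambda d: d['label']),
    value=long['List'].map(lambda d: d['value']),
)

# Wide format: one row per (Row, ID), one column per label.
# pivot raises ValueError if a label repeats within the same (Row, ID).
wide = long.pivot(index=['Row', 'ID'], columns='label', values='value')

# Left-merge onto the original keys so rows with an empty list come back as NaN.
result = df[['Row', 'ID']].merge(wide.reset_index(), on=['Row', 'ID'], how='left')
print(result)
```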