I have the following DataFrame:
network date count2 count3 user2 user3
3 20170721 [6, 7] [1,3] [57,88] [47,58]
4 20170721 [6] [] [12] []
43 20170721 [] [7,2] [] [57,62]
I want to split the lists so that each element gets its own row, while each count stays paired with its user:
network date count2 count3 user2 user3
3 20170721 6 NaN 57 NaN
3 20170721 7 NaN 88 NaN
3 20170721 NaN 1 NaN 47
3 20170721 NaN 3 NaN 58
4 20170721 6 NaN 12 NaN
43 20170721 NaN 7 NaN 57
43 20170721 NaN 2 NaN 62
How can I do this quickly? The user lists are actually very long (more than 50k entries). Thanks!
Answer 0 (score: 1)
You can do it like this. It pairs the list elements by position, so you get the result you are looking for without the extra NaNs.
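import pandas as pd

df = pd.DataFrame({'network':[3,4,43],'date':['20170721']*3,
                   'count2':[[6,7],[6],[]],
                   'count3':[[1,3],[],[7,2]],
                   'user2':[[57,88],[12],[]],
                   'user3':[[47,58],[],[57,62]]})
# Move the identifying columns into the index so only the list columns are expanded
df = df.set_index(['network','date'])
# For each column: spread the lists into one column per position, then stack
# them back into a single column indexed by (network, date, position).
# dropna() discards the padding in case stack() keeps it (newer pandas versions).
(df.apply(lambda x: pd.DataFrame(x.tolist(),index=x.index)
                      .stack()
                      .dropna()
                      .rename(x.name))
   .reset_index())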
Output:
   network      date  level_2  count2  count3  user2  user3
0        3  20170721        0     6.0     1.0   57.0   47.0
1        3  20170721        1     7.0     3.0   88.0   58.0
2        4  20170721        0     6.0     NaN   12.0    NaN
3       43  20170721        0     NaN     7.0    NaN   57.0
4       43  20170721        1     NaN     2.0    NaN   62.0
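
If the exact layout from the question is needed instead (count2/user2 on separate rows from count3/user3), one possible sketch is to explode each pair on its own and concatenate the pieces. This assumes pandas 1.3 or later, where `DataFrame.explode` accepts a list of columns. The helper name `explode_pair` is made up for illustration, and I have not timed it on 50k-element lists.

```python
import pandas as pd

df = pd.DataFrame({
    'network': [3, 4, 43],
    'date': ['20170721'] * 3,
    'count2': [[6, 7], [6], []],
    'count3': [[1, 3], [], [7, 2]],
    'user2': [[57, 88], [12], []],
    'user3': [[47, 58], [], [57, 62]],
})

def explode_pair(frame, count_col, user_col):
    """Explode one (count, user) pair of list columns in lockstep.

    Hypothetical helper: assumes both lists in a row have the same length.
    """
    part = frame[['network', 'date', count_col, user_col]]
    # Skip rows with empty lists; explode would turn them into all-NaN rows.
    part = part[part[count_col].map(len) > 0]
    return part.explode([count_col, user_col])

pieces = [explode_pair(df, 'count2', 'user2'),
          explode_pair(df, 'count3', 'user3')]

# concat fills the columns missing from each piece with NaN. A stable sort on
# the original row index groups rows per network, count2 rows first.
result = (pd.concat(pieces)
            .sort_index(kind='mergesort')
            .reset_index(drop=True)
            [['network', 'date', 'count2', 'count3', 'user2', 'user3']])
print(result)
```

The exploded columns come back with object dtype, so convert them with `astype` or `pd.to_numeric` afterwards if numeric columns are needed.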