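I have a DataFrame df that contains transactions by card. A card can make multiple transactions, so it can span multiple rows. I want to create a new DataFrame with one row per card. The difficulty is that the number of transactions varies from card to card. I thought pd.melt would solve this.
The DataFrame looks like this: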
CardCode Coupon ShopName TranDate
1028670151 S ShopA 2018-05-24 21:02:19.000
1028670151 S ShopB 2018-05-23 13:14:44.000
1028670151 G ShopC 2018-05-24 12:31:24.000
1029282454 S ShopA 2018-05-19 19:52:40.000
1029282454 G ShopB 2018-05-19 14:08:02.000
1029646050 S ShopD 2018-06-17 14:10:42.000
1029684151 S ShopE 2018-05-05 12:33:21.000
1029684151 G ShopB 2018-05-05 15:13:08.000
1029684151 S ShopC 2018-05-06 14:21:02.000
1029754252 G ShopA 2018-05-05 10:40:30.000
The code I tried:
import pandas as pd

df_new = pd.melt(df,
                 id_vars=['CardCode'],  # column name without the stray trailing space
                 value_vars=['TranDate', 'Coupon', 'ShopName'])
This does get me a step closer, but I still don't end up with one row per CardCode, which is my end goal.
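As a rough illustration of why melt alone falls short: it reshapes wide data to long, so it adds rows instead of collapsing them. The sketch below uses a three-row subset of the data above; the subset and the name `melted` are illustrative only.

```python
import pandas as pd

# Three-row subset of the sample data (illustrative, not the full frame)
df = pd.DataFrame({
    "CardCode": [1028670151, 1028670151, 1029282454],
    "Coupon": ["S", "S", "G"],
    "ShopName": ["ShopA", "ShopB", "ShopB"],
    "TranDate": [
        "2018-05-24 21:02:19.000",
        "2018-05-23 13:14:44.000",
        "2018-05-19 14:08:02.000",
    ],
})

# melt stacks each value column under 'variable'/'value',
# producing one row per (original row, value column) pair
melted = pd.melt(
    df,
    id_vars=["CardCode"],
    value_vars=["TranDate", "Coupon", "ShopName"],
)

print(melted.shape)          # 3 input rows x 3 value columns -> 9 rows
print(list(melted.columns))  # CardCode, variable, value
```

So the result is longer than the input, which is the opposite of one row per card.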
The desired output looks like this:
1028670151 S ShopA 2018-05-24 21:02:19.000 S ShopB 2018-05-23 13:14:44.000 G ShopC 2018-05-24 12:31:24.000
Any suggestions?
Many thanks!
Answer 0 (score: 0)
You can use cumcount and unstack:
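# Number each card's transactions 1, 2, 3, ... and use that as a second index level
df_out = df.set_index(['CardCode', df.groupby('CardCode').cumcount() + 1])\
           .unstack()\
           .sort_index(level=1, axis=1)
# Flatten the (column, number) MultiIndex into names such as Coupon_1
df_out.columns = [f'{i}_{j}' for i, j in df_out.columns]
df_out = df_out.reset_index()
df_out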
Output:
CardCode Coupon_1 ShopName_1 TranDate_1 Coupon_2 ShopName_2 TranDate_2 Coupon_3 ShopName_3 TranDate_3
0 1028670151 S ShopA 2018-05-24 21:02:19.000 S ShopB 2018-05-23 13:14:44.000 G ShopC 2018-05-24 12:31:24.000
1 1029282454 S ShopA 2018-05-19 19:52:40.000 G ShopB 2018-05-19 14:08:02.000 NaN NaN NaN
2 1029646050 S ShopD 2018-06-17 14:10:42.000 NaN NaN NaN NaN NaN NaN
3 1029684151 S ShopE 2018-05-05 12:33:21.000 G ShopB 2018-05-05 15:13:08.000 S ShopC 2018-05-06 14:21:02.000
4 1029754252 G ShopA 2018-05-05 10:40:30.000 NaN NaN NaN NaN NaN NaN
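For anyone who wants to try this end to end, here is a self-contained sketch that rebuilds the sample data from the question and applies the same cumcount/unstack steps. The names `seq` and `wide` are illustrative, and TranDate is kept as plain strings, which is an assumption since the question does not show the column dtypes.

```python
import pandas as pd

# Sample data copied from the question (TranDate left as strings by assumption)
df = pd.DataFrame({
    "CardCode": [
        1028670151, 1028670151, 1028670151,
        1029282454, 1029282454,
        1029646050,
        1029684151, 1029684151, 1029684151,
        1029754252,
    ],
    "Coupon": ["S", "S", "G", "S", "G", "S", "S", "G", "S", "G"],
    "ShopName": [
        "ShopA", "ShopB", "ShopC", "ShopA", "ShopB",
        "ShopD", "ShopE", "ShopB", "ShopC", "ShopA",
    ],
    "TranDate": [
        "2018-05-24 21:02:19.000", "2018-05-23 13:14:44.000",
        "2018-05-24 12:31:24.000", "2018-05-19 19:52:40.000",
        "2018-05-19 14:08:02.000", "2018-06-17 14:10:42.000",
        "2018-05-05 12:33:21.000", "2018-05-05 15:13:08.000",
        "2018-05-06 14:21:02.000", "2018-05-05 10:40:30.000",
    ],
})

# Running transaction number within each card: 1, 2, 3, ...
seq = df.groupby("CardCode").cumcount() + 1

wide = (
    df.set_index(["CardCode", seq])   # index = (card, transaction number)
      .unstack()                      # move transaction number into the columns
      .sort_index(level=1, axis=1)    # group columns by transaction number
)

# Flatten the (field, number) column MultiIndex into names like Coupon_1
wide.columns = [f"{name}_{n}" for name, n in wide.columns]
wide = wide.reset_index()

print(wide.shape)  # one row per card
print(wide.columns.tolist())
```

Cards with fewer transactions than the maximum end up with NaN in the unused columns, as in the output above.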