My transaction totals are grouped by `date_month`, `device`, and `channel`:
date_month  device   channel     transactions
2017-01-01  desktop  AFFILIATES  413
2017-01-01  mobile   AFFILIATES  501
2017-01-01  other    AFFILIATES  22
2017-01-01  tablet   AFFILIATES  250
2017-01-01  desktop  DIRECT      13979
etc...      etc...   etc...      etc...
`date_month` ranges from `2017-01-01` to the current date.
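What I want to do is split the `other` value of `device` across `tablet`, `mobile`, and `desktop`.

Example process:

- Pivot the device `other` transactions into an additional column (`other_transactions`)
- Sum `transactions` partitioned/grouped by `date_month` and `channel` (`total_transactions`)
- Divide `transactions` by `total_transactions` to get the percent of total (`percent_total`)
- Multiply `percent_total` by `other_transactions` to get `other_split`
- Add `other_split` to `transactions` to get the updated transactions field

Getting the totals and applying the simple maths shouldn't be a problem. I would do something along the lines of `df['total_transactions']=df.groupby(['date_month', 'channel'])['transactions'].transform('sum')` to get `total_transactions`, but the issue I'm having is putting the `other` transactions into a separate column like this: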
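date_month  device   channel     transactions  other_trans
2017-01-01  desktop  AFFILIATES  413           22
2017-01-01  mobile   AFFILIATES  501           22
2017-01-01  tablet   AFFILIATES  250           22
2017-01-01  desktop  DIRECT      13979         etc
etc...      etc...   etc...      etc...

In the end, I would like a dataframe that removes the `other` devices from the `device` column and uses their transactions to top up the remaining devices within the same `date_month` and `channel`.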
Answer 0 (score: 1)
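IIUC, you can first use `groupby` to create a separate dataframe holding the `other` rows, drop those rows from the original dataframe, and then `merge` the two: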
import pandas as pd

# Sample data: one month, two channels, each with an "other" device row.
df = pd.DataFrame({
    "date_month": ["2017-01-01"] * 6,
    "device": ["desktop", "mobile", "other", "tablet", "desktop", "other"],
    "channel": ["AFFILIATES", "AFFILIATES", "AFFILIATES", "AFFILIATES", "DIRECT", "DIRECT"],
    "transactions": [413, 501, 22, 250, 13979, 234],
})

# Pull the "other" rows into their own frame, keyed by month and channel.
other = df.groupby("device").get_group("other")[["date_month", "channel", "transactions"]]
# Drop the "other" rows (exact match, so values that merely contain "other" are kept).
df = df.drop(df[df["device"] == "other"].index)
# Attach each group's "other" total as a new transactions_other column.
df = df.merge(other, on=["date_month", "channel"], how="left", suffixes=("", "_other"))
print(df)
Result:
   date_month   device     channel  transactions  transactions_other
0  2017-01-01  desktop  AFFILIATES           413                  22
1  2017-01-01   mobile  AFFILIATES           501                  22
2  2017-01-01   tablet  AFFILIATES           250                  22
3  2017-01-01  desktop      DIRECT         13979                 234
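
From there, the remaining steps listed in the question can be applied to the merged frame. The following is only a minimal sketch under a few assumptions: the totals are computed after the `other` rows have been removed (so the shares within each group sum to 1), groups without an `other` row are treated as having 0 `other` transactions, and `transactions_updated` is a column name made up here for illustration. The names `other_transactions`, `total_transactions`, `percent_total`, and `other_split` follow the question.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "date_month": ["2017-01-01"] * 6,
    "device": ["desktop", "mobile", "other", "tablet", "desktop", "other"],
    "channel": ["AFFILIATES"] * 4 + ["DIRECT"] * 2,
    "transactions": [413, 501, 22, 250, 13979, 234],
})

keys = ["date_month", "channel"]
is_other = df["device"] == "other"

# One row per (date_month, channel) with the summed "other" transactions.
# Summing guards against a group having more than one "other" row.
other = (
    df[is_other]
    .groupby(keys, as_index=False)["transactions"]
    .sum()
    .rename(columns={"transactions": "other_transactions"})
)

# Keep only the real devices and attach the "other" total for their group.
split = df[~is_other].merge(other, on=keys, how="left")

# Assumption: a group with no "other" row has nothing to redistribute.
split["other_transactions"] = split["other_transactions"].fillna(0)

# Group total over the remaining devices, broadcast back to every row.
split["total_transactions"] = split.groupby(keys)["transactions"].transform("sum")

# Each device's share of its group, and its slice of the "other" transactions.
split["percent_total"] = split["transactions"] / split["total_transactions"]
split["other_split"] = split["percent_total"] * split["other_transactions"]

# Hypothetical output column: original transactions plus the redistributed slice.
split["transactions_updated"] = split["transactions"] + split["other_split"]

print(split[keys + ["device", "transactions", "other_split", "transactions_updated"]])
```

Because each device receives a slice proportional to its share, the updated values in every group add back up to the group's original total including `other`, up to floating-point rounding. A group whose remaining devices all have 0 transactions would produce `NaN` shares and would need separate handling.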