How to split field values based on percent of total

Time: 2019-10-02 08:05:59

Tags: python pandas

I have transaction totals grouped by date_month, device and channel:

date_month   device            channel  transactions
2017-01-01  desktop         AFFILIATES           413
2017-01-01   mobile         AFFILIATES           501
2017-01-01    other         AFFILIATES            22
2017-01-01   tablet         AFFILIATES           250
2017-01-01  desktop             DIRECT         13979
etc...       etc...             etc...        etc...

date_month ranges from 2017-01-01 to the current date.

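What I'm trying to do is split the 'other' value in the device field across 'mobile', 'desktop' and 'tablet'.

Example process:

  • Pivot the device value 'other' into an additional column (other_transactions) holding its transactions
  • Sum transactions partitioned/grouped by (date_month, channel) to get total_transactions
  • Then divide transactions by total_transactions to get the percent of total (percent_total)
  • Multiply percent_total by other_transactions to get other_split
  • Add other_split to transactions to get the updated transactions field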
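To make the arithmetic of these steps concrete, here is the calculation for the 2017-01-01 AFFILIATES rows of the sample table (desktop 413, mobile 501, tablet 250, other 22), worked for desktop. The symbols T, p, o and t' are introduced here only for illustration; T is the total over the non-'other' devices, as in the step order above.

```latex
\begin{aligned}
T &= 413 + 501 + 250 = 1164 \\
p_{\text{desktop}} &= \frac{413}{1164} \approx 0.3548 \\
\text{other\_split}_{\text{desktop}} &= p_{\text{desktop}} \cdot o = 0.3548 \times 22 \approx 7.81 \\
t'_{\text{desktop}} &= 413 + 7.81 \approx 420.81
\end{aligned}
```

The splits for desktop, mobile and tablet (about 7.81, 9.47 and 4.73) add back up to the 22 'other' transactions, up to rounding, so the channel total is preserved.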

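Getting the totals and applying the simple math shouldn't be a problem. I'd do it along the lines of df['total_transactions']=df.groupby(['date_month', 'channel'])['transactions'].transform('sum') to get total_transactions, but the problem I'm having is getting the 'other' transactions into a separate column like this: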

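date_month   device     channel  transactions  other_trans
2017-01-01  desktop  AFFILIATES           413           22
2017-01-01   mobile  AFFILIATES           501           22
2017-01-01   tablet  AFFILIATES           250           22
2017-01-01  desktop      DIRECT         13979          etc
etc...       etc...      etc...        etc...       etc...

In the end, I'd like a dataframe that removes the 'other' devices from the device column and uses their transaction volume to distribute across the remaining devices based on that date_month and channel.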
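If the only goal is to get the 'other' totals into their own column, a merge is not strictly required. A minimal sketch of an alternative (a different technique from the answer below: Series.where plus groupby(...).transform('sum')), assuming the device label is exactly 'other' and reusing the column name other_trans from the desired output:

```python
import pandas as pd

# Sample data matching the answer's example (two "other" rows).
df = pd.DataFrame({
    "date_month": ["2017-01-01"] * 6,
    "device": ["desktop", "mobile", "other", "tablet", "desktop", "other"],
    "channel": ["AFFILIATES"] * 4 + ["DIRECT"] * 2,
    "transactions": [413, 501, 22, 250, 13979, 234],
})

is_other = df["device"].eq("other")

# Keep transactions only on "other" rows (NaN elsewhere), sum them per
# (date_month, channel) and broadcast that sum back onto every row of the group.
# NaN is skipped by the sum, so a group without an "other" row gets 0.0.
df["other_trans"] = (
    df["transactions"]
    .where(is_other)
    .groupby([df["date_month"], df["channel"]])
    .transform("sum")
)

# Drop the "other" rows themselves now that their totals are in other_trans.
df = df[~is_other].reset_index(drop=True)
print(df)
```

The same transform('sum') call from the question then gives total_transactions on this frame, since it also broadcasts group sums back to each row.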

1 answer:

Answer 0 (score: 1)

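IIUC, you can first use groupby to build a separate dataframe of the 'other' rows, drop those rows from the original with drop, and then merge the two: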

import pandas as pd

df = pd.DataFrame({'date_month': {0: '2017-01-01', 1: '2017-01-01', 2: '2017-01-01', 3: '2017-01-01', 4: '2017-01-01', 5:"2017-01-01"},
                   'device': {0: 'desktop', 1: 'mobile', 2: 'other', 3: 'tablet', 4: 'desktop', 5:"other"},
                   'channel': {0: 'AFFILIATES', 1: 'AFFILIATES', 2: 'AFFILIATES', 3: 'AFFILIATES', 4: 'DIRECT', 5: 'DIRECT'},
                   'transactions': {0: 413, 1: 501, 2: 22, 3: 250, 4: 13979, 5: 234}})

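# Pull out the "other" rows; they become the lookup table for the merge
other = df.groupby("device").get_group("other")[["date_month","channel","transactions"]]

# Drop the "other" rows from the main frame (exact match, so a label like "others" is not caught)
df = df.drop(df[df["device"] == "other"].index)

# Left-merge the "other" totals back on (date_month, channel); the new column becomes transactions_other
df = df.merge(other, on=["date_month","channel"], how="left", suffixes=("","_other"))

print(df)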

Result:

   date_month   device     channel  transactions  transactions_other
0  2017-01-01  desktop  AFFILIATES           413                  22
1  2017-01-01   mobile  AFFILIATES           501                  22
2  2017-01-01   tablet  AFFILIATES           250                  22
3  2017-01-01  desktop      DIRECT         13979                 234
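The answer stops once the 'other' totals sit in transactions_other. A minimal sketch of the remaining steps from the question (total, percent, split, update), starting from the merged result shown above; the frame is rebuilt by hand so the snippet runs on its own, and treating a missing 'other' total (NaN after the left merge) as 0 is an assumption.

```python
import pandas as pd

# The merged frame printed above, rebuilt by hand.
df = pd.DataFrame({
    "date_month": ["2017-01-01"] * 4,
    "device": ["desktop", "mobile", "tablet", "desktop"],
    "channel": ["AFFILIATES", "AFFILIATES", "AFFILIATES", "DIRECT"],
    "transactions": [413, 501, 250, 13979],
    "transactions_other": [22, 22, 22, 234],
})

# Assumption: a (date_month, channel) group with no "other" row means 0 extra transactions.
df["transactions_other"] = df["transactions_other"].fillna(0)

# Group total over the remaining devices, broadcast back to each row.
df["total_transactions"] = df.groupby(["date_month", "channel"])["transactions"].transform("sum")
# Each device's share of its group total.
df["percent_total"] = df["transactions"] / df["total_transactions"]
# That share of the group's "other" transactions.
df["other_split"] = df["percent_total"] * df["transactions_other"]
# Updated transactions field.
df["transactions"] = df["transactions"] + df["other_split"]

print(df[["date_month", "device", "channel", "transactions"]])
```

The updated values come out fractional (desktop AFFILIATES becomes about 420.81); the question does not say whether they should be rounded.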