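Creating a new feature column for grouped data in a Pandas DataFrame

Time: 2018-06-14 09:31:28

Tags: python pandas

I have a Pandas DataFrame with the columns ['week', 'price_per_unit', 'total_units']. I want to create a new column called "weighted_price" as follows: first group by "week", then for each week compute price_per_unit * total_units / sum(total_units) over that week. I have code that does this: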

import pandas as pd
import numpy as np

def create_features_by_group(df):
    # first group data
    grouped = df.groupby(['week'])
    df_temp = pd.DataFrame(columns=['weighted_price'])

    # run through the groups and create the weighted_price per group
    for name, group in grouped:
        res = (group['total_units'] * group['price_per_unit']) / np.sum(group['total_units'])
        for idx in res.index:
            df_temp.loc[idx] = [res[idx]]

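    # join returns a new DataFrame rather than modifying df in place,
    # so the joined result has to be returned
    return df.join(df_temp['weighted_price'])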

The only problem is that this is very, very slow. Is there a faster way to do this?

I used the following code to test the function.

import pandas as pd
import numpy as np
df = pd.DataFrame(columns=['week', 'price_per_unit', 'total_units'])


for i in range(10):
    df.loc[i] = [round(int(i % 3), 0) , 10 * np.random.rand(), round(10 * np.random.rand(), 0)]

2 answers:

Answer 0: (score: 0)

I think you need this:

src/
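
One vectorized way to get the per-row value described in the question is `groupby(...).transform('sum')`, which broadcasts each week's total back onto the rows of that week, so no Python-level loop is needed. The following is a minimal sketch under that assumption, and may not match the code this answer originally contained. The function name `add_weighted_price` and the sample values are hypothetical.

```python
import pandas as pd

def add_weighted_price(df):
    # Hypothetical helper: returns a copy of df with a 'weighted_price' column.
    out = df.copy()
    # Total units per week, aligned to the original row index.
    weekly_units = out.groupby('week')['total_units'].transform('sum')
    # Each row's price weighted by its share of the week's units.
    out['weighted_price'] = out['price_per_unit'] * out['total_units'] / weekly_units
    return out

# Illustrative data, made up for this sketch.
sample = pd.DataFrame({
    'week': [0, 0, 1, 1],
    'price_per_unit': [10.0, 20.0, 5.0, 15.0],
    'total_units': [1.0, 3.0, 2.0, 2.0],
})
result = add_weighted_price(sample)
```

Summing `weighted_price` within a week gives that week's unit-weighted average price.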

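Answer 1: (score: 0)

I grouped the dataset by "Week" to calculate the weighted price for each week.

Then I merged the grouped dataset back into the original dataset to get the result: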

# importing the libraries
import pandas as pd
import numpy as np

# creating the dataset
df = pd.DataFrame({
    'Week': [1, 1, 1, 1, 2, 2],
    'price_per_unit': [10, 11, 22, 12, 12, 45],
    'total_units': [10, 10, 10, 10, 10, 10],
})
df['price'] = df['price_per_unit'] * df['total_units']

# calculate the total sales and total number of units sold in each week
df_grouped_week = df.groupby(by = 'Week').agg({'price' : 'sum', 'total_units' : 'sum'}).reset_index()

# calculate the weighted price
df_grouped_week['wt_price'] = df_grouped_week['price']  / df_grouped_week['total_units']  

# merging df and df_grouped_week
df_final = pd.merge(df, df_grouped_week[['Week', 'wt_price']], how = 'left', on = 'Week')
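
The aggregate-then-merge steps above can also be collapsed with `transform`, which returns per-week sums already aligned to the original rows. This is a minimal sketch, assuming the same column names as in this answer. The variable names `weekly_sales` and `weekly_units` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Week': [1, 1, 1, 1, 2, 2],
    'price_per_unit': [10, 11, 22, 12, 12, 45],
    'total_units': [10, 10, 10, 10, 10, 10],
})
df['price'] = df['price_per_unit'] * df['total_units']

# Per-week totals, broadcast back to one value per original row.
weekly = df.groupby('Week')
weekly_sales = weekly['price'].transform('sum')
weekly_units = weekly['total_units'].transform('sum')

# Unit-weighted average price of the week, repeated on every row of that week.
df['wt_price'] = weekly_sales / weekly_units
```

Note that `wt_price` is a single unit-weighted average per week, repeated on each row, whereas the question's `weighted_price` is each row's individual contribution to that average.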