Question

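I have a Pandas DataFrame with the columns ['week', 'price_per_unit', 'total_units']. I want to create a new column called 'weighted_price' as follows: first group by 'week', then, for each week, compute price_per_unit * total_units / sum(total_units) for every row in that week. I have code that does this: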
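Written out for a single row $i$, the requested value is the following, where the sum runs over all rows $j$ that fall in the same week as row $i$. Summing this quantity over all rows of a week gives that week's unit-weighted average price.

$$
\text{weighted\_price}_i = \frac{\text{price\_per\_unit}_i \cdot \text{total\_units}_i}{\sum_{j \,:\, \text{week}_j = \text{week}_i} \text{total\_units}_j}
$$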

import pandas as pd
import numpy as np

def create_features_by_group(df):
    # first group data
    grouped = df.groupby(['week'])
    df_temp = pd.DataFrame(columns=['weighted_price'])

    # run through the groups and create the weighted_price per group
    for name, group in grouped:
        res = (group['total_units'] * group['price_per_unit']) / np.sum(group['total_units'])
        for idx in res.index:
            df_temp.loc[idx] = [res[idx]]

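    # join() returns a new DataFrame; it does not modify df in place,
    # so the joined result has to be returned (or reassigned)
    return df.join(df_temp['weighted_price'])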

The only problem is that this is very, very slow. Is there a faster way to do it?
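A likely cost in the loop above is that it grows `df_temp` one row at a time with `.loc`, which is expensive in pandas. One common way to avoid the loop entirely is `groupby(...).transform('sum')`: it returns each week's total of `total_units` aligned to the original rows, so the whole column can be computed with vectorized arithmetic. The sketch below keeps the question's function and column names; the sample data is made up for illustration, and returning a modified copy instead of changing `df` in place is a design choice of this sketch, not something the question requires.

```python
import pandas as pd


def create_features_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Vectorized sketch: no Python-level loop over groups or rows."""
    out = df.copy()
    # Total units of each row's week, broadcast back to every row;
    # transform() returns a Series aligned with the original index.
    week_units = out.groupby('week')['total_units'].transform('sum')
    # Per-row price_per_unit * total_units / sum(total_units in that week)
    out['weighted_price'] = out['price_per_unit'] * out['total_units'] / week_units
    return out


# Small deterministic example (hypothetical data)
df = pd.DataFrame({
    'week': [0, 1, 2, 0, 1, 2],
    'price_per_unit': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    'total_units': [1.0, 2.0, 3.0, 3.0, 2.0, 1.0],
})
result = create_features_by_group(df)
print(result)
```

Because `transform` keeps the original index, no `join` or `merge` is needed afterwards; the new column lines up with the input rows directly.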

I test the function with the following code:

import pandas as pd
import numpy as np
df = pd.DataFrame(columns=['week', 'price_per_unit', 'total_units'])

# 10 rows spread over weeks 0, 1 and 2, with random prices and unit counts
for i in range(10):
    df.loc[i] = [i % 3, 10 * np.random.rand(), round(10 * np.random.rand(), 0)]

Answer 1

I think you need to do this:


Answer 2

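I grouped the dataset by week to compute the weighted price for each week.

Then I merged the grouped dataset back onto the original dataset to get the result: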

# importing the libraries
import pandas as pd
import numpy as np

# creating the dataset
df = {
    'Week': [1, 1, 1, 1, 2, 2],
    'price_per_unit': [10, 11, 22, 12, 12, 45],
    'total_units': [10, 10, 10, 10, 10, 10],
}
df = pd.DataFrame(df)
# revenue per row: price_per_unit * total_units
df['price'] = df['price_per_unit'] * df['total_units']

# calculate the total sales and total number of units sold in each week
df_grouped_week = df.groupby(by='Week').agg({'price': 'sum', 'total_units': 'sum'}).reset_index()

# calculate the weighted price
df_grouped_week['wt_price'] = df_grouped_week['price'] / df_grouped_week['total_units']

# merging df and df_grouped_week
df_final = pd.merge(df, df_grouped_week[['Week', 'wt_price']], how='left', on='Week')
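The `wt_price` computed this way is the unit-weighted average price of each week, sum(price_per_unit * total_units) / sum(total_units). That equals the within-week sum of the question's per-row value, so the two formulations differ only in whether the per-row shares or their weekly total are kept. A sketch checking this on the same sample data, using `transform` in place of the aggregate-and-merge step (the `share` column name is hypothetical):

```python
import pandas as pd

# Same sample data as in the answer
df = pd.DataFrame({
    'Week': [1, 1, 1, 1, 2, 2],
    'price_per_unit': [10, 11, 22, 12, 12, 45],
    'total_units': [10, 10, 10, 10, 10, 10],
})

# Per-row share as defined in the question:
# price_per_unit * total_units / (total units sold in that week)
week_units = df.groupby('Week')['total_units'].transform('sum')
df['share'] = df['price_per_unit'] * df['total_units'] / week_units

# Summing the shares within each week gives the weekly weighted average,
# already broadcast to every row, so no separate merge is needed.
df['wt_price'] = df.groupby('Week')['share'].transform('sum')
print(df)
```

For week 1 this gives (100 + 110 + 220 + 120) / 40 = 13.75, and for week 2 (120 + 450) / 20 = 28.5, the same values the merge-based code produces.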

Creating new feature columns for grouped data in a Pandas DataFrame
