Question

I have a df that looks like this:

import pandas as pd

data =  [{'Stock': 'Apple', 'Weight': 0.2, 'Price': 101.99, 'Beta': 1.1},
         {'Stock': 'MCSFT', 'Weight': 0.1, 'Price': 143.12, 'Beta': 0.9},
         {'Stock': 'WARNER','Weight': 0.15,'Price': 76.12,  'Beta': -1.1},
         {'Stock': 'ASOS',  'Weight': 0.35,'Price': 76.12,  'Beta': -1.1 },
         {'Stock': 'TESCO', 'Weight': 0.2, 'Price': 76.12,  'Beta': -1.1 }]
data_df = pd.DataFrame(data)

and a custom function that computes a weighted average:

def calc_weighted_averages(data_in, weighted_by):
    return sum(x * y for x, y in zip(data_in, weighted_by)) / sum(weighted_by)

I want to apply this custom function to all the columns in my df. My first idea was to do something like this:

data_df = data_df[['Weight','Price','Beta']]
data_df = data_df.apply(lambda x: calc_weighted_averages(x['Price'], x['Weight']), axis=1)

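How can I hold my weighted_by column fixed and apply the custom function to the other columns? I should end up with the weighted averages of Price and Beta.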

Answer 1

I think you first need to select the subset of columns to average, then pass the Weight column as the second argument:

s1 = data_df[['Price','Beta']].apply(lambda x: calc_weighted_averages(x, data_df['Weight']))
print (s1)
Price    87.994
Beta     -0.460
dtype: float64
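
To see why the attempt in the question fails while this version works, here is a minimal sketch. It assumes the same data as the question (rebuilt here without the Stock column so the block runs on its own); the names row_wise_error and col_wise are only illustrative. With axis=1 the lambda receives one row at a time, so x['Price'] is a single number; with the default axis=0 it receives a whole column.

```python
import pandas as pd

def calc_weighted_averages(data_in, weighted_by):
    return sum(x * y for x, y in zip(data_in, weighted_by)) / sum(weighted_by)

# Same numbers as in the question, numeric columns only
data_df = pd.DataFrame({
    'Weight': [0.2, 0.1, 0.15, 0.35, 0.2],
    'Price': [101.99, 143.12, 76.12, 76.12, 76.12],
    'Beta': [1.1, 0.9, -1.1, -1.1, -1.1],
})

# axis=1: x is one row, so x['Price'] is a single float and zip() cannot iterate it
try:
    data_df.apply(lambda x: calc_weighted_averages(x['Price'], x['Weight']), axis=1)
    row_wise_error = None
except TypeError as exc:
    row_wise_error = exc

# default axis=0: x is a whole column (a Series), which is what the function expects
col_wise = data_df[['Price', 'Beta']].apply(
    lambda x: calc_weighted_averages(x, data_df['Weight'])
)
```

The Weight column is taken from the outer data_df inside the lambda, which is what keeps it fixed while apply iterates over the other columns.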

Another solution, without apply, is faster:

s1 = data_df[['Price','Beta']].mul(data_df['Weight'], axis=0).sum().div(data_df['Weight'].sum())
print (s1)
Price    87.994
Beta     -0.460
dtype: float64
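
If NumPy is an option, numpy.average with its weights parameter should give the same numbers. This is a sketch that is not part of the original answer; it rebuilds the question's data so it runs on its own, and the names cols and s2 are only illustrative.

```python
import numpy as np
import pandas as pd

# Same numbers as in the question, numeric columns only
data_df = pd.DataFrame({
    'Weight': [0.2, 0.1, 0.15, 0.35, 0.2],
    'Price': [101.99, 143.12, 76.12, 76.12, 76.12],
    'Beta': [1.1, 0.9, -1.1, -1.1, -1.1],
})

cols = ['Price', 'Beta']

# axis=0 averages down each column; weights must have one entry per row
s2 = pd.Series(
    np.average(
        data_df[cols].to_numpy(),
        axis=0,
        weights=data_df['Weight'].to_numpy(),
    ),
    index=cols,
)
print(s2)
```

Wrapping the result in a Series indexed by cols keeps the column labels, which np.average drops because it returns a plain array.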
