Weighting multiple columns by factors over different ranges

Time: 2018-12-04 09:10:40

Tags: python pandas loops

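I have a DataFrame df with three columns: A, B and C. I want to create a weighted-average column, but I want to test different weights (the weights must sum to 100%).

So I could do this: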

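import numpy as np

weights = np.arange(0, 1, 0.05)

for i in weights:
    for j in weights:
        for k in weights:
            # keep only combinations that sum to 100% (float-safe comparison)
            if np.isclose(i + j + k, 1):
                # i, j, k weight A, B, C, matching the order used in the name
                outname = str(round(i, 2)) + 'A' + str(round(j, 2)) + 'B' + str(round(k, 2)) + 'C'
                df[outname] = df['A'].multiply(i) + df['B'].multiply(j) + df['C'].multiply(k)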

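However, the number of columns may grow, and then this approach (one nested loop per column) stops working.

Can anyone see a clever way to do this?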

1 Answer:

Answer 0 (score: 1)

This is what you are looking for:

from random import randint
import numpy as np
import pandas as pd

df = pd.DataFrame([[0,1,2],[3,4,5],[6,7,8]], columns=['A','B','C'])
cols = list(df.columns)  # fix the input columns before new ones are added
weightpool = np.arange(0, 1, 0.05)

for times in range(2):
    # start from zeros so a fresh set of weights is drawn on every pass
    weights = np.zeros(len(cols))
    # redraw until all weights sum up to 1 (tolerance instead of == on floats)
    while not np.isclose(weights.sum(), 1):
        # choose a weight out of the pool for every column
        for i in range(len(weights)):
            weights[i] = weightpool[randint(0, len(weightpool) - 1)]

    # name the new column after its weights, e.g. 0.2A0.3B0.5C
    outname = ''.join(str(round(w, 2)) + c for w, c in zip(weights, cols))
    # weighted sum across the input columns
    df[outname] = df[cols].multiply(weights, axis=1).sum(axis=1)

df
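If you would rather test every weight combination than draw them at random, a minimal sketch along the following lines may help. It is not part of the original answer. The helper names `weight_grid` and `add_weighted_columns` and the `steps` parameter are made up for illustration. It counts weights in whole steps of 1/steps, so the sum-to-one check is done on integers, and it also allows a weight of 1.0, which the `np.arange(0, 1, 0.05)` pool above leaves out.

```python
from itertools import product

import numpy as np
import pandas as pd


def weight_grid(n_cols, steps=20):
    """Yield every weight tuple on a 1/steps grid whose entries sum to 1.

    Hypothetical helper: counting in whole steps avoids comparing
    floating-point sums against 1.
    """
    for combo in product(range(steps + 1), repeat=n_cols - 1):
        remainder = steps - sum(combo)
        if remainder >= 0:
            # the last weight takes whatever is left, so the total is exact
            yield tuple(c / steps for c in combo + (remainder,))


def add_weighted_columns(df, cols, steps=20):
    """Return a copy of df with one weighted column per weight tuple."""
    values = df[cols].to_numpy()
    new_cols = {}
    for w in weight_grid(len(cols), steps):
        # names like 0.25A0.25B0.50C; assumes two decimals keep names unique
        name = ''.join(f'{wi:.2f}{c}' for wi, c in zip(w, cols))
        new_cols[name] = values @ np.array(w)
    # build all new columns at once instead of inserting them one by one
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


df = pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]], columns=['A', 'B', 'C'])
result = add_weighted_columns(df, ['A', 'B', 'C'], steps=4)
```

This works for any number of columns, but the number of combinations is C(steps + n - 1, n - 1) for n columns, so it grows quickly. With steps=20 that is 231 combinations for three columns, and a coarser grid or the random sampling above may be more practical once there are many columns.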