import numpy as np
import pandas as pd
columns = ['id', 'A', 'B', 'C']
index = np.arange(3)
df = pd.DataFrame(np.random.randn(3,4), columns=columns, index=index)
weights = {'A': 0.10, 'B': 1.00, 'C': 1.50}
I need to multiply the value in each "cell" by the corresponding weight (excluding the first column). For example:
df.at[0,'A'] * weights['A']
df.at[0,'B'] * weights['B']
What is the most efficient way to do this, with the results placed in a new DataFrame?
Answer 0 (score: 5)
Setup
df
Out[1013]:
id A B C
0 -0.641314 -0.526509 0.225116 -1.131141
1 0.018321 -0.944734 -0.123334 -0.853356
2 0.703119 0.468857 1.038572 -1.529723
weights
Out[1026]: {'A': 0.1, 'B': 1.0, 'C': 1.5}
W = np.asarray([weights[e] for e in sorted(weights.keys())])
Solution
# broadcast the weights across the rows (element-wise, one weight per column)
df.loc[:,['A','B','C']] *= W
df
Out[1016]:
id A B C
0 -0.641314 -0.052651 0.225116 -1.696712
1 0.018321 -0.094473 -0.123334 -1.280034
2 0.703119 0.046886 1.038572 -2.294584
Update
If you need to keep the column names flexible, I think a better approach is to keep the column names and the weights in two lists:
columns = sorted(weights.keys())
Out[1072]: ['A', 'B', 'C']
weights = [weights[e] for e in columns]
Out[1074]: [0.1, 1.0, 1.5]
Then you can do:
df.loc[:,columns] *= weights
Out[1067]:
id A B C
0 -0.641314 -0.052651 0.225116 -1.696712
1 0.018321 -0.094473 -0.123334 -1.280034
2 0.703119 0.046886 1.038572 -2.294584
One-liner solution (with weights still the original dict):
df.loc[:,sorted(weights.keys())] *= [weights[e] for e in sorted(weights.keys())]
df
Out[1089]:
id A B C
0 -0.641314 -0.052651 0.225116 -1.696712
1 0.018321 -0.094473 -0.123334 -1.280034
2 0.703119 0.046886 1.038572 -2.294584
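The question also asks for the result in a new DataFrame, while the snippets above modify df in place. A minimal sketch of the same broadcasting idea applied to a copy follows; the fixed sample values and the names cols and weighted are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones so the result is reproducible
df = pd.DataFrame(
    [[1.0, 10.0, 2.0, 4.0],
     [2.0, 20.0, 3.0, 6.0],
     [3.0, 30.0, 4.0, 8.0]],
    columns=['id', 'A', 'B', 'C'],
)
weights = {'A': 0.10, 'B': 1.00, 'C': 1.50}

cols = sorted(weights)                      # column order and weight order must match
W = np.asarray([weights[c] for c in cols])  # shape (3,), broadcast across the rows

weighted = df.copy()                        # work on a copy so df stays unchanged
weighted[cols] = df[cols].to_numpy() * W    # element-wise product, one weight per column
```

Because the weights are applied by position here, the column list and the weight array have to be built from the same ordering of keys.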
Answer 1 (score: 3)
I think the simplest approach is to create a Series from the dict, which aligns its index with the column names:
print (df)
id A B C
0 -0.641314 -0.526509 0.225116 -1.131141
1 0.018321 -0.944734 -0.123334 -0.853356
2 0.703119 0.468857 1.038572 -1.529723
print (pd.Series(weights))
A 0.1
B 1.0
C 1.5
dtype: float64
df[['A','B','C']] *= pd.Series(weights)
print (df)
id A B C
0 -0.641314 -0.052651 0.225116 -1.696711
1 0.018321 -0.094473 -0.123334 -1.280034
2 0.703119 0.046886 1.038572 -2.294585
A more general solution, thanks to piRSquared and juanpa.arrivillaga:
df[list(weights)] *= pd.Series(weights)
print (df)
id A B C
0 -0.641314 -0.052651 0.225116 -1.696711
1 0.018321 -0.094473 -0.123334 -1.280034
2 0.703119 0.046886 1.038572 -2.294585
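To illustrate the alignment this answer relies on, here is a small sketch in which the dict keys are deliberately listed in a different order than the columns. The sample values and the names w and result are assumptions for illustration; DataFrame.mul is used instead of *= so that a new frame is returned.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.0, 10.0, 2.0, 4.0],
     [2.0, 20.0, 3.0, 6.0]],
    columns=['id', 'A', 'B', 'C'],
)
# Keys deliberately listed in a different order than the columns
weights = {'C': 1.50, 'A': 0.10, 'B': 1.00}

w = pd.Series(weights)
# The Series index is matched against the column labels, not their positions
result = df[['A', 'B', 'C']].mul(w, axis='columns')
```

Each column is multiplied by the weight carrying the same label, so the key order in the dict should not matter with this approach.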
Answer 2 (score: 2)
Here is a concise way, in case it interests you:
In [11]: df.assign(**{"{}_product".format(cl): val*df.loc[:,cl]
...: for cl, val in weights.items()})
Out[11]:
id A B C A_product B_product C_product
0 -1.893885 0.940408 0.841350 -0.669378 0.094041 0.841350 -1.004067
1 -0.526427 0.472322 -0.546121 0.201615 0.047232 -0.546121 0.302423
2 -0.450193 -0.422066 0.564866 1.866878 -0.042207 0.564866 2.800318
Or, if you want to replace the data:
In [13]: df.assign(**{cl: val*df.loc[:,cl]
...: for cl, val in weights.items()})
Out[13]:
id A B C
0 -1.893885 0.094041 0.841350 -1.004067
1 -0.526427 0.047232 -0.546121 0.302423
2 -0.450193 -0.042207 0.564866 2.800318
This produces a new DataFrame; it does not work in place.
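A short sketch of what that means in practice: the frame returned by assign has to be bound to a name, and the original stays as it was. The sample values and the name new_df are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1.0, 2.0],
    'A': [10.0, 20.0],
    'B': [2.0, 3.0],
    'C': [4.0, 6.0],
})
weights = {'A': 0.10, 'B': 1.00, 'C': 1.50}

# assign returns a new DataFrame; df itself is not modified
new_df = df.assign(**{cl: val * df[cl] for cl, val in weights.items()})
```

To get the effect of an in-place update, the result can simply be rebound to the original name, for example df = new_df.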
Answer 3 (score: 2)
This works with non-overlapping keys between the DataFrame and the dictionary:
np.random.seed([3,1415])
df = pd.DataFrame(
np.random.randn(3,4),
columns='id A B C'.split()
)
weights = dict(A=.1, B=1., C=1.5, D=2.)
df
id A B C
0 -2.129724 -1.268466 -1.970500 -2.259055
1 -0.349286 -0.026955 0.316236 0.348782
2 0.715364 0.770763 -0.608208 0.352390
Note: df has id, which weights does not. weights has D, which df does not. This solution modifies only the overlapping columns, and it is very concise.
df.update(df.mul(pd.Series(weights)).dropna(axis=1))
df
id A B C
0 -2.129724 -0.126847 -1.970500 -3.388583
1 -0.349286 -0.002696 0.316236 0.523173
2 0.715364 0.077076 -0.608208 0.528586
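The same overlap handling can also be written out explicitly, which may be easier to read than the update/dropna combination. This is only a sketch of an alternative, not the answer's own method; the sample values and the names w and shared are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1.0, 2.0],
    'A': [10.0, 20.0],
    'B': [2.0, 3.0],
    'C': [4.0, 6.0],
})
weights = dict(A=.1, B=1., C=1.5, D=2.)    # D has no matching column in df

w = pd.Series(weights)
shared = df.columns.intersection(w.index)  # labels present in both: A, B, C
df[shared] = df[shared] * w[shared]        # only the shared columns are touched
```

Columns missing from weights (id) keep their values, and keys missing from df (D) are ignored rather than added as new columns.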