Question

I have some data in a DataFrame df with columns id and val, and some scaling factors in a DataFrame scaling, which holds one scaling factor for each value of id.

import pandas as pd

df = pd.DataFrame(data=dict(id=['a', 'a', 'a', 'b', 'b', 'c'], val=[1, 2, 3, 10, 11, 100]))
scaling = pd.DataFrame(data=dict(id=['a', 'b', 'c'], scaling=[1, 0.1, 0.01]))

These look like this:

In[24]: df
Out[24]: 
  id  val
0  a    1
1  a    2
2  a    3
3  b   10
4  b   11
5  c  100
In[25]: scaling
Out[25]: 
  id  scaling
0  a     1.00
1  b     0.10
2  c     0.01

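I now want to multiply the values in df by the matching scaling factor. I can do it as follows, but it feels awkward and may be inefficient: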

df['val'] = df['val'] * df.merge(scaling, left_on='id', right_on='id')['scaling']

Is there a better way to apply these factors?

Answer 1

You can use map with scaling after calling set_index on it:

df['val'] * df['id'].map(scaling.set_index('id').scaling)

Output:

0    1.0
1    2.0
2    3.0
3    1.0
4    1.1
5    1.0
dtype: float64
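
To write the result back into df, and to see what happens when an id has no scaling factor, something like the sketch below may help. The extra id 'd' and the fallback factor of 1 are assumptions added for illustration; they are not part of the question.

```python
import pandas as pd

# Same data as in the question, plus a hypothetical id 'd' with no scaling factor
df = pd.DataFrame(data=dict(id=['a', 'a', 'a', 'b', 'b', 'c', 'd'],
                            val=[1, 2, 3, 10, 11, 100, 5]))
scaling = pd.DataFrame(data=dict(id=['a', 'b', 'c'], scaling=[1, 0.1, 0.01]))

# Series indexed by id, so map() can look up each id directly
factors = scaling.set_index('id')['scaling']

# ids missing from scaling map to NaN; fillna(1) leaves those rows unscaled (assumed fallback)
df['val'] = df['val'] * df['id'].map(factors).fillna(1)
```

Without the fillna call, the row for 'd' would end up as NaN, which can be the better choice if a missing factor should be treated as an error.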

Answer 2

You can build a dictionary from the columns of the scaling DataFrame and use it for the mapping.

>>> df = pd.DataFrame(data=dict(id=['a', 'a', 'a', 'b', 'b', 'c'], val=[1, 2, 3, 10, 11, 100]))

>>> scaling = pd.DataFrame(data=dict(id=['a', 'b', 'c'], scaling=[1, 0.1, 0.01]))

>>> scaling_dict = dict(zip(scaling['id'], scaling['scaling']))

>>> df['multiplier'] = df['id'].map(scaling_dict)

>>> df['val'] = df['val']*df['multiplier']

>>> df
  id  val  multiplier
0  a  1.0        1.00
1  a  2.0        1.00
2  a  3.0        1.00
3  b  1.0        0.10
4  b  1.1        0.10
5  c  1.0        0.01
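
If you would rather stay with merge, as in the question, a sketch along these lines avoids relying on the merged result having the same index as df. The names merged and result are made up for this example, and how='left' with validate='many_to_one' is one reasonable choice rather than the only one.

```python
import pandas as pd

df = pd.DataFrame(data=dict(id=['a', 'a', 'a', 'b', 'b', 'c'], val=[1, 2, 3, 10, 11, 100]))
scaling = pd.DataFrame(data=dict(id=['a', 'b', 'c'], scaling=[1, 0.1, 0.01]))

# A left join keeps every row of df in its original order;
# validate raises a MergeError if an id appears more than once in scaling
merged = df.merge(scaling, on='id', how='left', validate='many_to_one')

# Multiply inside the merged frame, so nothing depends on index alignment with df
merged['val'] = merged['val'] * merged['scaling']
result = merged.drop(columns='scaling')
```

For a simple one-column lookup like this, map is usually the lighter option; merge becomes more attractive when several columns from scaling are needed at once.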
