Question

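I'm trying to subtract the values in one column of DataFrame B from the values in one column of DataFrame A, but only where the values in several other columns are equal to each other.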

Suppose (made-up data):

DataFrame A:
Index    Department  Speciality   TargetMonth Capacity
1        Sales       Cars         2019-1      150
2        Sales       Cars         2019-2      120
3        Sales       Furniture    2019-1      110
4        IT          Servers      2019-1      100

DataFrame B:
Index    Department  Speciality   TargetMonth Required
1        Sales       Cars         2019-1      100
2        Sales       Cars         2019-2      120
3        IT          Servers      2019-1      50
4        Sales       Furniture    2019-1      50

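I deliberately swapped the order of rows 3 and 4 in DataFrame B relative to A. My goal is to subtract DataFrame B's Required column from DataFrame A's Capacity column to get the remaining capacity hours, and produce another table (it does not need to be sorted):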

Index    Department  Speciality   TargetMonth Result
1        Sales       Cars         2019-1      50
2        Sales       Cars         2019-2      0
3        Sales       Furniture    2019-1      60
4        IT          Servers      2019-1      50

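So, technically, the subtraction should happen only where all the key column values match, not based on row order, because some rows may be missing from one table or the other.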
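To make the example reproducible, here is a minimal sketch that builds the two sample frames and checks why pairing rows by position is not enough. It assumes TargetMonth is stored as a plain string such as "2019-1"; the names dfA, dfB and keys are only illustrative.

```python
import pandas as pd

# Key columns that must match before subtracting.
keys = ["Department", "Speciality", "TargetMonth"]

# Sample data from the question (TargetMonth kept as a string, an assumption).
dfA = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "IT"],
    "Speciality": ["Cars", "Cars", "Furniture", "Servers"],
    "TargetMonth": ["2019-1", "2019-2", "2019-1", "2019-1"],
    "Capacity": [150, 120, 110, 100],
}, index=[1, 2, 3, 4])

dfB = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "Sales"],
    "Speciality": ["Cars", "Cars", "Servers", "Furniture"],
    "TargetMonth": ["2019-1", "2019-2", "2019-1", "2019-1"],
    "Required": [100, 120, 50, 50],
}, index=[1, 2, 3, 4])

# Compare the key columns row by row (same index labels in both frames).
# Rows 3 and 4 carry different keys, so positional subtraction would pair the wrong rows.
pairs_match = (dfA[keys] == dfB[keys]).all(axis=1)
print(pairs_match.tolist())
```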

I could solve this with some for loops and conditionals, but I'd like a clean pandas way to do it with .subtract; the "joining" part is where I'm currently stuck.

Thanks for your time.

Answer 1

This is exactly why the Index is so useful: subtraction aligns on the index (both rows and columns).

dfA = dfA.set_index(['Department', 'Speciality', 'TargetMonth'])
dfB = dfB.set_index(['Department', 'Speciality', 'TargetMonth'])

dfA.sub(dfB.rename(columns={'Required': 'Capacity'}), fill_value=0)

                                   Capacity
Department Speciality TargetMonth          
IT         Servers    2019-1             50
Sales      Cars       2019-1             50
                      2019-2              0
           Furniture  2019-1             60
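A self-contained variation of the approach above, as a sketch rather than the answer's exact code: it subtracts single columns (Series) instead of renaming a DataFrame column, and it adds one hypothetical extra row ("IT", "Laptops", "2019-1") to B that has no partner in A, to show what fill_value changes.

```python
import pandas as pd

keys = ["Department", "Speciality", "TargetMonth"]

dfA = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "IT"],
    "Speciality": ["Cars", "Cars", "Furniture", "Servers"],
    "TargetMonth": ["2019-1", "2019-2", "2019-1", "2019-1"],
    "Capacity": [150, 120, 110, 100],
})
# The question's B plus one hypothetical row ("IT", "Laptops") that is not in A.
dfB = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "Sales", "IT"],
    "Speciality": ["Cars", "Cars", "Servers", "Furniture", "Laptops"],
    "TargetMonth": ["2019-1", "2019-2", "2019-1", "2019-1", "2019-1"],
    "Required": [100, 120, 50, 50, 30],
})

# Move the key columns into a MultiIndex so arithmetic aligns on them.
a = dfA.set_index(keys)["Capacity"]
b = dfB.set_index(keys)["Required"]

# Without fill_value, a key missing from either side gives NaN.
strict = a.sub(b)
# With fill_value=0, the missing side is treated as 0 before subtracting.
filled = a.sub(b, fill_value=0)

# Turn the MultiIndex back into ordinary columns and name the result column.
result = filled.rename("Result").reset_index()
print(result)
```

With fill_value=0 the unmatched key appears as 0 - 30 = -30; if only keys present in both frames should be kept, use strict.dropna() instead. Because alignment can introduce NaN, the result column is float.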

Answer 2

I would merge on the keys:

For this solution, call your DataFrame A dfA and your DataFrame B dfB.

   import pandas as pd

   df_result = pd.merge(dfA, dfB, how='inner', on=['Department', 'Speciality', 'TargetMonth'])

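This combines the two DataFrames on the keys ['Department', 'Speciality', 'TargetMonth'] and produces a DataFrame containing only the keys that appear in both DataFrames (how='inner').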

That is, if dfB contained a key such as

   {'Department': 'IT', 'Speciality': 'Furniture', 'TargetMonth': '2019-1'}

that row would not appear in df_result. More information is available here: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
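A minimal sketch of that behaviour, assuming the question's sample data plus the hypothetical IT / Furniture / 2019-1 key (with a made-up Required value of 30) added to dfB. As a contrast it also uses how="outer" with indicator=True, a standard pd.merge option that is not part of the answer, to list which keys had no partner.

```python
import pandas as pd

keys = ["Department", "Speciality", "TargetMonth"]

dfA = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "IT"],
    "Speciality": ["Cars", "Cars", "Furniture", "Servers"],
    "TargetMonth": ["2019-1", "2019-2", "2019-1", "2019-1"],
    "Capacity": [150, 120, 110, 100],
})
# The question's B plus the hypothetical key IT / Furniture / 2019-1, which A lacks.
dfB = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "Sales", "IT"],
    "Speciality": ["Cars", "Cars", "Servers", "Furniture", "Furniture"],
    "TargetMonth": ["2019-1", "2019-2", "2019-1", "2019-1", "2019-1"],
    "Required": [100, 120, 50, 50, 30],
})

# Inner merge: only keys present in both frames survive.
inner = pd.merge(dfA, dfB, how="inner", on=keys)

# Outer merge with indicator=True keeps every key and adds a "_merge" column
# whose values are "both", "left_only" or "right_only".
outer = pd.merge(dfA, dfB, how="outer", on=keys, indicator=True)
unmatched = outer[outer["_merge"] != "both"]

print(len(inner), len(outer))
print(unmatched[keys + ["_merge"]])
```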

Then use a vectorized pandas operation:

   df_result['Result'] = df_result['Capacity'] - df_result['Required']
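Putting the two steps together, a runnable sketch on the question's sample data (TargetMonth assumed to be a string; the final column selection is only for display):

```python
import pandas as pd

keys = ["Department", "Speciality", "TargetMonth"]

dfA = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "IT"],
    "Speciality": ["Cars", "Cars", "Furniture", "Servers"],
    "TargetMonth": ["2019-1", "2019-2", "2019-1", "2019-1"],
    "Capacity": [150, 120, 110, 100],
})
dfB = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "Sales"],
    "Speciality": ["Cars", "Cars", "Servers", "Furniture"],
    "TargetMonth": ["2019-1", "2019-2", "2019-1", "2019-1"],
    "Required": [100, 120, 50, 50],
})

# Join rows on the key columns, regardless of their order in either frame.
df_result = pd.merge(dfA, dfB, how="inner", on=keys)

# Vectorized column subtraction on the joined rows.
df_result["Result"] = df_result["Capacity"] - df_result["Required"]

print(df_result[keys + ["Result"]])
```

Since no NaN is introduced by an inner merge here, Result stays an integer column.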

Subtracting two Pandas DataFrames joined on multiple column values
