Question

I'm trying to efficiently compute the difference between two groups whose data may not match up.

Given the following dataframe df:

df = pd.DataFrame({'type': ['A', 'A', 'A', 'W', 'W', 'W'],
                   'code': ['1', '2', '3', '1', '2', '4'],
                   'values': [50, 25, 25, 50, 10, 40]})

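There are two kinds of 'code' mismatch: code 3 does not exist for type 'W', and code 4 does not exist for type 'A'. I've wrapped the codes as strings because, in my particular case, they are sometimes strings.

I want to subtract the values for matching codes between the two types, so that we get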

result = pd.DataFrame({'code': ['1', '2', '3', '4'],
                       'diff': [0, 15, 25, -40]})

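The sign would indicate which type has the larger value.

I've spent some time here looking at variations of the groupby diff approach, but haven't seen anything that deals with this particular problem of subtracting between two potentially mismatched columns. Instead, most questions seem to fit the intended use of the diff() method.

My most recent attempt was to use a list comprehension on df.groupby('type') to split it into two dataframes, but I still run into a similar problem with the subtraction when removing the mismatched cases.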

Answer 1

Group by code, then replace the missing values with 0.

import numpy as np
import pandas as pd

df = pd.DataFrame({'type': ['A', 'A', 'A', 'W', 'W', 'W'],
                   'code': ['1', '2', '3', '1', '2', '4'],
                   'values': [50, 25, 25, 50, 10, 40]})

def my_func(x):
    # What if there is more than one value for a type/code combo?
    # max() keeps the largest; on an empty selection it returns NaN.
    a_value = x[x.type == 'A']['values'].max()
    w_value = x[x.type == 'W']['values'].max()

    # Treat a code that is missing for a type as 0
    a_value = 0 if np.isnan(a_value) else a_value
    w_value = 0 if np.isnan(w_value) else w_value
    return a_value - w_value

df_new = df.groupby('code').apply(my_func)

df_new = df_new.reset_index()
df_new = df_new.rename(columns={0:'diff'})

print(df_new)

  code  diff
0    1     0
1    2    15
2    3    25
3    4   -40
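
As an alternative sketch (not part of the original answer), the same "missing means 0" idea can probably be expressed without a custom function by reshaping to one column per type with pivot_table and filling the gaps with 0. The names wide and result are just illustrative, and aggfunc='max' is an assumption chosen to mirror the .max() used in my_func when a type/code pair appears more than once.

```python
import pandas as pd

df = pd.DataFrame({'type': ['A', 'A', 'A', 'W', 'W', 'W'],
                   'code': ['1', '2', '3', '1', '2', '4'],
                   'values': [50, 25, 25, 50, 10, 40]})

# One row per code, one column per type.
# Type/code pairs that do not exist are filled with 0.
wide = df.pivot_table(index='code', columns='type', values='values',
                      aggfunc='max', fill_value=0)

# A minus W, so a positive sign means type A has the larger value
result = (wide['A'] - wide['W']).rename('diff').reset_index()
print(result)
```

Depending on the pandas version, the diff column may come back as float rather than int; an explicit .astype(int) can be added if integer output matters.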

Subtracting values between groups within a dataframe
