Question

How can I get the difference between two DataFrames? For example, I have two DataFrames:

def no_repeats(s):
    # Collect each value once, keeping the order of first appearance.
    new_list = []
    for number in s:
        if number not in new_list:
            new_list.append(number)
    return new_list

I want to get:

import pandas as pd

previous_asks = pd.DataFrame({'price': [1, 2, 3], 'amount': [10, 20, 30]})
current_asks = pd.DataFrame({'price': [1, 2, 3, 4], 'amount': [11, 20, 30, 40]})

Answer 1

Using pandas:

import pandas as pd

a1 = list(range(10))
a2 = list(range(5, 8))

b1 = list('abcdefghij')
b2 = list('efy')

df1 = pd.DataFrame({'price': a1, 'amount': b1})
df2 = pd.DataFrame({'price': a2, 'amount': b2})

# For each column, keep the values of df1 that do not occur in df2.
dict_results = dict()
for col in df1:
    dict_results[col] = df1.loc[~df1[col].isin(df2[col].values), col].values
    print('--', col, dict_results[col])

This gives:

-- price [0 1 2 3 4 8 9]
-- amount ['a' 'b' 'c' 'd' 'g' 'h' 'i' 'j']

Using plain Python 3:

set1 = set(a1)
set2 = set(a2)
print(set1 - set2)

This gives:

{0, 1, 2, 3, 4, 8, 9}

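I would rather use plain Python 3 here, because I find it simpler and more readable. If you start out with pandas DataFrames, I would convert them to the set data type, manipulate them, and convert the result back to a pd.DataFrame if necessary.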
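
As a rough sketch of that round trip, assuming each row can be treated as a hashable tuple and that the original row order does not matter, the two frames from the question could be compared as follows. The helper name row_difference is made up for illustration and is not part of pandas.

```python
import pandas as pd


def row_difference(left, right):
    """Return the rows of `left` that do not appear in `right` (hypothetical helper)."""
    # Turn every row into a plain tuple so it can be stored in a set.
    left_rows = set(left.itertuples(index=False, name=None))
    right_rows = set(right.itertuples(index=False, name=None))
    # Set difference; sorting makes the result order deterministic.
    diff = sorted(left_rows - right_rows)
    # Convert the remaining tuples back into a DataFrame.
    return pd.DataFrame(diff, columns=left.columns)


previous_asks = pd.DataFrame({'price': [1, 2, 3], 'amount': [10, 20, 30]})
current_asks = pd.DataFrame({'price': [1, 2, 3, 4], 'amount': [11, 20, 30, 40]})

new_or_changed = row_difference(current_asks, previous_asks)
print(new_or_changed)
```

With this data, the rows left over are (price 1, amount 11) and (price 4, amount 40). Swapping the arguments would instead list the rows that exist only in previous_asks.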

The unique() method of pd.Series is also worth a try.
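
For instance, a minimal illustration of unique() on a Series with repeated values (the sample data is invented for this example):

```python
import pandas as pd

prices = pd.Series([1, 2, 2, 3, 1, 4])
# unique() returns the distinct values in order of first appearance.
distinct_prices = prices.unique()
print(distinct_prices)
```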
