Question

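What I want to do is similar to a merge. For example, an inner merge gives a DataFrame containing the rows that are present in both the first and the second DataFrame. An outer merge gives a DataFrame containing the rows that are present in either the first OR the second DataFrame.

What I need is a DataFrame containing the rows that are present in the first DataFrame but not in the second. Is there a fast and elegant way to do this?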

Answer 1

Given the following:

print(df1)

    Team  Year  foo
0   Hawks  2001    5
1   Hawks  2004    4
2    Nets  1987    3
3    Nets  1988    6
4    Nets  2001    8
5    Nets  2000   10
6    Heat  2004    6
7  Pacers  2003   12

print(df2)

    Team  Year  foo
0  Pacers  2003   12
1    Heat  2004    6
2    Nets  1988    6

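As long as there is a non-key column with the same name in both DataFrames, you can let the suffixes that merge adds do the work (if there is no common non-key column, you can temporarily create one with df1['common'] = 1 and df2['common'] = 1):

new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.foo_y.isnull()])

     Team  Year  foo_x  foo_y
0   Hawks  2001      5    NaN
1   Hawks  2004      4    NaN
2    Nets  1987      3    NaN
4    Nets  2001      8    NaN
5    Nets  2000     10    NaN

Alternatively, you can use isin, but you would have to create a single key first.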
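A minimal sketch of the isin approach, assuming the example frames above and that Team plus Year identifies a row. The "|" separator is my own choice, not part of the answer: without it, a hypothetical Team "Nets1" with Year 988 and Team "Nets" with Year 1988 would both produce the key "Nets1988". Building the keys as standalone Series avoids adding a helper column to either frame.

```python
import pandas as pd

# Reconstruction of the example frames from the printed output above
df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets"],
    "Year": [2003, 2004, 1988],
    "foo": [12, 6, 6],
})

# Combine the key columns into one string per row; "|" keeps the parts distinct
key1 = df1["Team"] + "|" + df1["Year"].astype(str)
key2 = df2["Team"] + "|" + df2["Year"].astype(str)

# Keep the df1 rows whose key does not occur anywhere in df2
only_in_df1 = df1[~key1.isin(key2)]
print(only_in_df1)
```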

Answer 2

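Consider the following:

df_one is the first DataFrame
df_two is the second DataFrame

Goal: the rows present in the first DataFrame and NOT in the second DataFrame.

Solution, by index:

df = df_one[~df_one.index.isin(df_two.index)]

The index can be replaced by whichever column(s) you want to base the exclusion on. In the example above, I used the index as the reference between the two DataFrames.

You can also use a boolean pandas.Series to express more complex queries for the same problem.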
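A sketch of comparing on columns instead of the default index, assuming the same example data as Answer 1 and that Team and Year are the key columns. It builds a MultiIndex from those columns with pd.MultiIndex.from_frame, so index.isin compares whole (Team, Year) pairs. The extra foo > 4 condition is made up purely to show how the membership mask combines with another boolean Series.

```python
import pandas as pd

df_one = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df_two = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets"],
    "Year": [2003, 2004, 1988],
    "foo": [12, 6, 6],
})

keys = ["Team", "Year"]  # columns assumed to identify a row

# Turn the key columns into a MultiIndex so isin compares (Team, Year) tuples
idx_one = pd.MultiIndex.from_frame(df_one[keys])
idx_two = pd.MultiIndex.from_frame(df_two[keys])

# Boolean Series aligned to df_one: True where the key is absent from df_two
not_in_two = pd.Series(~idx_one.isin(idx_two), index=df_one.index)
df = df_one[not_in_two]

# The mask combines with other conditions via &, | and ~
df_filtered = df_one[not_in_two & (df_one["foo"] > 4)]
print(df)
print(df_filtered)
```

Comparing on df_one.index directly only makes sense when both frames share the same index scheme; with these freshly numbered frames it would wrongly drop rows 0 to 2 of df_one.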

Answer 3

You can run into errors if your non-key columns contain cells with NaN.

print(df1)

    Team   Year  foo
0   Hawks  2001    5
1   Hawks  2004    4
2    Nets  1987    3
3    Nets  1988    6
4    Nets  2001    8
5    Nets  2000   10
6    Heat  2004    6
7  Pacers  2003   12
8 Problem  2112  NaN


print(df2)

     Team  Year  foo
0  Pacers  2003   12
1    Heat  2004    6
2    Nets  1988    6
3 Problem  2112  NaN

new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.foo_y.isnull()])

     Team  Year  foo_x  foo_y
0   Hawks  2001      5    NaN
1   Hawks  2004      4    NaN
2    Nets  1987      3    NaN
4    Nets  2001      8    NaN
5    Nets  2000     10    NaN
6 Problem  2112    NaN    NaN

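The Problem team in 2112 has no value for foo in either table. So the left join here wrongly returns that row, even though it matches in both DataFrames, as if it were missing from the right DataFrame: foo_y is NaN whether or not a match was found.

Solution:

What I did was add a marker column to the right-hand (inner) DataFrame and set the same value on every row. Then, after the join, you can check whether that column is NaN to find the rows that exist only in the left-hand (outer) DataFrame.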

df2['in_df2']='yes'

print(df2)

     Team  Year  foo  in_df2
0  Pacers  2003   12     yes
1    Heat  2004    6     yes
2    Nets  1988    6     yes
3 Problem  2112  NaN     yes


new = df1.merge(df2, on=['Team', 'Year'], how='left')
print(new[new.in_df2.isnull()])

     Team  Year  foo_x  foo_y  in_df2
0   Hawks  2001      5    NaN     NaN
1   Hawks  2004      4    NaN     NaN
2    Nets  1987      3    NaN     NaN
4    Nets  2001      8    NaN     NaN
5    Nets  2000     10    NaN     NaN

NB: The Problem row is now correctly filtered out, because it has a value for in_df2:

  Problem  2112    NaN    NaN     yes
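A self-contained sketch of the marker-column fix, reconstructing df1 and df2 (including the Problem row) from the printed tables. It uses df2.assign(in_df2="yes") rather than assigning in place, a small departure from the answer so that df2 itself stays unchanged; the test is the same: in_df2 can only be NaN when no df2 row matched.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers", "Problem"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003, 2112],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12, np.nan],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets", "Problem"],
    "Year": [2003, 2004, 1988, 2112],
    "foo": [12, 6, 6, np.nan],
})

# Naive check: foo_y is NaN for unmatched rows AND for the matched Problem row
naive = df1.merge(df2, on=["Team", "Year"], how="left")
naive_only = naive[naive["foo_y"].isnull()]

# Marker column is never null in df2, so after the left join it is NaN
# only on rows that found no partner in df2
marked = df2.assign(in_df2="yes")
merged = df1.merge(marked, on=["Team", "Year"], how="left")
only_in_df1 = merged[merged["in_df2"].isnull()]
print(only_in_df1[["Team", "Year", "foo_x"]])
```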

Answer 4

I'd suggest using the indicator parameter of merge. Also, if on is None, it defaults to the intersection of the columns in the two DataFrames.

new = df1.merge(df2,how='left', indicator=True) # adds a new column '_merge'
new = new[(new['_merge']=='left_only')].copy() #rows only in df1 and not df2
new = new.drop(columns='_merge').copy()

    Team    Year    foo
0   Hawks   2001    5
1   Hawks   2004    4
2   Nets    1987    3
4   Nets    2001    8
5   Nets    2000    10

Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

indicator : boolean or string, default False

If True, adds a column to output DataFrame called “_merge” with information on the source of each row. 
Information column is Categorical-type and takes on a value of 
“left_only” for observations whose merge key only appears in ‘left’ DataFrame,
“right_only” for observations whose merge key only appears in ‘right’ DataFrame, 
and “both” if the observation’s merge key is found in both.
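To see all three indicator values at once, here is a sketch using how='outer' and a string for indicator, which names the column (here source) instead of _merge. It reuses the example frames and adds one hypothetical df2 row (Bulls, 1996) that does not appear in the answer, purely so that a right_only row exists. As in the answer, on is omitted, so the join uses every shared column (Team, Year and foo).

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Hawks", "Hawks", "Nets", "Nets", "Nets", "Nets", "Heat", "Pacers"],
    "Year": [2001, 2004, 1987, 1988, 2001, 2000, 2004, 2003],
    "foo": [5, 4, 3, 6, 8, 10, 6, 12],
})
df2 = pd.DataFrame({
    "Team": ["Pacers", "Heat", "Nets", "Bulls"],  # "Bulls" row is hypothetical
    "Year": [2003, 2004, 1988, 1996],
    "foo": [12, 6, 6, 7],
})

# on is omitted, so the join keys are all shared columns
merged = df1.merge(df2, how="outer", indicator="source")
counts = merged["source"].value_counts()
print(counts)

# Rows found only in df1, restricted to df1's original columns
only_df1 = merged.loc[merged["source"] == "left_only", list(df1.columns)]
print(only_df1)
```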

How do I subtract the rows of one pandas DataFrame from another?
