Question

I have two DataFrames, df1 and df2.

df1:

contig  position   tumor_f  t_ref_count  t_alt_count
1     14599  0.000000            1            0
1     14653  0.400000            3            2
1     14907  0.333333            6            3
1     14930  0.363636            7            4

df2:

contig  position
1     14599
1     14653

I want to remove from df1 the rows whose contig and position values match a row in df2. Something like: df1[df1[['contig','position']].isin(df2[['contig','position']])], except that this doesn't work.

Answer 1

Version 0.13 is adding an isin method to DataFrame for exactly this purpose. If you are on current master, you can try:

In [46]: df1[['contig', 'position']].isin(df2.to_dict(outtype='list'))
Out[46]: 
  contig position
0   True     True
1   True     True
2   True    False
3   True    False

To get the rows that are not contained in df2, negate the mask with ~ and use it to index:

In [45]: df1.ix[~df1[['contig', 'position']].isin(df2.to_dict(outtype='list')).
all(axis=1)]
Out[45]: 
   contig  position   tumor_f  t_ref_count  t_alt_count
2       1     14907  0.333333            6            3
3       1     14930  0.363636            7            4
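In current pandas releases the calls above no longer run as written: to_dict takes orient rather than outtype, and the .ix indexer has been removed in favour of .loc. A minimal sketch of the same idea, assuming a recent pandas and the sample frames from the question (rebuilt here so the snippet is self-contained):

```python
import pandas as pd

df1 = pd.DataFrame({
    "contig": [1, 1, 1, 1],
    "position": [14599, 14653, 14907, 14930],
    "tumor_f": [0.0, 0.4, 0.333333, 0.363636],
    "t_ref_count": [1, 3, 6, 7],
    "t_alt_count": [0, 2, 3, 4],
})
df2 = pd.DataFrame({"contig": [1, 1], "position": [14599, 14653]})

# Per-column membership test: each column of df1 is checked against
# the list of values df2 holds for the column of the same name.
mask = df1[["contig", "position"]].isin(df2.to_dict(orient="list"))

# Keep only the rows where not every key column matched.
result = df1.loc[~mask.all(axis=1)]
print(result)
```

Note that each column is tested independently here, so this only amounts to matching (contig, position) pairs when the data cannot combine a contig from one df2 row with a position from another.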

Answer 2

You can use Series.isin twice (this works in 0.12):

In [21]: df1['contig'].isin(df2['contig']) & df1['position'].isin(df2['position'])
Out[21]:
0     True
1     True
2    False
3    False
dtype: bool

In [22]: ~(df1['contig'].isin(df2['contig']) & df1['position'].isin(df2['position']))
Out[22]:
0    False
1    False
2     True
3     True
dtype: bool

In [23]: df1[~(df1['contig'].isin(df2['contig']) & df1['position'].isin(df2['position']))]
Out[23]:
   contig  position   tumor_f  t_ref_count  t_alt_count
2       1     14907  0.333333            6            3
3       1     14930  0.363636            7            4
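One caveat with chaining two isin calls: the columns are tested independently, so a row of df1 is dropped whenever its contig appears anywhere in df2 and its position appears anywhere in df2, even if that exact pair never occurs. The sketch below uses made-up frames (left and right are illustrative names, not from the question) to show the difference, and compares whole pairs by building a MultiIndex from the key columns. It assumes a pandas version that provides MultiIndex.from_frame.

```python
import pandas as pd

# Hypothetical data: the pair (1, 200) is in `left` but not in `right`.
left = pd.DataFrame({"contig": [1, 2, 1], "position": [100, 200, 200]})
right = pd.DataFrame({"contig": [1, 2], "position": [100, 200]})
keys = ["contig", "position"]

# Column-wise test: (1, 200) is flagged although that pair is not in `right`.
columnwise = left["contig"].isin(right["contig"]) & left["position"].isin(right["position"])

# Pair-wise test: compare whole (contig, position) tuples.
pairwise = pd.MultiIndex.from_frame(left[keys]).isin(pd.MultiIndex.from_frame(right[keys]))

print(left[~columnwise])  # empty: every row was dropped
print(left[~pairwise])    # keeps the (1, 200) row
```

For the sample data in the question both tests give the same answer, because contig is always 1.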

Perhaps we'll get a neater solution in 0.13 (using DataFrame's isin, as in Tom's answer).

It feels like there ought to be a neat way to do this with an inner merge...

In [31]: pd.merge(df1, df2, how="inner")
Out[31]:
   contig  position  tumor_f  t_ref_count  t_alt_count
0       1     14599      0.0            1            0
1       1     14653      0.4            3            2
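The inner merge returns the rows to discard rather than the rows to keep. One possible way to turn it around, sketched here assuming a pandas version whose merge accepts indicator=True, is a left merge that tags each row with its origin and then keeps the rows found only in df1:

```python
import pandas as pd

df1 = pd.DataFrame({
    "contig": [1, 1, 1, 1],
    "position": [14599, 14653, 14907, 14930],
    "tumor_f": [0.0, 0.4, 0.333333, 0.363636],
    "t_ref_count": [1, 3, 6, 7],
    "t_alt_count": [0, 2, 3, 4],
})
df2 = pd.DataFrame({"contig": [1, 1], "position": [14599, 14653]})

# indicator=True adds a "_merge" column: "both" for rows whose keys are
# also in df2, "left_only" for rows present only in df1.
merged = df1.merge(df2, on=["contig", "position"], how="left", indicator=True)

# Keep the df1-only rows and drop the helper column again.
kept = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(kept)
```

Unlike the two separate isin calls, this compares (contig, position) as a pair. Keep in mind that merge builds a fresh index, so the original df1 labels are not carried over in general.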

Answer 3

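Here is a long-winded way to do it:

# (contig, position) pairs present in df2; index=False leaves the index out of each tuple
keys_in_df2 = set(df2[['contig', 'position']].itertuples(index=False, name=None))
is_in_other_df = []
for row in df1[['contig', 'position']].itertuples(index=False, name=None):
    is_in_other_df.append(row in keys_in_df2)
df1["InOtherDF"] = is_in_other_df

Then drop only the rows where "InOtherDF" is True. Passing index=False to itertuples keeps the row index out of the returned tuples, so rows are compared on contig and position alone.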

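I think using merge is a cleaner way to do it:

df2["FromDF2"] = True
df1 = pandas.merge(df1, df2, left_on=["contig", "position"],
                   right_on=["contig", "position"], how="left")
# Rows with no match in df2 get NaN in FromDF2, so select them with isnull() rather than ~
df1[df1.FromDF2.isnull()]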

Dropping rows with multiple keys in pandas
