Removing rows from a DataFrame using hashing

Time: 2016-10-05 10:30:45

Tags: python pandas indexing merge multiple-columns

Given two pandas DataFrames:

df1 = pd.read_csv(file1, names=['col1','col2','col3'])
df2 = pd.read_csv(file2, names=['col1','col2','col3'])

I want to drop all rows in df2 whose col1 or col2 value (or both) does not exist in df1.

Doing the following:

df2 = df2[(df2['col1'] in set(df1['col1'])) & (df2['col2'] in set(df1['col2']))]

yields:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

1 Answer:

Answer 0: (score: 2)

I think you can try isin:

df2 = df2[(df2['col1'].isin(df1['col1'])) & (df2['col2'].isin(df1['col2']))]

df1 = pd.DataFrame({'col1':[1,2,3,3],
                    'col2':[4,5,6,2],
                    'col3':[7,8,9,5]})

print (df1)
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9
3     3     2     5

df2 = pd.DataFrame({'col1':[1,2,3,5],
                    'col2':[4,7,4,1],
                    'col3':[7,8,9,1]})

print (df2)
   col1  col2  col3
0     1     4     7
1     2     7     8
2     3     4     9
3     5     1     1

df2 = df2[(df2['col1'].isin(df1['col1'])) & (df2['col2'].isin(df1['col2'].unique()))]
print (df2)
   col1  col2  col3
0     1     4     7
2     3     4     9
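
As for why the original attempt fails: my understanding is that Python's in operator performs a single membership test, so df2['col1'] in set(...) tries to hash the whole Series rather than each of its values, and that is what raises the TypeError. isin instead tests every element and returns a boolean Series that can be combined with &. Here is a minimal sketch of the difference on the sample data above; the names raised, mask_col1, mask_col2, mask and kept_index are only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 3], 'col2': [4, 5, 6, 2]})
df2 = pd.DataFrame({'col1': [1, 2, 3, 5], 'col2': [4, 7, 4, 1]})

# `in` against a set hashes the left operand as one object.
# A Series is not hashable, so this raises TypeError.
try:
    df2['col1'] in set(df1['col1'])
    raised = False
except TypeError:
    raised = True

# isin checks each value separately and returns a boolean Series.
mask_col1 = df2['col1'].isin(df1['col1'])
mask_col2 = df2['col2'].isin(df1['col2'])

# Keep rows where both columns have a value that occurs in df1.
mask = mask_col1 & mask_col2
kept_index = df2[mask].index.tolist()
print(mask.tolist())
print(kept_index)
```

Note that the two columns are checked independently here: row 2 of df2 is kept because 3 occurs somewhere in df1['col1'] and 4 occurs somewhere in df1['col2'], even though the pair (3, 4) is not a row of df1.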

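Another solution is merge, since an inner join (how='inner') is the default. Note that when no on argument is given it joins on every column the two DataFrames share, so it only keeps rows whose col1, col2 and col3 values all match within the same row: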

print (pd.merge(df1, df2))
   col1  col2  col3
0     1     4     7
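
If what you actually need is to match on the (col1, col2) pair and ignore col3, one option is to restrict the join to those two columns. This is only a sketch under that assumption, reusing the sample data above; the names keys, all_cols and pairs are made up for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 3],
                    'col2': [4, 5, 6, 2],
                    'col3': [7, 8, 9, 5]})
df2 = pd.DataFrame({'col1': [1, 2, 3, 5],
                    'col2': [4, 7, 4, 1],
                    'col3': [7, 8, 9, 1]})

# No `on` argument: the join uses all shared columns (col1, col2, col3).
all_cols = pd.merge(df1, df2)

# Join only on the key pair. Taking just the key columns from df1 and
# dropping duplicates keeps df2's col3 and avoids duplicated output rows.
keys = df1[['col1', 'col2']].drop_duplicates()
pairs = df2.merge(keys, on=['col1', 'col2'], how='inner')

print(all_cols)
print(pairs)
```

On this sample both calls return the single row (1, 4, 7), because (1, 4) is the only pair present in both frames. Also keep in mind that merge returns a fresh 0-based index, whereas the boolean mask built with isin keeps the original index of df2.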