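I have a DataFrame, and I want to drop every row in which the first two columns hold the same value, keeping only the rows where they differ.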
import pandas as pd

County_in = pd.Series(["001","001","002"], dtype="category")
County_out = pd.Series(["001","003","001"], dtype="category")
Value = pd.Series([2,4,6], dtype="int")
foo = pd.DataFrame({'County_in' : County_in,
                    'County_out' : County_out,
                    'Value' : Value})
foo
  County_in County_out  Value
0       001        001      2
1       001        003      4
2       002        001      6
I would like to get this result:
  County_in County_out  Value
1       001        003      4
2       002        001      6
I tried:
foo_2 = foo[~foo.County_out.isin(foo.County_in)]
But this also removes rows where the two values on the same row are different:
foo_2
  County_in County_out  Value
1       001        003      4
Is there a function I can use for this?
Answer 0 (score: 2)
IIUC, you just want this:
In [80]:
foo[foo['County_in'] != foo['County_out']]
Out[80]:
  County_in County_out  Value
1       001        003      4
2       002        001      6
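To see why the isin attempt in the question drops row 2 while the row-wise comparison keeps it, here is a minimal sketch. It assumes plain string columns instead of categoricals (an assumption made only to keep the example simple), and the names df, in_any_row and differs_on_row are illustrative, not from the question.

```python
import pandas as pd

# Plain string columns: an assumption made here to keep the comparison simple.
df = pd.DataFrame({
    "County_in": ["001", "001", "002"],
    "County_out": ["001", "003", "001"],
    "Value": [2, 4, 6],
})

# isin tests membership in the whole County_in column,
# not equality with the value on the same row.
in_any_row = df["County_out"].isin(df["County_in"])

# The element-wise comparison looks only at the two values on the same row.
differs_on_row = df["County_in"] != df["County_out"]

print(df[~in_any_row])      # keeps only row 1
print(df[differs_on_row])   # keeps rows 1 and 2
```

Row 2 has County_out "001", which appears somewhere in County_in (rows 0 and 1), so the isin mask treats it as a match even though County_in on that row is "002".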
Edit
You cannot compare two categoricals whose categories differ. However, if you cast the values to str, the comparison works:
In [99]:
foo[foo['County_in'] != foo['County_out'].astype(str)]
Out[99]:
  County_in County_out  Value
1       001        003      4
2       002        001      6
See the docs: http://pandas.pydata.org/pandas-docs/stable/categorical.html#comparisons
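As a rough sketch of two ways around the categorical restriction: the first casts both columns to str, a slight variation on the snippet above, which casts only one side. The second is an alternative not mentioned in the answer: give both columns the same set of categories before comparing. The names mask_str, mask_cat and all_cats are illustrative.

```python
import pandas as pd

foo = pd.DataFrame({
    "County_in": pd.Series(["001", "001", "002"], dtype="category"),
    "County_out": pd.Series(["001", "003", "001"], dtype="category"),
    "Value": pd.Series([2, 4, 6], dtype="int"),
})

# Option 1: compare the plain string values on both sides.
mask_str = foo["County_in"].astype(str) != foo["County_out"].astype(str)

# Option 2 (not from the answer): align the categories first, then compare.
all_cats = foo["County_in"].cat.categories.union(foo["County_out"].cat.categories)
county_in = foo["County_in"].cat.set_categories(all_cats)
county_out = foo["County_out"].cat.set_categories(all_cats)
mask_cat = county_in != county_out

result = foo[mask_str]
print(result)
```

Both masks select the same rows here. Casting to str is the shorter route; aligning the categories keeps the columns categorical, which may matter if the frame is large.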