Question

I have a DataFrame from which I want to drop the rows that have the same value in the first two columns of the same row.

import pandas as pd

County_in = pd.Series(["001","001","002"], dtype="category")
County_out = pd.Series(["001","003","001"], dtype="category")
Value = pd.Series([2,4,6], dtype="int")

foo = pd.DataFrame({'County_in' : County_in,
                    'County_out' : County_out,
                    'Value' : Value})

foo

   County_in  County_out    Value
0  001        001           2
1  001        003           4
2  002        001           6

I would like to get this result:

   County_in  County_out    Value
1  001        003           4
2  002        001           6

I tried:

foo_2 = foo[~foo.County_out.isin(foo.County_in)]

But that also drops rows where the two values in the same row are different:

foo_2

   County_in  County_out    Value
1  001        003           4

Is there a function I can use for this?

Answer 1

IIUC, you just want this:

In [80]:
foo[foo['County_in'] != foo['County_out']]

Out[80]:
  County_in County_out  Value
1       001        003      4
2       002        001      6

Edit

You can't compare categoricals directly when their categories differ, but if you cast the values to str then it works fine:

In [99]:
foo[foo['County_in'] != foo['County_out'].astype(str)]

Out[99]:
  County_in County_out  Value
1       001        003      4
2       002        001      6

See the docs: http://pandas.pydata.org/pandas-docs/stable/categorical.html#comparisons
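
For completeness, here is a minimal, self-contained sketch of the two workarounds, using the sample data from the question. The names `mask_str`, `shared`, `result_str` and `result_cat` are only illustrative, and the second variant (giving both columns the same category set before comparing) is my own suggestion rather than something the docs prescribe for this case. The `try`/`except` just records whether the direct comparison fails on your pandas version.

```python
import pandas as pd

foo = pd.DataFrame({
    "County_in": pd.Series(["001", "001", "002"], dtype="category"),
    "County_out": pd.Series(["001", "003", "001"], dtype="category"),
    "Value": pd.Series([2, 4, 6], dtype="int"),
})

# The two columns have different category sets ('001','002' vs '001','003'),
# so a direct comparison is expected to raise a TypeError.
try:
    foo["County_in"] != foo["County_out"]
    direct_comparison_failed = False
except TypeError:
    direct_comparison_failed = True

# Workaround 1: compare the values as plain strings.
mask_str = foo["County_in"].astype(str) != foo["County_out"].astype(str)
result_str = foo[mask_str]

# Workaround 2 (assumption: you want to keep the categorical dtype):
# give both columns the same category set, then compare.
shared = foo["County_in"].cat.categories.union(foo["County_out"].cat.categories)
county_in = foo["County_in"].cat.set_categories(shared)
county_out = foo["County_out"].cat.set_categories(shared)
result_cat = foo[county_in != county_out]

print(result_str)
print(result_cat)
```

Both variants compare row by row, unlike `isin`, which tests each value against the whole other column.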

Conditionally select DataFrame rows based on duplicate categorical values in different columns
