Question

This code appears to remove the duplicates from 'A' but leave 'B' untouched:

df1.drop_duplicates(['A', 'B'], inplace=True)

Edit: this actually doesn't do that at all... what is happening here?

Code (boiled down):

import pandas
df1 = pandas.DataFrame({'A':[1,4,0,8,3,4,5,3,3,3,9,9],
                        'B':[5,5,7,4,2,0,0,0,0,0,0,0]})
print(df1)
df1.drop_duplicates(['A', 'B'], inplace=True)
print(df1)

Output:

$ python test.py 
    A  B
0   1  5
1   4  5
2   0  7
3   8  4
4   3  2
5   4  0
6   5  0
7   3  0
8   3  0
9   3  0
10  9  0
11  9  0

[12 rows x 2 columns]
    A  B
0   1  5
1   4  5
2   0  7
3   8  4
4   3  2
5   4  0
6   5  0
7   3  0
10  9  0
[9 rows x 2 columns]

I think I see what is happening above: the rows that repeat an earlier ('A', 'B') pair (index 8, 9 and 11) were dropped.

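But I still can't see how to remove the duplicates in 'B' (or return the unique values in 'B'). The two columns actually come from separate CSV files. Should I not be joining them into a single DataFrame? If not, is there a way to compare them and remove the duplicates?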

Edit: this is the output I'm looking for (the values marked with an asterisk removed, or the values marked with a plus returned):

    A  B
0   1  5*
1   4* 5*
2   0* 7+
3   8  4*
4   3* 2+
5   4* 0*
6   5* 0*
7   3* 0*
10  9  0*
[9 rows x 2 columns]
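
On the "what is happening here?" part: when `drop_duplicates` is given a list of columns, it compares rows on the combination of those columns rather than deduplicating each column on its own, so only rows that repeat an earlier ('A', 'B') pair are removed. A minimal sketch with the same data, using `DataFrame.duplicated` to show which rows are flagged (the names `mask`, `dropped` and `kept` are illustrative only, not from the post):

```python
import pandas

df1 = pandas.DataFrame({'A': [1, 4, 0, 8, 3, 4, 5, 3, 3, 3, 9, 9],
                        'B': [5, 5, 7, 4, 2, 0, 0, 0, 0, 0, 0, 0]})

# True for every row whose (A, B) pair already appeared in an earlier row.
mask = df1.duplicated(['A', 'B'])

# Index labels of the rows that drop_duplicates(['A', 'B']) removes.
dropped = df1.index[mask].tolist()
print(dropped)  # [8, 9, 11]

# Filtering with the inverted mask matches the non-inplace drop_duplicates call.
kept = df1[~mask]
print(kept.equals(df1.drop_duplicates(['A', 'B'])))  # True
```

Values such as the 0 in 'B' at index 5, 6, 7 and 10 survive because the value in 'A' differs on each of those rows.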

Answer 1

This works:

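import pandas

df1 = pandas.DataFrame({'A':[1,4,0,8,3,4,5,3,3,3,9,9],
                        'B':[5,5,7,4,2,0,0,0,0,0,0,0]})
print(df1)
# Stack both columns into one Series (all of 'A', then all of 'B') and keep
# only the first occurrence of each value.
cln = df1.unstack().drop_duplicates()
# Discard the entries that came from 'A'; what is left occurs only in 'B'.
cln.drop(['A'], inplace=True)
print(cln)
# Replace the (column, row) MultiIndex with a plain 0..n-1 index.
cln = cln.reset_index(drop=True)
print(cln)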

Output:

$ python test.py 
    A  B
0   1  5
1   4  5
2   0  7
3   8  4
4   3  2
5   4  0
6   5  0
7   3  0
8   3  0
9   3  0
10  9  0
11  9  0

[12 rows x 2 columns]
B  2    7
   4    2
dtype: int64
0    7
1    2
dtype: int64
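
If the two columns are kept as separate Series, as they might be when loaded from two CSV files, a boolean mask built with `Series.isin` is another way to reach the same result without joining them first. This is only a sketch under that assumption; the names `a`, `b` and `only_in_b` are illustrative and not from the post, and the final `drop_duplicates` is there because the approach above also collapses repeated values within 'B'.

```python
import pandas

# Stand-ins for the two columns read from separate CSV files.
a = pandas.Series([1, 4, 0, 8, 3, 4, 5, 3, 3, 3, 9, 9], name='A')
b = pandas.Series([5, 5, 7, 4, 2, 0, 0, 0, 0, 0, 0, 0], name='B')

# Keep only the values of b that never occur anywhere in a.
only_in_b = b[~b.isin(a)]

# Drop repeats within b itself, then renumber the index from 0.
only_in_b = only_in_b.drop_duplicates().reset_index(drop=True)
print(only_in_b.tolist())  # [7, 2]
```

Because `isin` does not rely on index alignment, the two Series do not need to have the same length.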

python pandas remove values from column 'B' if that value occurs in column 'A'
