I have the following DataFrame in Python pandas:
source target value type
0 10 1200 0.500 Undirected
1 13 3333 0.600 Undirected
2 10 1200 0.500 Undirected
3 15 2300 0.350 Undirected
4 18 5300 0.250 Undirected
5 17 2300 0.100 Undirected
6 13 3333 0.600 Undirected
Answer 0 (score: 5)
from io import StringIO  # Python 3 location of StringIO
import pandas as pd
text=""" source target value type
0 10 1200 0.500 Undirected
1 13 3333 0.600 Undirected
2 10 1200 0.500 Undirected
3 15 2300 0.350 Undirected
4 18 5300 0.250 Undirected
5 17 2300 0.100 Undirected
6 13 3333 0.600 Undirected"""
# Split columns on whitespace and use the first column as the index
df = pd.read_csv(StringIO(text), sep=r"\s+", index_col=[0])
# Show the rows that repeat an earlier row
print(df[df.duplicated()])
source target value type
2 10 1200 0.5 Undirected
6 13 3333 0.6 Undirected
print(df.drop_duplicates(keep=False))
source target value type
3 15 2300 0.35 Undirected
4 18 5300 0.25 Undirected
5 17 2300 0.10 Undirected
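df.duplicated()
returns a boolean mask marking the duplicated rows (by default, every occurrence after the first).
df.drop_duplicates()
removes the duplicated rows.
keep=False
specifies that all rows with a duplicate are dropped, rather than keeping the first or last row of each duplicated group. See the pandas drop_duplicates documentation.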
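To make the effect of keep easier to compare, here is a minimal sketch that applies both methods with and without keep=False. It assumes the same seven rows as above, built directly with pd.DataFrame instead of parsed from text; the variable names (later_copies, all_copies, deduplicated, unique_only) are made up for illustration.

```python
import pandas as pd

# Same rows as the example above, built directly instead of parsed from text.
df = pd.DataFrame(
    {
        "source": [10, 13, 10, 15, 18, 17, 13],
        "target": [1200, 3333, 1200, 2300, 5300, 2300, 3333],
        "value": [0.500, 0.600, 0.500, 0.350, 0.250, 0.100, 0.600],
        "type": ["Undirected"] * 7,
    }
)

# Default keep="first": only the later copies are flagged.
later_copies = df[df.duplicated()]

# keep=False: every row that has a duplicate is flagged, first copy included.
all_copies = df[df.duplicated(keep=False)]

# Default keep="first": one row per duplicated group is retained.
deduplicated = df.drop_duplicates()

# keep=False: every member of a duplicated group is dropped.
unique_only = df.drop_duplicates(keep=False)

print(later_copies.index.tolist())   # [2, 6]
print(all_copies.index.tolist())     # [0, 1, 2, 6]
print(deduplicated.index.tolist())   # [0, 1, 3, 4, 5]
print(unique_only.index.tolist())    # [3, 4, 5]
```

If the goal is to inspect every copy of a repeated row, df.duplicated(keep=False) is probably the more convenient mask; the default only shows the copies that drop_duplicates() would remove.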