I have a pandas DataFrame df that looks like this:
0 1
C1 V1
C2 V1
C3 V1
C4 V2
C5 V3
C6 V3
C7 V4
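I want to keep only the rows of df whose value in column 1 occurs more than once, that is, the groups of column 1 that contain multiple rows. The desired output is: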
0 1
C1 V1
C2 V1
C3 V1
C5 V3
C6 V3
How can I do this?
Answer 0 (score: 1)
I think you need boolean indexing with a mask created by DataFrame.duplicated. Passing keep=False marks every row of a duplicated group as True, including the first occurrence:
# Case 1: the column labels are strings
print(df.columns)
Index(['0', '1'], dtype='object')
mask = df.duplicated('1', keep=False)
# alternative using Series.duplicated
# mask = df['1'].duplicated(keep=False)
print(mask)
0     True
1     True
2     True
3    False
4     True
5     True
6    False
dtype: bool
print(df[mask])
    0   1
0  C1  V1
1  C2  V1
2  C3  V1
4  C5  V3
5  C6  V3
# Case 2: the column labels are integers
print(df.columns)
Int64Index([0, 1], dtype='int64')
mask = df.duplicated(1, keep=False)
# alternative using Series.duplicated
# mask = df[1].duplicated(keep=False)
print(mask)
0     True
1     True
2     True
3    False
4     True
5     True
6    False
dtype: bool
print(df[mask])
    0   1
0  C1  V1
1  C2  V1
2  C3  V1
4  C5  V3
5  C6  V3
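
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes integer column labels 0 and 1 and rebuilds the frame from the question by hand. The groupby variant at the end is not part of the answer above; it is one possible alternative that counts the rows per value of column 1 and should select the same rows. The names result, group_sizes and result_groupby are only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question (integer column labels are an assumption).
df = pd.DataFrame({
    0: ["C1", "C2", "C3", "C4", "C5", "C6", "C7"],
    1: ["V1", "V1", "V1", "V2", "V3", "V3", "V4"],
})

# keep=False flags every row whose value in column 1 appears more than once,
# including the first occurrence.
mask = df.duplicated(subset=1, keep=False)
result = df[mask]

# Alternative sketch (not from the answer): broadcast each group's row count
# back onto the rows, then keep rows whose group has more than one member.
group_sizes = df.groupby(1)[0].transform("size")
result_groupby = df[group_sizes > 1]

print(result)
```

The duplicated approach is usually the shorter choice when the only criterion is "appears more than once". The groupby form may be easier to adapt if the threshold changes, for example to keep only values that occur at least three times.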