How do I subset a DataFrame to only the rows whose column value occurs more than once?

Time: 2017-01-23 13:18:22

Tags: python-3.x pandas dataframe

I have a pandas DataFrame df that looks like this:

0     1
C1    V1
C2    V1
C3    V1
C4    V2
C5    V3
C6    V3
C7    V4

I want to subset df so that it keeps only the rows whose value in column 1 appears more than once. The desired output is:

0     1
C1    V1
C2    V1
C3    V1
C5    V3
C6    V3

How can I do this?

1 answer:

Answer 0 (score: 1)

I think you need boolean indexing with a mask created by DataFrame.duplicated. Passing keep=False marks every duplicated row as True, including the first occurrence:

print (df.columns)
Index(['0', '1'], dtype='object')

mask = df.duplicated('1', keep=False)
# alternative: call Series.duplicated on the column itself
# mask = df['1'].duplicated(keep=False)

print (mask)
0     True
1     True
2     True
3    False
4     True
5     True
6    False
dtype: bool

print (df[mask])
    0   1
0  C1  V1
1  C2  V1
2  C3  V1
4  C5  V3
5  C6  V3
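
The same approach works when the column labels are integers rather than strings; pass the integer label instead:

print (df.columns)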
Int64Index([0, 1], dtype='int64')

mask = df.duplicated(1, keep=False)
# alternative: call Series.duplicated on the column itself
# mask = df[1].duplicated(keep=False)

print (mask)
0     True
1     True
2     True
3    False
4     True
5     True
6    False
dtype: bool

print (df[mask])
    0   1
0  C1  V1
1  C2  V1
2  C3  V1
4  C5  V3
5  C6  V3
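
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question with integer column labels 0 and 1. The value_counts variant at the end is not part of the answer above; it is only one possible cross-check that should select the same rows.

```python
import pandas as pd

# Sample data from the question; integer column labels 0 and 1 are assumed.
df = pd.DataFrame({
    0: ["C1", "C2", "C3", "C4", "C5", "C6", "C7"],
    1: ["V1", "V1", "V1", "V2", "V3", "V3", "V4"],
})

# keep=False flags every row whose value in column 1 occurs more than once,
# including the first occurrence of each repeated value.
mask = df[1].duplicated(keep=False)
result = df[mask]

# Cross-check (an alternative, not from the answer): map each value to the
# number of times it occurs and keep rows where that count exceeds 1.
counts = df[1].map(df[1].value_counts())
result_alt = df[counts > 1]

print(result)
```

The duplicated mask is probably the more direct choice here. The count-based form may be handy if the threshold later changes, for example to keep only values that occur at least three times.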