Question

I have a pandas DataFrame df that looks like this:

I want to keep only the rows of df whose value in column 1 appears more than once. The desired output is:

How can I do this?

Answer 1

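I think you need boolean indexing with a mask created by DataFrame.duplicated. Passing keep=False marks every occurrence of a duplicated value as True, including the first one: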

# Case 1: the column labels are strings
print (df.columns)
Index(['0', '1'], dtype='object')

mask = df.duplicated('1', keep=False)
#another solution with Series.duplicated
#mask = df['1'].duplicated(keep=False)

print (mask)
0     True
1     True
2     True
3    False
4     True
5     True
6    False
dtype: bool

print (df[mask])
    0   1
0  C1  V1
1  C2  V1
2  C3  V1
4  C5  V3
5  C6  V3
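
To see why keep=False matters here, the short sketch below compares the three keep settings on a small Series. The values are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical values, chosen only to illustrate the keep parameter.
s = pd.Series(["V1", "V1", "V2", "V3", "V3"])

# keep="first" (the default): the first occurrence is NOT flagged.
first = s.duplicated(keep="first")
# keep="last": the last occurrence is NOT flagged.
last = s.duplicated(keep="last")
# keep=False: every member of a repeated group is flagged.
every = s.duplicated(keep=False)

print(pd.DataFrame({"value": s, "first": first, "last": last, "False": every}))
```

Only keep=False flags all rows of a repeated group, which is what is needed to select every row whose value occurs more than once.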

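# Case 2: the column labels are integers
# (Int64Index is how older pandas prints this; pandas 2.x shows Index([0, 1], dtype='int64'))
print (df.columns)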
Int64Index([0, 1], dtype='int64')

mask = df.duplicated(1, keep=False)
#another solution with Series.duplicated
#mask = df[1].duplicated(keep=False)

print (mask)
0     True
1     True
2     True
3    False
4     True
5     True
6    False
dtype: bool

print (df[mask])
    0   1
0  C1  V1
1  C2  V1
2  C3  V1
4  C5  V3
5  C6  V3
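
For a copy-paste version, here is a minimal sketch that rebuilds a frame consistent with the output above. The values in rows 3 and 6 (C4/V2 and C7/V4) are assumptions, since the question's data is not shown; all that matters is that each occurs only once in column 1. The sketch also builds the same mask with groupby(...).transform("size"), which is one possible alternative that is closer to the "group by" wording of the question.

```python
import pandas as pd

# Rows 3 and 6 hold assumed values; the original data is not shown.
df = pd.DataFrame({
    0: ["C1", "C2", "C3", "C4", "C5", "C6", "C7"],
    1: ["V1", "V1", "V1", "V2", "V3", "V3", "V4"],
})

# Approach from the answer: flag every row whose column-1 value repeats.
mask = df.duplicated(subset=1, keep=False)
result = df[mask]

# Alternative: broadcast each group's size back to its rows, keep sizes > 1.
group_size = df.groupby(1)[0].transform("size")
alt_result = df[group_size > 1]

print(result)
```

Both approaches select the same rows here. duplicated is usually the simpler choice; the groupby form generalizes more easily to thresholds other than "more than once", for example group_size > 2.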

How can I subset a DataFrame to only the rows whose column value has multiple entries?
