Question

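I have a pandas DataFrame, and I want to find all rows for which column A has the same value and that value is repeated a specific number of times (which I call size).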

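So if I have

   A    B
0  1  yes
1  2   no
2  3   no
3  2  yes
4  3   no
5  4  yes

then for size = 2 only the values 2 and 3 in column A are repeated 2 times, so the result should look like this:

   A   B1   B2
0  2   no  yes
1  3  yes   no

I have written code for this, but since it uses a for loop it is a bit slow on large data, so I am looking for suggestions to improve it.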

Answer 1

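g = df.groupby('A')
c = g.cumcount() + 1                  # 1-based position of each row within its A group
s = g.A.transform('size').to_numpy()  # size of the group each row belongs to

# keep rows whose group has size 2, then spread the positions into columns B1, B2
df.set_index(['A', c]).B[s == 2].unstack().add_prefix('B').reset_index()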

   A  B1   B2
0  2  no  yes
1  3  no   no
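
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the sample in the question, and the helper name pivot_groups_of_size and its size parameter are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'A': [1, 2, 3, 2, 3, 4],
    'B': ['yes', 'no', 'no', 'yes', 'no', 'yes'],
})

def pivot_groups_of_size(df, size):
    """Hypothetical helper: keep A groups with exactly `size` rows, one output row per A."""
    g = df.groupby('A')
    pos = g.cumcount() + 1                    # 1-based position inside each group
    n = g['A'].transform('size').to_numpy()   # group size, repeated for every row
    # Filter on group size, then move the position level into the columns.
    wide = df.set_index(['A', pos])['B'][n == size].unstack()
    return wide.add_prefix('B').reset_index()

result = pivot_groups_of_size(df, 2)
print(result)
```

Passing a different size should work the same way, with one B column per position in the group.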

If you have more columns:

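g = df.groupby('A')
c = g.cumcount() + 1                  # 1-based position of each row within its A group
s = g.A.transform('size').to_numpy()  # size of the group each row belongs to

# unstack every remaining column, then flatten the (column, position) labels
d = df.set_index(['A', c])[s == 2].unstack()
d.columns = [f'{a}{b}' for a, b in d.columns]
d.reset_index()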
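
A runnable sketch of the multi-column variant follows. Column C and its values are assumptions added purely for illustration; the question only has columns A and B.

```python
import pandas as pd

# Column C is hypothetical; it is not part of the question's data.
df = pd.DataFrame({
    'A': [1, 2, 3, 2, 3, 4],
    'B': ['yes', 'no', 'no', 'yes', 'no', 'yes'],
    'C': [10, 20, 30, 40, 50, 60],
})

g = df.groupby('A')
c = g.cumcount() + 1
s = g['A'].transform('size').to_numpy()

# Unstacking a DataFrame gives MultiIndex columns such as ('B', 1), ('C', 2).
d = df.set_index(['A', c])[s == 2].unstack()
d.columns = [f'{a}{b}' for a, b in d.columns]  # flatten to 'B1', 'C2', ...
d = d.reset_index()
print(d)
```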

Answer 2

IIUC, you can use groupby().transform:

df[df.groupby('A').B.transform('size').eq(2)]

which gives

    A   B
1   2   no
2   3   no
3   2   yes
4   3   no
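
As a self-contained sketch of this filter, the snippet below rebuilds the question's sample data and takes the target count from a size variable (a name assumed here). It also shows a value_counts-based alternative, which is a different technique from the one in the answer and is included only for comparison.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'A': [1, 2, 3, 2, 3, 4],
    'B': ['yes', 'no', 'no', 'yes', 'no', 'yes'],
})

size = 2  # assumed name for the required number of repetitions

# Approach from the answer: broadcast each group's size back onto its rows.
mask = df.groupby('A')['B'].transform('size').eq(size)
filtered = df[mask]

# Alternative (not from the answer): count values once, then test membership.
counts = df['A'].value_counts()
filtered_alt = df[df['A'].isin(counts[counts == size].index)]

print(filtered)
```

Both variants keep the original row order and index, so the result can still be reshaped afterwards if one row per value of A is needed.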

How to find all rows with the same specific value for a column that is repeated a specific number of times in a dataset?
