Question

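Suppose I have a pandas DataFrame like this. The red values in columns C and E are the 10 largest numbers in each column.

How do I get a DataFrame like this, which returns only the rows that are in the top 10 of both columns? If a value is in the top 10 of one column but not of both, the row should be ignored.

Currently I do this with loops: I first iterate over each column separately and save the row index whenever the value is in the top 10, then I loop a third time to exclude the indices that are not in both. This is very inefficient, because I work with tables of more than 100000 rows. Is there a better way?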

Answer 1

Consider the example dataframe df

import numpy as np
import pandas as pd

np.random.seed([3,1415])
rng = np.arange(10)
df = pd.DataFrame(
    dict(
        A=rng,
        B=list('abcdefghij'),
        C=np.random.permutation(rng),
        D=np.random.permutation(rng)
    )
)

print(df)

   A  B  C  D
0  0  a  9  1
1  1  b  4  3
2  2  c  5  5
3  3  d  1  9
4  4  e  7  4
5  5  f  6  6
6  6  g  8  0
7  7  h  3  2
8  8  i  2  7
9  9  j  0  8

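Use nlargest to identify the n largest values in each column, then use query to filter the dataframe down to the rows whose values appear in both sets.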

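n = 5
# the n largest values of each column (Series indexed by the original row labels)
c_lrgst = df.C.nlargest(n)
d_lrgst = df.D.nlargest(n)

# keep only the rows whose C value and D value are both among the largest
df.query('C in @c_lrgst & D in @d_lrgst')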

   A  B  C  D
2  2  c  5  5
5  5  f  6  6
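
The same idea can also be written with a boolean mask, which may be easier to extend to more than two columns. The following is only a sketch, not part of the original answer: the helper name top_n_in_all is hypothetical, and the C and D values are hard-coded from the printed table above so the example does not depend on the random seed. Like the query version, it matches by value, so rows tied with the n-th largest value would also be kept.

```python
import numpy as np
import pandas as pd

# Same data as the table printed above, hard-coded for reproducibility.
df = pd.DataFrame({
    "A": np.arange(10),
    "B": list("abcdefghij"),
    "C": [9, 4, 5, 1, 7, 6, 8, 3, 2, 0],
    "D": [1, 3, 5, 9, 4, 6, 0, 2, 7, 8],
})


def top_n_in_all(frame, columns, n):
    """Keep rows whose value is among the n largest in every listed column.

    Hypothetical helper for illustration; matching is by value, so ties
    with the n-th largest value are kept as well.
    """
    mask = pd.Series(True, index=frame.index)
    for col in columns:
        # isin compares against the values of the nlargest result
        mask &= frame[col].isin(frame[col].nlargest(n))
    return frame[mask]


result = top_n_in_all(df, ["C", "D"], n=5)
print(result)
```

For the question's case this would be called as top_n_in_all(df, ["C", "E"], n=10), assuming the columns are named C and E as described.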

Pandas: get the top n rows based on multiple columns, where the rows match in all of them
