Question

I have a pandas DataFrame like the following:

fastmoving[['dist','unique','id']]
Out[683]: 
        dist  unique          id
1   0.406677     4.0  4.997434e+09
2   0.406677     4.0  4.452593e+09
5   0.406677     4.0  4.188395e+09
1   0.434386     4.0  8.288070e+09
4   0.434386     4.0  3.274609e+09

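What I want to achieve is:

Find the top n entries with the largest distance (column "dist").
Among those top n entries, find the IDs within the top m percent (column "id").

So far I have only managed to write code that gets the entry (or entries) with the single largest dist.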

#Get the first id with the largest dist:
fastmoving.loc[fastmoving['dist'].idxmax(),'id']

#Get all id's with the largest dist:
fastmoving.loc[fastmoving['dist']==fastmoving['dist'].max(),'id']

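What I am missing is a way to make my code work for multiple values.

That is, I want to use the range of largest values (the top n values) instead of the single maximum, and then get all IDs that fall within the top m percent among those largest values.

How can I achieve this in pandas?

Many thanks, Alex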

Answer 1

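IIUC, you can make use of nlargest. The following example takes the rows with the top 3 values of dist and then extracts from them the rows with the top 2 values of id: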

fastmoving.nlargest(3, ["dist", "id"]).nlargest(2, "id")
       dist  unique            id
1  0.434386     4.0  8.288070e+09
1  0.406677     4.0  4.997434e+09
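
One thing worth checking with nlargest is how ties at the cut-off are handled. The sketch below rebuilds the question's data by hand (the DataFrame construction is an assumption for illustration, not part of the original answer) and compares the default keep="first" with keep="all", which keeps every row tied with the n-th largest value.

```python
import pandas as pd

# Sample data copied from the question; the row labels repeat as in the original.
fastmoving = pd.DataFrame(
    {
        "dist": [0.406677, 0.406677, 0.406677, 0.434386, 0.434386],
        "unique": [4.0, 4.0, 4.0, 4.0, 4.0],
        "id": [4.997434e09, 4.452593e09, 4.188395e09, 8.288070e09, 3.274609e09],
    },
    index=[1, 2, 5, 1, 4],
)

# Default keep="first": exactly 3 rows; extra rows tied at the cut-off are dropped.
top3_first = fastmoving.nlargest(3, "dist")

# keep="all": rows tied with the 3rd-largest dist are retained as well,
# so here all 5 rows come back (three rows share dist 0.406677).
top3_all = fastmoving.nlargest(3, "dist", keep="all")

print(len(top3_first), len(top3_all))
```

Which behaviour is wanted depends on whether "top n" should mean exactly n rows or every row whose dist reaches the n-th largest value.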

Answer 2

You can use nlargest for the top n and quantile for the top m%:

import pandas as pd
from io import StringIO

fastmoving = pd.read_csv(StringIO("""
        dist  unique          id
1   0.406677     4.0  4.997434e+09
2   0.406677     4.0  4.452593e+09
5   0.406677     4.0  4.188395e+09
1   0.434386     4.0  8.288070e+09
4   0.434386     4.0  3.274609e+09"""), sep=r"\s+")

n = 3
m = 50

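# Keep the n rows with the largest dist.
top_n_dist = fastmoving.nlargest(n, ["dist"])
# Among those rows, keep the ids above the m-th percentile of id.
top_m_percent_id_in_top_n_dist = top_n_dist[top_n_dist['id'] > top_n_dist['id'].quantile(m / 100)]

print(top_m_percent_id_in_top_n_dist)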
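
Note that quantile(m/100) selects the ids above the m-th percentile. If "top m percent" is read as the highest m percent of id values, the threshold would instead be the (1 - m/100) quantile; the two readings coincide only at m = 50. A minimal sketch under that second reading follows. The helper name top_m_percent_ids and the hand-built DataFrame are hypothetical, introduced only for illustration.

```python
import pandas as pd


def top_m_percent_ids(df, n, m, keep="first"):
    """Hypothetical helper: ids in the highest m percent among the n largest-dist rows."""
    top_n = df.nlargest(n, "dist", keep=keep)
    # Assumption: "top m percent" means ids strictly above the (1 - m/100) quantile.
    threshold = top_n["id"].quantile(1 - m / 100)
    return top_n.loc[top_n["id"] > threshold, "id"]


# Sample data copied from the question.
fastmoving = pd.DataFrame(
    {
        "dist": [0.406677, 0.406677, 0.406677, 0.434386, 0.434386],
        "unique": [4.0, 4.0, 4.0, 4.0, 4.0],
        "id": [4.997434e09, 4.452593e09, 4.188395e09, 8.288070e09, 3.274609e09],
    },
    index=[1, 2, 5, 1, 4],
)

result = top_m_percent_ids(fastmoving, n=3, m=50)
print(result)
```

The comparison is strict (>), matching the answer above, so an id exactly equal to the threshold is excluded; switching to >= would include it.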

python pandas: find the top n, then find m within the top n