Question

I created a DataFrame consisting of Country, deal_category, and some_metric.

It looks like this:

    Country     metric_count    channel
0   Country1    123472          c1
1   Country1    159392          c2
2   Country2    14599           c3
3   Country2    17382           c4

I index it by country and channel with the command:

df2 = df.set_index(["Country", "channel"])

This creates the following DataFrame:

            metric_count
Country     channel     
Country1    category1   12347
            category2   159392
            category3   14599
            category4   17382

Country2    category1   1234

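Here is what I want to do: keep this structure the same and sort by metric count. In other words, I want to show the top 3 channels for each country, ranked by metric count.

For example, for each country I want a DataFrame that shows the top 3 categories, sorted by metric_count in descending order.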

Country2    top category1   12355555
            top category2   159393
            top category3   16759

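I tried sorting first and then indexing, but the resulting DataFrame was no longer partitioned by country. Any hints would be much appreciated. Thanks!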

Answer 1

After some heavy experimentation, I was able to get what I wanted. I outline the steps below.

Group by Country
```
group = df.groupby("Country")
```
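At a high level, this says that we want to look at each country separately. Our goal now is to find the top 3 metric counts and report the corresponding channels. To do that, we sort each group's DataFrame and return only the first 3 rows. We can define a sort function that returns only the top 3 results and pass it to pandas' apply function. This tells pandas: "apply this sort function to each of our groups and return the top 3 results for each group."

Sort and return the top three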

# DataFrame.sort was deprecated in pandas 0.17 and later removed; sort_values replaces it
sort_function = lambda x: x.sort_values("metric_count", ascending=False)[:3]
desired_df = group.apply(sort_function)
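One possible alternative to a custom sort function is `SeriesGroupBy.nlargest`. The sketch below is only an illustration: the sample rows and the name `top3` are hypothetical, and setting `channel` as the index first is an assumption made so that the result comes back indexed by Country and channel.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "Country": ["A", "A", "A", "A", "B", "B"],
    "channel": ["c1", "c2", "c3", "c4", "c1", "c2"],
    "metric_count": [10, 40, 30, 20, 5, 7],
})

# Make channel the index so it survives as the inner index level.
by_channel = df.set_index("channel")

# nlargest(3) keeps the three largest metric_count values in each country,
# in descending order, and prepends Country as the outer index level.
top3 = by_channel.groupby("Country")["metric_count"].nlargest(3).to_frame()
print(top3)
```

The result is a DataFrame with a (Country, channel) MultiIndex and at most three rows per country, so a country with fewer than three channels simply keeps all of its rows.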

Answer 2

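Use groupby/apply to sort each group separately, then select the top three rows:

def top_three(grp):
    # sort_values returns a sorted copy, so return it rather than discarding it
    return grp.sort_values('metric_count', ascending=False)[:3]

df = df.set_index(['channel'])
result = df.groupby('Country', group_keys=False).apply(top_three)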

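For example,

import numpy as np
import pandas as pd
np.random.seed(2015)
N = 100
df = pd.DataFrame({
    'Country': np.random.choice(['Country{}'.format(i) for i in range(3)], size=N),
    'channel': np.random.choice(['channel{}'.format(i) for i in range(4)], size=N),
    'metric_count': np.random.randint(100, size=N)
})

def top_three(grp):
    # sort_values returns a sorted copy, so return it rather than discarding it
    return grp.sort_values('metric_count', ascending=False)[:3]

# Keep channel in the index, pick the top three rows per country,
# then add Country to the index and put it first.
df = df.set_index(['channel'])
result = df.groupby('Country', group_keys=False).apply(top_three)
result = result.set_index(['Country'], append=True)
result = result.reorder_levels(['Country', 'channel'], axis=0)
print(result)

The original answer printed the output below. Its rows are not in descending order within each country, which suggests that the original `grp.sort(ascending=False)` call had no lasting effect because its sorted copy was discarded. With `sort_values`, the three largest counts per country would appear instead.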

                   metric_count
Country  channel               
Country0 channel3            93
         channel3             0
         channel1             5
Country1 channel0            46
         channel2            86
         channel2            41
Country2 channel0             4
         channel0            51
         channel3            36
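The same result can probably be reached without `apply`, by sorting the whole frame once and then taking the first three rows of each group with `groupby(...).head(3)`. The following is a minimal sketch under that assumption. It reuses the random data from the example above, and the intermediate name `ordered` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(2015)
N = 100
df = pd.DataFrame({
    'Country': np.random.choice(['Country{}'.format(i) for i in range(3)], size=N),
    'channel': np.random.choice(['channel{}'.format(i) for i in range(4)], size=N),
    'metric_count': np.random.randint(100, size=N)
})

# Sort once: countries ascending, counts descending inside each country.
ordered = df.sort_values(['Country', 'metric_count'], ascending=[True, False])

# head(3) keeps the first three rows of each group in their current order,
# which after the sort are the three largest counts per country.
result = ordered.groupby('Country').head(3)

# Restore the Country/channel index structure asked for in the question.
result = result.set_index(['Country', 'channel'])
print(result)
```

Because the sort happens before the grouping, no per-group function is needed, and the Country column stays available for `set_index` afterwards.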

Python Pandas: sorting by a column but keeping the index the same
