Pandas - how to extract the top three rows from a given dataframe

Time: 2018-10-07 13:34:26

Tags: python pandas pandas-groupby

My pandas dataframe df can produce a result like this:

grouped = df[(df['X'] == 'venture') & (df['company_code'].isin(['TDS','XYZ','UVW']))].groupby(['company_code','sector'])['X_sector'].count()

Its output is as follows:

company_code  sector                            
TDS           Meta                                 404
              Electrical                           333
              Mechanical                           533
              Agri                                 453
XYZ           Sports                               331
              Electrical                           354
              Movies                               375
              Manufacturing                        355            
UVW           Sports                               505
              Robotics                             345
              Movies                               56
              Health                               3263
              Manufacturing                        456
              Others                               524
Name: X_sector, dtype: int64

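What I want is the top three sectors, by count, within each company code. How can I do that?

2 answers:

Answer 0: (score: 4)

You will have to chain groupby calls here. Consider the following example: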

import pandas as pd
import numpy as np

np.random.seed(111)

names = [
    'Robert Baratheon',
    'Jon Snow',
    'Daenerys Targaryen',
    'Theon Greyjoy',
    'Tyrion Lannister'
]

df = pd.DataFrame({
    'season': np.random.randint(1, 7, size=100),
    'actor': np.random.choice(names, size=100),
    'appearance': 1
})

s = df.groupby(['season','actor'])['appearance'].count()
print(s.sort_values(ascending=False).groupby('season').head(1)) # <-- head(3) for 3 values

Returns:

season  actor             
4       Daenerys Targaryen    7
6       Robert Baratheon      6
3       Robert Baratheon      6
5       Jon Snow              5
2       Theon Greyjoy         5
1       Jon Snow              4

where s is (output truncated at season 4):

season  actor             
1       Daenerys Targaryen    2
        Jon Snow              4
        Robert Baratheon      2
        Theon Greyjoy         3
        Tyrion Lannister      4
2       Daenerys Targaryen    4
        Jon Snow              3
        Robert Baratheon      1
        Theon Greyjoy         5
        Tyrion Lannister      3
3       Daenerys Targaryen    2
        Jon Snow              1
        Robert Baratheon      6
        Theon Greyjoy         3
        Tyrion Lannister      3
4 ...
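
Applied to the question's data, the same pattern might look like the sketch below. The counts are copied from the output shown in the question; building `grouped` by hand from a list of tuples is an assumption made only so the snippet runs on its own.

```python
import pandas as pd

# Counts copied from the question's output (rebuilt by hand for illustration)
rows = [
    ('TDS', 'Meta', 404), ('TDS', 'Electrical', 333),
    ('TDS', 'Mechanical', 533), ('TDS', 'Agri', 453),
    ('XYZ', 'Sports', 331), ('XYZ', 'Electrical', 354),
    ('XYZ', 'Movies', 375), ('XYZ', 'Manufacturing', 355),
    ('UVW', 'Sports', 505), ('UVW', 'Robotics', 345),
    ('UVW', 'Movies', 56), ('UVW', 'Health', 3263),
    ('UVW', 'Manufacturing', 456), ('UVW', 'Others', 524),
]
index = pd.MultiIndex.from_tuples(
    [(code, sector) for code, sector, _ in rows],
    names=['company_code', 'sector'],
)
grouped = pd.Series([n for _, _, n in rows], index=index, name='X_sector')

# Sort all counts in descending order, then keep the first 3 rows per company
top3 = (
    grouped.sort_values(ascending=False)
           .groupby(level='company_code')
           .head(3)
)
print(top3)
```

Rows come back in overall descending order of count, so a company's three sectors are not necessarily adjacent in the printed result.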

Answer 1: (score: 0)

Why make things complicated when simple code will do:

Z = df.groupby('country_code')['sector'].value_counts().groupby(level=0).head(3).sort_values(ascending=False).to_frame('counts').reset_index()

Z
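
A minimal sketch of this approach on toy data follows. The rows are hypothetical, and the column is named `company_code` on the assumption that `country_code` in the line above refers to the question's `company_code`. It relies on `value_counts` returning counts in descending order within each group, so `head(3)` per group picks the three most frequent sectors. The final global `sort_values` is left out here so the rows stay grouped by company.

```python
import pandas as pd

# Hypothetical raw records: one row per (company_code, sector) occurrence
records = (
    [('TDS', 'Meta')] * 4 + [('TDS', 'Agri')] * 3
    + [('TDS', 'Electrical')] * 2 + [('TDS', 'Mechanical')] * 1
    + [('XYZ', 'Sports')] * 4 + [('XYZ', 'Movies')] * 3
    + [('XYZ', 'Health')] * 2 + [('XYZ', 'Others')] * 1
)
df = pd.DataFrame(records, columns=['company_code', 'sector'])

# value_counts sorts each company's sectors by frequency (descending)
counts = df.groupby('company_code')['sector'].value_counts()

# Keep the first 3 rows of each company, then flatten to a regular frame
top3 = (
    counts.groupby(level=0)
          .head(3)
          .to_frame('counts')
          .reset_index()
)
print(top3)
```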