Getting row counts with groupby in Pandas

Asked: 2014-03-08 09:00:20

Tags: python-3.x numpy pandas

My dataset has two columns, col1 and col2. I want to display the data grouped by col1.

To do this, I wrote the following code:

grouped = df[['col1', 'col2']].groupby(['col1'], as_index=False)

The code above creates a groupby object.

How can I use this object to display the data grouped by col1?

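2 Answers:

Answer 0 (score: 5)

To get the number of rows in each group, you can use dataframe.groupby('column').size(), which returns a Series indexed by the group key.

Example: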

In [10]:df = pd.DataFrame({'id' : [123,512,'zhub1', 12354.3, 129, 753, 295, 610],
                    'colour': ['black', 'white','white','white',
                            'black', 'black', 'white', 'white'],
                    'shape': ['round', 'triangular', 'triangular','triangular','square',
                                        'triangular','round','triangular']
                    },  columns= ['id','colour', 'shape'])

In [11]:df
Out[11]:
     id    colour   shape
0    123     black   round
1    512     white   triangular
2    zhub1   white   triangular
3    12354.3 white   triangular
4    129     black   square
5    753     black   triangular
6    295     white   round
7    610     white   triangular


In [12]:df.groupby('colour').size()
Out[12]:
        colour
        black     3
        white     5
        dtype: int64

In [13]:df.groupby('shape').size()
Out[13]:
        shape
        round         2
        square        1
        triangular    5
        dtype: int64
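
If a DataFrame is more convenient than the Series that size() returns, one possible follow-up is to chain reset_index(name=...). The sketch below reuses only the colour and shape values from the example above; the column name 'count' is an arbitrary choice for illustration.

```python
import pandas as pd

# Same colour/shape values as in the example above (the 'id' column is omitted).
df = pd.DataFrame({
    'colour': ['black', 'white', 'white', 'white',
               'black', 'black', 'white', 'white'],
    'shape': ['round', 'triangular', 'triangular', 'triangular',
              'square', 'triangular', 'round', 'triangular'],
})

# size() gives a Series indexed by group key; reset_index(name=...) turns it
# into a two-column DataFrame. 'count' is just an illustrative column name.
counts = df.groupby('colour').size().reset_index(name='count')
print(counts)
```

The same pattern should work with several keys, for example df.groupby(['colour', 'shape']).size().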

Answer 1 (score: 1)

Try the groups attribute and the get_group() method of the object returned by groupby():

>>> import numpy as np
>>> import pandas as pd
>>> anarray=np.array([[0, 31], [1, 26], [0, 35], [1, 22], [0, 41]])
>>> df = pd.DataFrame(anarray, columns=['is_female', 'age'])
>>> by_gender=df[['is_female','age']].groupby(['is_female'])
>>> by_gender.groups # returns indexes of records
{0: [0, 2, 4], 1: [1, 3]}
>>> by_gender.get_group(0)['age'] # age of males
0    31
2    35
4    41
Name: age, dtype: int64
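
To display every group rather than a single one, another option is to iterate over the groupby object, which yields one (key, sub-DataFrame) pair per group. This is a minimal sketch: the col1/col2 names come from the question, but the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b', 'a'],
                   'col2': [31, 26, 35, 22, 41]})

grouped = df[['col1', 'col2']].groupby('col1')

# Each iteration yields the group key and the rows belonging to that group.
for key, group in grouped:
    print(key)
    print(group)

# Row count per group, collected while iterating.
sizes = {key: len(group) for key, group in grouped}
```

Grouping by the plain string 'col1' rather than the list ['col1'] keeps each key a scalar; with a one-element list, newer pandas versions may yield one-element tuples as keys instead.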