Question

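I'm new to programming, so forgive me if this is really simple.

I know we're supposed to be able to group in pandas using a list, and that the list has to be the same length as the DataFrame, but somehow I can't get it to work.

I'm using seaborn's Titanic dataset.
This is the function that defines the age groups: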

import math

def age_groups(x):
    array = []
    for i in x['age']:
        if(math.isnan(i)):
            array.append(9)
        if(i < 20):
            array.append(1)
        if(i < 40):
            array.append(2)
        if(i < 60):
            array.append(3)
        else:
            array.append(4)
    return array

groups = age_groups(titanic)
titanic.groupby(groups).mean()

I get the following error:

File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)

KeyError: 2

Thanks in advance.

Answer 1

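The simplest fix is to store the group labels as a column of the DataFrame and group by that column, which guarantees exactly one label per row: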

import seaborn as sns
import numpy as np

titanic = sns.load_dataset('titanic')

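# Start from a copy of the ages, then overwrite with group labels.
titanic['groups'] = titanic['age']
titanic.loc[np.isnan(titanic.age), 'groups'] = 9
# Order matters: each later assignment overwrites the wider range set before it.
titanic.loc[titanic.age >= 60, 'groups'] = 4
titanic.loc[titanic.age < 60, 'groups'] = 3
titanic.loc[titanic.age < 40, 'groups'] = 2
titanic.loc[titanic.age < 20, 'groups'] = 1
# numeric_only=True is required on pandas 2.0+, where mean() no longer skips text columns.
titanic.groupby('groups').mean(numeric_only=True)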


        survived    pclass        age  ...       fare  adult_male     alone
groups                                 ...                                 
1.0     0.481707  2.530488  11.979695  ...  31.794741    0.298780  0.329268
2.0     0.387597  2.304910  28.580103  ...  32.931200    0.658915  0.653747
3.0     0.394161  1.824818  47.354015  ...  41.481784    0.635036  0.569343
4.0     0.269231  1.538462  65.096154  ...  43.467950    0.846154  0.730769
9.0     0.293785  2.598870        NaN  ...  22.158567    0.700565  0.751412

[5 rows x 8 columns]
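
For what it's worth, the list approach from the question can also work. The problem there seems to be the chain of separate `if` statements: an age below 20 also satisfies `i < 40` and `i < 60`, so it appends three labels, and a NaN appends both 9 and 4. The list ends up longer than the DataFrame, so pandas treats it as a list of column keys, which would explain `KeyError: 2`. Below is a minimal sketch of the same function with `elif`. The small DataFrame and its values are made up for illustration and stand in for the Titanic data.

```python
import math

import pandas as pd


def age_groups(ages):
    """Return exactly one group label per age (9 marks a missing age)."""
    labels = []
    for age in ages:
        if math.isnan(age):
            labels.append(9)
        elif age < 20:
            labels.append(1)
        elif age < 40:
            labels.append(2)
        elif age < 60:
            labels.append(3)
        else:
            labels.append(4)
    return labels


# Made-up stand-in for the Titanic data (illustrative values only).
df = pd.DataFrame({
    "age": [10.0, 25.0, 45.0, 70.0, float("nan")],
    "fare": [7.0, 8.0, 9.0, 10.0, 11.0],
})

groups = age_groups(df["age"])

# The list now has one label per row, so groupby accepts it directly.
result = df.groupby(groups)["fare"].mean()
print(result)
```

With the real data, `age_groups(titanic['age'])` should return a list of `len(titanic)` labels that can be passed to `titanic.groupby(...)` in the same way.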

Answer 2

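There is a simpler way to get the age groups: numpy.digitize, which returns an integer indicating which bin each value falls into, with 0 and len(bins) (5 here) meaning underflow and overflow respectively. NaNs appear to land in the overflow bin (since NaN does not compare less than any number).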

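import numpy as np
import seaborn as sns

titanic = sns.load_dataset('titanic')

# Bin edges: [0, 20), [20, 40), [40, 60), [60, max age]; anything else is under/overflow.
groups = np.digitize(titanic.age, [0, 20, 40, 60, titanic.age.max() + 1])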
titanic.groupby(groups).age.mean()
# 1    11.979695
# 2    28.580103
# 3    47.354015
# 4    65.096154
# 5          NaN
# Name: age, dtype: float64
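
To make the bin numbering concrete, here is a small sketch on hand-picked values rather than the Titanic data. The sample ages and the top edge of 81 are arbitrary choices for illustration.

```python
import numpy as np

# Hand-picked sample ages (illustrative only), including an underflow value,
# an overflow value, and a NaN.
ages = np.array([-1.0, 10.0, 25.0, 45.0, 70.0, 100.0, np.nan])
bins = [0, 20, 40, 60, 81]  # 81 is an arbitrary upper edge for this example

idx = np.digitize(ages, bins)
print(idx.tolist())  # [0, 1, 2, 3, 4, 5, 5]
```

The negative value gets 0 (underflow), 100 gets `len(bins)` (overflow), and the NaN is assigned to the overflow bin as well, so a missing age cannot be told apart from a too-large one by the label alone.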

Python pandas groupby with by=list gives me an error?
