Question

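I'm confused about why the code below doesn't work. I created the AgeBands column with pd.cut, so its dtype is category. In theory, I should be able to subset on it just as I would on a string column, but when I try, the resulting DataFrame new_df has zero rows. What am I missing?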

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])

new_df = df[df['AgeBands'] == '(30-40]']
new_df.shape

Running df.info() confirms that AgeBands is indeed of category dtype:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
Age         6 non-null int64
AgeBands    6 non-null category
dtypes: category(1), int64(1)
memory usage: 174.0 bytes

Answer 1

You have a typo in the label: the value in df is '(30, 40]', not '(30-40]'.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])

new_df = df[df['AgeBands'] == '(30, 40]']
new_df

Output:

    Age AgeBands
1   38  (30, 40]
3   35  (30, 40]
4   35  (30, 40]
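
One caveat, offered as a hedged sketch rather than a definitive fix: the string comparison above relies on the bin labels being stored as strings, which matches the older pandas that produced the df.info() output in the question. In more recent pandas versions (0.20 and later, as far as I know), pd.cut stores pd.Interval objects as the categories, so comparing against a string matches nothing. Assuming such a version, comparing against a pd.Interval should select the same rows:

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0, 10, 20, 30, 40, 50, max(df['Age'])])

# Inspect the actual category values before filtering on them.
print(df['AgeBands'].cat.categories)

# pd.Interval is closed on the right by default, like the bins pd.cut builds.
band = pd.Interval(30, 40)
new_df = df[df['AgeBands'] == band]
print(new_df)
```

Printing the categories first is a quick way to see exactly what a comparison has to match.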

Answer 2

To make this easier to work with, you can assign labels to the bins:

df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])], labels=range(1,7))

Output:

   Age AgeBands
0   22        3
1   38        4
2   26        3
3   35        4
4   35        4
5   65        6

Then select a band with df[df['AgeBands'] == 3]:

   Age AgeBands
0   22        3
2   26        3

The dtype is still category.
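
The same idea also works with string labels, which may read more naturally than integers. A minimal self-contained sketch, where the label names are hypothetical and chosen only for illustration (pd.cut expects one label per bin):

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 38, 26, 35, 35, 65]})
bins = [0, 10, 20, 30, 40, 50, max(df['Age'])]

# Hypothetical label names; six bins need six labels.
labels = ['0-10', '11-20', '21-30', '31-40', '41-50', '51+']
df['AgeBands'] = pd.cut(df['Age'], bins, labels=labels)

# Filtering now compares against a label that is itself a string.
subset = df[df['AgeBands'] == '31-40']
print(subset)
```

Because the labels are supplied explicitly, the filter does not depend on how pandas renders the intervals.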
