Subsetting a column whose dtype = category

Date: 2017-06-23 10:07:50

Tags: python pandas

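I'm confused about why the code below doesn't work. I created the AgeBands column with the pd.cut function, so its dtype is category. In theory I should be able to subset on it the same way I would on a string column, but when I try, the resulting DataFrame new_df has zero rows. What am I missing?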

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])

new_df = df[df['AgeBands'] == '(30-40]']
new_df.shape

Running df.info() confirms that AgeBands does have the category dtype:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
Age         6 non-null int64
AgeBands    6 non-null category
dtypes: category(1), int64(1)
memory usage: 174.0 bytes

2 answers:

Answer 0 (score: 2)

You mistyped the value stored in df: it is '(30, 40]', not '(30-40]'.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])

new_df = df[df['AgeBands'] == '(30, 40]']
new_df

Output:

   Age  AgeBands
1   38  (30, 40]
3   35  (30, 40]
4   35  (30, 40]
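
A caveat worth hedging: on recent pandas versions, pd.cut produces Interval categories rather than plain strings, so the string comparison above may match no rows even with the correct spelling. The sketch below is not part of the original answer. It assumes a pandas version that provides pd.Interval, and shows two ways to filter that do not depend on how the categories are stored. The names by_interval and by_string are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0, 10, 20, 30, 40, 50, max(df['Age'])])

# Inspect the actual category values before comparing against them
print(df['AgeBands'].cat.categories)

# Option 1 (assumes Interval categories): compare against an Interval object.
# pd.Interval is closed on the right by default, matching pd.cut's default.
by_interval = df[df['AgeBands'] == pd.Interval(30, 40)]

# Option 2: convert each band to its string form, then compare as text
by_string = df[df['AgeBands'].astype(str) == '(30, 40]']

print(by_interval)
print(by_string)
```

Printing cat.categories first is the quickest way to see exactly what a comparison has to match, including the comma and the space.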

答案 1 :(得分:1)

To make this easier to follow, you can set labels for the bin ranges.

df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])], labels=range(1,7))

Output:

   Age AgeBands
0   22        3
1   38        4
2   26        3
3   35        4
4   35        4
5   65        6

Selecting df[df['AgeBands'] == 3] then gives:

   Age AgeBands
0   22        3
2   26        3
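
Numeric labels work, but descriptive string labels may be easier to read and restore the string-style filtering the question expected. This is a minimal sketch along the same lines as the answer above. The label texts in band_names are made up for illustration and are not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 38, 26, 35, 35, 65]})
bins = [0, 10, 20, 30, 40, 50, max(df['Age'])]

# Hypothetical label names; one label per bin (six bins from seven edges)
band_names = ['0-10', '11-20', '21-30', '31-40', '41-50', '51+']
df['AgeBands'] = pd.cut(df['Age'], bins, labels=band_names)

# The categories are now plain strings, so a string comparison matches
new_df = df[df['AgeBands'] == '31-40']
print(new_df)
```

The number of labels has to equal the number of bins, otherwise pd.cut raises an error.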