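I'm confused about why the code below doesn't work. I created the AgeBands column with the pd.cut function, so its dtype is category. In theory I should be able to subset it the same way I would a string column, but when I try, the resulting DataFrame new_df has zero rows. What am I missing?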
import numpy as np
import pandas as pd
df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])
new_df = df[df['AgeBands'] == '(30-40]']
new_df.shape
Running df.info() confirms that AgeBands really does have the category dtype:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 2 columns):
Age 6 non-null int64
AgeBands 6 non-null category
dtypes: category(1), int64(1)
memory usage: 174.0 bytes
Answer 0 (score: 2)
The string you are comparing against doesn't match what is in the DataFrame: the value is '(30, 40]', not '(30-40]'.
import numpy as np
import pandas as pd
df = pd.DataFrame({'Age' : [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])])
new_df = df[df['AgeBands'] == '(30, 40]']
new_df
Output:
Age AgeBands
1 38 (30, 40]
3 35 (30, 40]
4 35 (30, 40]
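One caveat, hedged because it depends on your pandas version: as far as I know, from pandas 0.20 onward pd.cut stores its categories as Interval objects rather than strings, so the string comparison above can come back empty even with the correct spelling. Here is a minimal sketch of two alternatives, assuming a reasonably recent pandas. The names by_interval and by_string are mine, not from the question.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 38, 26, 35, 35, 65]})
df['AgeBands'] = pd.cut(df['Age'], [0, 10, 20, 30, 40, 50, max(df['Age'])])

# Inspect what the categories actually are before comparing against them.
print(df['AgeBands'].cat.categories)

# Option 1: compare against an Interval object.
# closed='right' matches the default behaviour of pd.cut.
by_interval = df[df['AgeBands'] == pd.Interval(30, 40, closed='right')]

# Option 2: convert the column to strings first, then compare
# against the printed form of the interval.
by_string = df[df['AgeBands'].astype(str) == '(30, 40]']

print(by_interval)
```

Option 1 avoids depending on how the interval happens to be printed, which is why I would lean towards it.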
Answer 1 (score: 1)
To make this easier to work with, you can assign labels to the bins.
df['AgeBands'] = pd.cut(df['Age'], [0,10,20,30,40,50,max(df['Age'])], labels=range(1,7))
Output:
Age AgeBands
0 22 3
1 38 4
2 26 3
3 35 4
4 35 4
5 65 6
Then select a band with df[df['AgeBands'] == 3]:
Age AgeBands
0 22 3
2 26 3
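If numeric labels feel too opaque, the same idea works with string labels, which gets you back to the string comparison the question was aiming for. This is only a sketch: the band names below are hypothetical ones I made up for illustration, and pd.cut needs exactly one label per bin (six here).

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 38, 26, 35, 35, 65]})
bins = [0, 10, 20, 30, 40, 50, max(df['Age'])]

# Hypothetical band names, one per bin; each bin is open on the left
# and closed on the right, e.g. '30-40' covers ages 31 through 40.
band_names = ['0-10', '10-20', '20-30', '30-40', '40-50', 'over 50']

df['AgeBands'] = pd.cut(df['Age'], bins, labels=band_names)

# The categories are now plain strings, so a string comparison matches.
thirties = df[df['AgeBands'] == '30-40']
print(thirties)
```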