Binning/categorizing an 'Age' column in Python pandas

Time: 2018-10-11 06:28:14

Tags: python pandas dataframe

I have a DataFrame, say df, that has a column 'Age':

>>> df['Age']

Age Data

I want to bin these ages and create a new column, like this:

If age >= 0 & age < 2 then AgeGroup = Infant
If age >= 2 & age < 4 then AgeGroup = Toddler
If age >= 4 & age < 13 then AgeGroup = Kid
If age >= 13 & age < 20 then AgeGroup = Teen
and so on .....

How can I achieve this with the pandas library?

I tried doing something like this:

X_train_data['AgeGroup'][ X_train_data.Age < 13 ] = 'Kid'
X_train_data['AgeGroup'][ X_train_data.Age < 3 ] = 'Toddler'
X_train_data['AgeGroup'][ X_train_data.Age < 1 ] = 'Infant'

But doing so gives me this warning:

  

/Users/Anand/miniconda3/envs/learn/lib/python3.7/site-packages/ipykernel_launcher.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
/Users/Anand/miniconda3/envs/learn/lib/python3.7/site-packages/ipykernel_launcher.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

How can I avoid this warning and do this in a better way?

2 answers:

Answer 0 (score: 3)

Use pandas.cut with the parameter right=False so that the right edge of each bin is excluded:

import pandas as pd

X_train_data = pd.DataFrame({'Age':[0,2,4,13,35,-1,54]})

bins= [0,2,4,13,20,110]
labels = ['Infant','Toddler','Kid','Teen','Adult']
X_train_data['AgeGroup'] = pd.cut(X_train_data['Age'], bins=bins, labels=labels, right=False)
print (X_train_data)
   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1      NaN
6   54    Adult
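
To make the effect of right=False concrete, here is a small sketch comparing it with the default right=True on the boundary ages. The variable names closed_right and closed_left are only illustrative and do not come from the answer.

```python
import pandas as pd

ages = pd.Series([0, 2, 4, 13])
bins = [0, 2, 4, 13, 20]
labels = ['Infant', 'Toddler', 'Kid', 'Teen']

# Default right=True: intervals are (0, 2], (2, 4], (4, 13], (13, 20].
# Age 0 falls outside every interval, and each boundary age lands one group too low.
closed_right = pd.cut(ages, bins=bins, labels=labels)

# right=False: intervals are [0, 2), [2, 4), [4, 13), [13, 20),
# which matches the "age >= lower and age < upper" rules in the question.
closed_left = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'Age': ages, 'right=True': closed_right, 'right=False': closed_left}))
```

With the default setting, age 0 gets NaN and age 2 is labelled Infant; with right=False the four ages map to Infant, Toddler, Kid and Teen.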

Finally, to replace the missing values, use add_categories together with fillna:

# Parentheses are needed so the chained call can continue on the next line
X_train_data['AgeGroup'] = (X_train_data['AgeGroup'].cat.add_categories('unknown')
                                                    .fillna('unknown'))
print (X_train_data)
   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1  unknown
6   54    Adult

Alternatively, add a leading bin for ages in [-1, 0) and label it 'unknown':

bins= [-1,0,2,4,13,20, 110]
labels = ['unknown','Infant','Toddler','Kid','Teen', 'Adult']
X_train_data['AgeGroup'] = pd.cut(X_train_data['Age'], bins=bins, labels=labels, right=False)

print (X_train_data)
   Age AgeGroup
0    0   Infant
1    2  Toddler
2    4      Kid
3   13     Teen
4   35    Adult
5   -1  unknown
6   54    Adult
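
The two ways of producing 'unknown' are not equivalent: an extra bin only covers its own interval, while fillna catches every value that pd.cut could not place. A minimal sketch of the fillna variant, using made-up ages (-5, 150 and a missing value) that are not part of the original example:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one valid age, two out-of-range ages and a missing age
df = pd.DataFrame({'Age': [1, -5, 150, np.nan]})

bins = [0, 2, 4, 13, 20, 110]
labels = ['Infant', 'Toddler', 'Kid', 'Teen', 'Adult']

# Values outside [0, 110) and missing values all come back as NaN from pd.cut
group = pd.cut(df['Age'], bins=bins, labels=labels, right=False)

# 'unknown' must be registered as a category before it can be used as a fill value
df['AgeGroup'] = group.cat.add_categories('unknown').fillna('unknown')
print(df)
```

Here only the first row gets a real label (Infant); the other three rows become 'unknown'.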

Answer 1 (score: 1)

Just use:

X_train_data.loc[(X_train_data.Age < 13),  'AgeGroup'] = 'Kid'
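
As a sketch of how this .loc pattern could be extended to the ranges listed in the question, one assignment per range with both bounds spelled out might look like the following. The sample ages are made up for illustration. If X_train_data was itself produced by slicing another DataFrame, the warning may still appear, and taking an explicit .copy() first is a common remedy.

```python
import pandas as pd

X_train_data = pd.DataFrame({'Age': [0, 2, 4, 13, 35]})

# A single .loc[row_mask, column] call assigns into the frame itself,
# unlike chained indexing such as X_train_data['AgeGroup'][mask] = ...
age = X_train_data['Age']
X_train_data.loc[(age >= 0) & (age < 2), 'AgeGroup'] = 'Infant'
X_train_data.loc[(age >= 2) & (age < 4), 'AgeGroup'] = 'Toddler'
X_train_data.loc[(age >= 4) & (age < 13), 'AgeGroup'] = 'Kid'
X_train_data.loc[(age >= 13) & (age < 20), 'AgeGroup'] = 'Teen'

# Rows that match no range (age 35 here) are left as NaN
print(X_train_data)
```

Because every mask states both bounds, the order of the assignments does not matter, which was not the case for the overlapping "Age < 13", "Age < 3", "Age < 1" conditions in the question.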