age income student credit_rating Class_buys_computer
0 youth high no fair no
1 youth high no excellent no
2 middle_aged high no fair yes
3 senior medium no fair yes
4 senior low yes fair yes
5 senior low yes excellent no
6 middle_aged low yes excellent yes
7 youth medium no fair no
8 youth low yes fair yes
9 senior medium yes fair yes
10 youth medium yes excellent yes
11 middle_aged medium no excellent yes
12 middle_aged high yes fair yes
13 senior medium no excellent no
I am working with this dataset and want to convert variables such as age and income into something like the factor variables in R. How can I do this in Python?
Answer 0 (score: 1)
You can use astype with the argument 'category':
cols = ['age','income','student']
for col in cols:
    df[col] = df[col].astype('category')
print (df.dtypes)
age category
income category
student category
credit_rating object
Class_buys_computer object
dtype: object
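For readers coming from R, a minimal sketch of what the converted column holds may help. It rebuilds only the first four rows of the question's table by hand (an assumption made to keep the example short) and inspects the categories, which play roughly the role of R's levels, and the integer codes behind them:

```python
import pandas as pd

# First four rows of the question's table, rebuilt by hand for illustration.
df = pd.DataFrame({
    "age": ["youth", "youth", "middle_aged", "senior"],
    "income": ["high", "high", "high", "medium"],
    "student": ["no", "no", "no", "no"],
})

for col in ["age", "income", "student"]:
    df[col] = df[col].astype("category")

# Categories are roughly what levels() returns in R; by default they are
# the sorted unique values, not the order of appearance.
levels = df["age"].cat.categories.tolist()
# Each row is stored as an integer code that indexes into the categories.
codes = df["age"].cat.codes.tolist()

print(levels)
print(codes)
```

Note that the default category order is alphabetical, so middle_aged comes before senior and youth unless an explicit order is given.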
If you need to convert all columns:
for col in df.columns:
    df[col] = df[col].astype('category')
print (df.dtypes)
age category
income category
student category
credit_rating category
Class_buys_computer category
dtype: object
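You need the loop because, in the pandas version used here, converting the whole DataFrame in one call fails:
df = df.astype('category')
NotImplementedError: > 1 ndim Categorical are not supported at this time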
Pandas documentation about categorical
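If you would rather not write the loop by hand, one possible alternative is to let DataFrame.apply do the per-column conversion. This is only a sketch on a small made-up frame (the three rows below are illustrative, not the full dataset); apply hands each column to the function as a Series, so the conversion still happens one column at a time:

```python
import pandas as pd

# Small illustrative frame; the values are taken from the question's columns.
df = pd.DataFrame({
    "age": ["youth", "senior", "middle_aged"],
    "credit_rating": ["fair", "excellent", "fair"],
})

# apply() passes one column (a Series) at a time to the lambda,
# so each column is converted to a categorical separately.
df = df.apply(lambda s: s.astype("category"))

print(df.dtypes)
```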
Edit (from the comments):
If you need an ordered categorical, use another solution with pandas.Categorical:
df['age']=pd.Categorical(df['age'],categories=["youth","middle_aged","senior"],ordered=True)
print (df.age)
0 youth
1 youth
2 middle_aged
3 senior
4 senior
5 senior
6 middle_aged
7 youth
8 youth
9 senior
10 youth
11 middle_aged
12 middle_aged
13 senior
Name: age, dtype: category
Categories (3, object): [youth < middle_aged < senior]
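Besides sorting, an ordered categorical also allows comparisons against a category and min/max. A small sketch, using a short hand-made Series rather than the full age column:

```python
import pandas as pd

# Short illustrative sample of age values.
ages = pd.Series(
    pd.Categorical(
        ["youth", "senior", "middle_aged", "youth"],
        categories=["youth", "middle_aged", "senior"],
        ordered=True,
    )
)

# Comparisons follow the declared order youth < middle_aged < senior,
# not alphabetical order.
older_than_youth = (ages > "youth").tolist()
youngest, oldest = ages.min(), ages.max()

print(older_than_youth)
print(youngest, oldest)
```

With an unordered categorical these comparisons and min/max raise a TypeError, which is why ordered=True matters here.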
Then you can sort the DataFrame by the age column:
df = df.sort_values('age')
print (df)
age income student credit_rating Class_buys_computer
0 youth high no fair no
1 youth high no excellent no
7 youth medium no fair no
8 youth low yes fair yes
10 youth medium yes excellent yes
2 middle_aged high no fair yes
6 middle_aged low yes excellent yes
11 middle_aged medium no excellent yes
12 middle_aged high yes fair yes
3 senior medium no fair yes
4 senior low yes fair yes
5 senior low yes excellent no
9 senior medium yes fair yes
13 senior medium no excellent no