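I am trying to create dummy variables for multiple columns, such as:
- Gender (1 = male; 2 = female)
- Education (1 = graduate school; 2 = university; 3 = high school; 4 = others)
- Marital status (1 = married; 2 = single; 3 = other)
- Defaulter (1 = default; 0 = no default)
Can anyone suggest how to do this?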
Answer 0 (score: 0)
Suppose you have data like this:

             Education  Gender MarritalStatus
    0  graduate school    male        married
    1       university  female         single
    2      high school  female          other
    3           others    male         single
    4       university    male         single

You can then use pd.Series.apply() to apply an encoding function to a column, for example:

    def enc_for_gender(x):
        # 'male' -> 1, anything else -> 2
        if x == 'male':
            return 1
        return 2

    def enc_for_education(x):
        # Unrecognised values fall through to 4 ('others')
        if x == 'graduate school':
            return 1
        elif x == 'university':
            return 2
        elif x == 'high school':
            return 3
        return 4

    data['Gender'].apply(enc_for_gender)

Result:

    0    1
    1    2
    2    2
    3    1
    4    1
    Name: Gender, dtype: int64

The same works for Education (pd.Series.map() accepts a function just as apply() does):

    data['Education'].map(enc_for_education)

The remaining columns can be handled the same way.
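As a minimal sketch of "the same way" for every column at once, the per-column functions can be swapped for lookup dictionaries passed to Series.map. This is an alternative to the functions above, not the answer's own code: the `encodings` dictionary and the `encoded` frame are names made up for illustration, the data is rebuilt from the sample table, and the marital status codes are taken from the question.

```python
import pandas as pd

# Sample data rebuilt from the table in this answer
# (the column name 'MarritalStatus' is kept as spelled there).
data = pd.DataFrame({
    "Education": ["graduate school", "university", "high school", "others", "university"],
    "Gender": ["male", "female", "female", "male", "male"],
    "MarritalStatus": ["married", "single", "other", "single", "single"],
})

# Hypothetical lookup tables, one per column; codes follow the question.
encodings = {
    "Gender": {"male": 1, "female": 2},
    "Education": {"graduate school": 1, "university": 2, "high school": 3, "others": 4},
    "MarritalStatus": {"married": 1, "single": 2, "other": 3},
}

encoded = data.copy()
for column, mapping in encodings.items():
    # Series.map with a dict gives NaN for any value missing from the dict,
    # unlike the functions above, which fall back to a default code.
    encoded[column] = data[column].map(mapping)

print(encoded)
```

Note the difference in behaviour: an unexpected value becomes NaN here instead of silently receiving the last code, which may make data problems easier to spot.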
Answer 1 (score: 0)
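You can simply use dictionaries of key-value pairs that map each code to its label:
Gender = {1: "male", 2: "female"}
Education = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}
If possible, I would suggest using strings as the dictionary keys, so that you can look up the code from the label:
Gender = {"male": 1, "female": 2}
or keep the codes themselves as strings:
Gender = {"1": "male", "2": "female"}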
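A rough sketch of how such dictionaries might be used to get the dummy variables the question asks for. It assumes the columns hold the numeric codes from the question. The sample values and the names `gender_labels`, `education_labels`, `labeled`, and `dummies` are made up for illustration, and pd.get_dummies is not mentioned in either answer; it is pandas' standard one-hot encoding function, added here as one possible approach.

```python
import pandas as pd

# Hypothetical sample data holding the numeric codes from the question.
data = pd.DataFrame({
    "Gender": [1, 2, 2, 1, 1],
    "Education": [1, 2, 3, 4, 2],
})

# Code -> label dictionaries, as in this answer.
gender_labels = {1: "male", 2: "female"}
education_labels = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}

# Replace the codes with readable labels via Series.map.
labeled = data.assign(
    Gender=data["Gender"].map(gender_labels),
    Education=data["Education"].map(education_labels),
)

# One indicator column per category, named '<column>_<label>'.
dummies = pd.get_dummies(labeled, columns=["Gender", "Education"])
print(dummies.columns.tolist())
```

Mapping to labels first is optional; it only makes the generated column names (for example `Gender_male`) easier to read than `Gender_1`.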