Trying to create dummy variables for multiple columns

Time: 2018-04-10 01:11:02

Tags: python

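I am trying to create dummy variables for multiple columns, such as:

  • Gender (1 = male; 2 = female)
  • Education (1 = graduate school; 2 = university; 3 = high school; 4 = others)
  • Marital status (1 = married; 2 = single; 3 = other)
  • Defaulter (1 = default; 0 = no default)

Can someone suggest how to go about this?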

2 answers:

Answer 0: (score: 0)

Suppose you have data like this:

    Education         Gender    MarritalStatus
0   graduate school   male      married
1   university        female    single
2   high school       female    other
3   others            male      single
4   university        male      single

Then you can use pd.Series.apply() to apply an encoding, for example:

def enc_for_gender(x):
    # 'male' is encoded as 1, anything else as 2
    if x == 'male':
        return 1
    return 2

def enc_for_education(x):
    # Any value not listed explicitly falls through to 4 ('others')
    if x == 'graduate school':
        return 1
    elif x == 'university':
        return 2
    elif x == 'high school':
        return 3
    return 4

data['Gender'].apply(enc_for_gender)

Result:

0    1
1    2
2    2
3    1
4    1
Name: Gender, dtype: int64

The same works for education:

data['Education'].map(enc_for_education)

The other columns can be handled the same way.
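
Note that the encoders above produce integer codes rather than dummy (indicator) columns. If one 0/1 column per category is what is needed, pandas' pd.get_dummies may be a closer fit. A minimal sketch, assuming the sample frame shown in this answer; the variable name `dummies` and the choice of `dtype=int` are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Sample frame copied from the answer (column names kept as written there).
data = pd.DataFrame({
    "Education": ["graduate school", "university", "high school", "others", "university"],
    "Gender": ["male", "female", "female", "male", "male"],
    "MarritalStatus": ["married", "single", "other", "single", "single"],
})

# One indicator column per category value, e.g. Gender_male and Gender_female.
# dtype=int asks for 0/1 integers instead of the default dtype.
dummies = pd.get_dummies(
    data, columns=["Education", "Gender", "MarritalStatus"], dtype=int
)

print(dummies.columns.tolist())
print(dummies["Gender_male"].tolist())
```

Each original column is replaced by one column per distinct value, named `<column>_<value>`.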

Answer 1: (score: 0)

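You can simply use a dictionary as a key-value mapping:

Gender = {1: "male", 2: "female"}
Education = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}

If possible, I would suggest using strings as the dictionary keys, in which case you can use

Gender = {"male": 1, "female": 2}

or keep the codes as strings:

Gender = {"1": "male", "2": "female"}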
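
Dictionaries like these can be applied to a column with pd.Series.map. A small sketch, assuming the columns hold the integer codes listed in the question; the frame `data`, its values, and the new column names are made up for illustration:

```python
import pandas as pd

# Hypothetical frame holding the integer codes described in the question.
data = pd.DataFrame({"Gender": [1, 2, 2, 1, 1], "Education": [1, 2, 3, 4, 2]})

gender_labels = {1: "male", 2: "female"}
education_labels = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}

# Series.map looks each value up in the dictionary; values without a key become NaN.
data["GenderLabel"] = data["Gender"].map(gender_labels)
data["EducationLabel"] = data["Education"].map(education_labels)

# The reversed dictionary maps the labels back to their codes.
gender_codes = {"male": 1, "female": 2}
data["GenderCode"] = data["GenderLabel"].map(gender_codes)

print(data)
```

The key type has to match the column's dtype: a dictionary with string keys such as "1" will not match integer values in the column.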