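I am reading a book on an introduction to machine learning with Python. Here the author describes the following: let's say that for the workclass feature we have the possible values "Government Employee", "Private Employee", "Self Employed" and "Self Employed Incorporated".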
print("Original features:\n", list(data.columns), "\n")
data_dummies = pd.get_dummies(data)
print("Features after get_dummies:\n", list(data_dummies.columns))
Original features:
['age', 'workclass']
Features after get_dummies:
['age', 'workclass_ ?', 'workclass_ Government Employee', 'workclass_Private Employee', 'workclass_Self Employed', 'workclass_Self Employed Incorporated']
My question is: what is the new column workclass_ ?
Answer 0 (score: 3)
It is created from the string value ? in the column workclass:
import pandas as pd

data = pd.DataFrame({'age': [1, 1, 1, 2, 1, 1],
                     'workclass': ['Government Employee', 'Private Employee', 'Self Employed',
                                   'Self Employed Incorporated', 'Self Employed Incorporated', '?']})
print(data)

   age                   workclass
0    1         Government Employee
1    1            Private Employee
2    1               Self Employed
3    2  Self Employed Incorporated
4    1  Self Employed Incorporated
5    1                           ?

get_dummies creates one indicator column for each distinct value and puts the original column name in front of it as a prefix, so the '?' string becomes the column workclass_?:

data_dummies = pd.get_dummies(data)
print(data_dummies)

   age  workclass_?  workclass_Government Employee  \
0    1            0                              1
1    1            0                              0
2    1            0                              0
3    2            0                              0
4    1            0                              0
5    1            1                              0

   workclass_Private Employee  workclass_Self Employed  \
0                           0                        0
1                           1                        0
2                           0                        1
3                           0                        0
4                           0                        0
5                           0                        0

   workclass_Self Employed Incorporated
0                                     0
1                                     0
2                                     0
3                                     1
4                                     1
5                                     0
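To check where each dummy column comes from, it can help to list the distinct values of the column first. This is only a small sketch on the same toy data; the names values, dummy_cols and expected_cols are made up for the illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'age': [1, 1, 1, 2, 1, 1],
    'workclass': ['Government Employee', 'Private Employee', 'Self Employed',
                  'Self Employed Incorporated', 'Self Employed Incorporated', '?'],
})

# Distinct strings stored in the column, including the '?' placeholder.
values = sorted(data['workclass'].unique())
print(values)

# Every distinct string should show up as exactly one prefixed dummy column.
dummy_cols = [c for c in pd.get_dummies(data).columns if c.startswith('workclass_')]
expected_cols = ['workclass_' + v for v in values]
print(set(dummy_cols) == set(expected_cols))
```

If the real data has values with a leading space (as 'workclass_ ?' in the question suggests), the same check would show them as ' ?' and so on.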
The workclass_ prefix is very useful when several columns contain the same values:

data = pd.DataFrame({'age': [1, 1, 3],
                     'workclass': ['Government Employee', 'Private Employee', '?'],
                     'workclass1': ['Government Employee', 'Private Employee', 'Self Employed']})
print(data)

   age            workclass           workclass1
0    1  Government Employee  Government Employee
1    1     Private Employee     Private Employee
2    3                    ?        Self Employed

data_dummies = pd.get_dummies(data)
print(data_dummies)

   age  workclass_?  workclass_Government Employee  \
0    1            0                              1
1    1            0                              0
2    3            1                              0

   workclass_Private Employee  workclass1_Government Employee  \
0                           0                               1
1                           1                               0
2                           0                               0

   workclass1_Private Employee  workclass1_Self Employed
0                            0                         0
1                            1                         0
2                            0                         1

If the prefix is not needed, the prefix and prefix_sep parameters can override it with empty strings:

data_dummies = pd.get_dummies(data, prefix='', prefix_sep='')
print(data_dummies)

   age  ?  Government Employee  Private Employee  Government Employee  \
0    1  0                    1                 0                    1
1    1  0                    0                 1                    0
2    3  1                    0                 0                    0

   Private Employee  Self Employed
0                 0              0
1                 1              0
2                 0              1

The column names are then duplicated, so it is possible to group by column name with groupby and aggregate the dummies of each unique column with max.
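The code for the groupby and max step is not included above, so here is a rough sketch of one way it could look. It groups on the transposed frame, which I assume behaves the same on old and recent pandas versions; the dtype=int argument and the name merged are my own additions, not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({
    'age': [1, 1, 3],
    'workclass': ['Government Employee', 'Private Employee', '?'],
    'workclass1': ['Government Employee', 'Private Employee', 'Self Employed'],
})

# dtype=int keeps 0/1 indicators (recent pandas versions default to booleans).
data_dummies = pd.get_dummies(data, prefix='', prefix_sep='', dtype=int)

# Transpose so the duplicated column names become index labels, group equal
# labels together, take the maximum of each group, then transpose back.
merged = data_dummies.T.groupby(level=0).max().T
print(merged)
```

After this, each name appears only once, and a 1 means that at least one of the original columns had that value in the row. Note that age goes through the same max, which leaves it unchanged here only because its name is unique.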