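While implementing one-hot encoding in my machine learning project, I am using the following code to encode my 3 categorical features (job, marital status, and education):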
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(categories='auto')
feature_array = encoder.fit_transform(df[['job', 'marital', 'education']]).toarray()
feature_labels = encoder.categories_
This returns the categories of each of the 3 features as 3 separate arrays, collected in a list:
[array(['admin.', 'blue-collar', 'management', 'retired', 'self-employed',
'services', 'student', 'technician', 'unemployed', 'unknown'],
dtype=object),
array(['divorced', 'married', 'single'], dtype=object),
array(['primary', 'secondary', 'tertiary', 'unknown'], dtype=object)]
I know that a for loop over this list prints the labels of each of the 3 features separately:
for value in feature_labels:
    print(value)
['admin.' 'blue-collar' 'management' 'retired' 'self-employed' 'services'
'student' 'technician' 'unemployed' 'unknown']
['divorced' 'married' 'single']
['primary' 'secondary' 'tertiary' 'unknown']
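That said, is there a more elegant approach, ideally a one-liner, that builds a single list containing all the distinct categories of my 3 features? In the end, I would like a list that looks like the one below, so that I can put all 3 encoded features into a single DataFrame: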
['admin.', 'blue-collar', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown', 'divorced', 'married', 'single', 'primary', 'secondary', 'tertiary', 'unknown']
Answer 0 (score: 1)
You can use NumPy's concatenate to join your 3 arrays (https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html):
import numpy as np

labels = np.concatenate(feature_labels)
# The result:
array(['admin.', 'blue-collar', 'management', 'retired', 'self-employed',
'services', 'student', 'technician', 'unemployed', 'unknown',
'divorced', 'married', 'single', 'primary', 'secondary',
'tertiary', 'unknown'], dtype=object)
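One thing to watch for: the concatenated result contains 'unknown' twice (once from job, once from education), so using it directly as DataFrame column names would give duplicate columns. A minimal sketch of one way around this, prefixing each label with its feature name. The `feature_names` list and the stand-in arrays below are assumptions for illustration; in the question's setup `feature_labels` would come from `encoder.categories_`, in the same order as the columns passed to `fit_transform`.

```python
import numpy as np

# Stand-in for encoder.categories_ from the question (assumed values).
feature_labels = [
    np.array(['admin.', 'blue-collar', 'management', 'retired', 'self-employed',
              'services', 'student', 'technician', 'unemployed', 'unknown'],
             dtype=object),
    np.array(['divorced', 'married', 'single'], dtype=object),
    np.array(['primary', 'secondary', 'tertiary', 'unknown'], dtype=object),
]

# Hypothetical helper list: the feature names, in the order they were encoded.
feature_names = ['job', 'marital', 'education']

# Plain concatenation, as in the answer above.
labels = np.concatenate(feature_labels)

# Prefix each category with its feature name so every label is unique.
prefixed = [
    f"{name}_{category}"
    for name, categories in zip(feature_names, feature_labels)
    for category in categories
]
```

Recent scikit-learn versions also expose `encoder.get_feature_names_out()`, which returns prefixed names of this kind directly from a fitted encoder.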
Answer 1 (score: 0)
If you have a nested list:
l = [['admin.', 'blue-collar', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown'],
     ['divorced', 'married', 'single'],
     ['primary', 'secondary', 'tertiary', 'unknown']]
One way to flatten it is:
import itertools
flat_l = list(itertools.chain(*l))
Result:
['admin.',
'blue-collar',
'management',
'retired',
'self-employed',
'services',
'student',
'technician',
'unemployed',
'unknown',
'divorced',
'married',
'single',
'primary',
'secondary',
'tertiary',
'unknown']
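Two closely related variants, sketched here for comparison: `itertools.chain.from_iterable` takes the outer list directly instead of unpacking it with `*`, and a nested list comprehension does the same without any import. The variable names `nested`, `flat`, and `flat_comprehension` are illustrative, not from the original answer.

```python
from itertools import chain

# Same nested list as above, under an illustrative name.
nested = [
    ['admin.', 'blue-collar', 'management', 'retired', 'self-employed',
     'services', 'student', 'technician', 'unemployed', 'unknown'],
    ['divorced', 'married', 'single'],
    ['primary', 'secondary', 'tertiary', 'unknown'],
]

# chain.from_iterable iterates over the outer list lazily, no * unpacking needed.
flat = list(chain.from_iterable(nested))

# Equivalent nested list comprehension: outer loop first, then inner loop.
flat_comprehension = [label for group in nested for label in group]
```

Both preserve the original order and keep the duplicate 'unknown' entry.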
Answer 2 (score: 0)
Since you have a list of NumPy arrays, you can also use:
import numpy as np
l = list(np.concatenate(feature_labels))