I get the following error when calling inverse_transform on a LabelEncoder:
Traceback (most recent call last):
  File "Test.py", line 31, in <module>
    inverted = label_encoder.inverse_transform(integer_encoded['DEST'])
  File "...\Python\Python36\lib\site-packages\sklearn\preprocessing\label.py", line 283, in inverse_transform
    return self.classes_[y]
TypeError: only integer scalar arrays can be converted to a scalar index
Here is the code that produces the error:
import pandas as pd
import numpy as np
from collections import defaultdict
from sklearn import preprocessing
import bisect
data_cat = {'ORG': ['A', 'B', 'C', 'D'],
            'DEST': ['A', 'E', 'F', 'G'],
            'OP': ['F1', 'F1', 'F1', 'F2']}
data_cat = pd.DataFrame(data_cat)
# keep one LabelEncoder per column in a dictionary
label_encoder_dict = defaultdict(preprocessing.LabelEncoder)
integer_encoded = data_cat.apply(lambda x: label_encoder_dict[x.name].fit_transform(x))
print("Integer encoded: ")
print(integer_encoded)
# add an UNK class that will be used for unseen values from the test dataset
for key, le in label_encoder_dict.items():
    le_classes = np.array(le.classes_).tolist()
    bisect.insort_left(le_classes, 'UNK')
    le.classes_ = le_classes
label_encoder = label_encoder_dict['DEST']
print(label_encoder.classes_)
print(integer_encoded['DEST'])
print(type(integer_encoded['DEST']))
inverted = label_encoder.inverse_transform(integer_encoded['DEST'])
print(inverted)
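If I remove the for loop that adds the UNK class to each LabelEncoder, everything works fine. I don't understand why adding a new class affects the call to inverse_transform.
Any help or guidance is appreciated.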
Answer (score: 1)
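LabelEncoder.inverse_transform is actually quite simple. A LabelEncoder object stores the array of original values in its classes_ attribute, and each encoded integer is the index of that value in classes_, so inverse_transform just indexes classes_ with the encoded integers (the return self.classes_[y] line in your traceback). Normally classes_ is an np.ndarray, which supports indexing with an array of indices to get the values at those positions. In your for loop, however, you convert it to a plain Python list, which does not support that kind of indexing, hence the TypeError.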
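To see the difference in isolation, the sketch below indexes the same labels once as a NumPy array and once as a Python list, using an integer array as the index (roughly what inverse_transform does once it has converted y to an array). The variable names are illustrative; the labels mirror the question's DEST column after UNK has been added.

```python
import numpy as np

# classes_ as LabelEncoder normally stores it: a sorted NumPy array
classes_array = np.array(['A', 'E', 'F', 'G', 'UNK'])
# the same labels after .tolist(), which is what the question's loop assigns
classes_list = classes_array.tolist()

# integer codes for the DEST column, as an ndarray
codes = np.array([0, 1, 2, 3])

# ndarray: integer-array ("fancy") indexing returns one label per code
decoded = classes_array[codes]
print(decoded)  # ['A' 'E' 'F' 'G']

# list: only a single integer index is allowed, so an array index raises TypeError
try:
    classes_list[codes]
except TypeError as exc:
    print('TypeError:', exc)
```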
If you change the for loop so that le.classes_ stays an ndarray, it should work:
for key, le in label_encoder_dict.items():
    le_classes = np.array(le.classes_).tolist()
    bisect.insort_left(le_classes, 'UNK')
    # store classes_ back as an ndarray so inverse_transform can index it with an array
    le.classes_ = np.asarray(le_classes)
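For reference, here is a self-contained version of the corrected code. It is a minimal sketch that assumes scikit-learn and pandas are installed and that LabelEncoder accepts a pandas Series in fit, transform and inverse_transform (it did in the version from the traceback, and does in current releases as far as I know). Apart from fitting in a plain loop, it differs from the question in one deliberate way: the columns are encoded with transform after UNK has been added. insort_left keeps classes_ sorted, so any label that sorts after 'UNK' (for example 'Z', or any lowercase string) moves to a higher index, and codes produced before the insertion would then decode to the wrong label. With the question's data every label sorts before 'UNK', so encoding first happens to work here too.

```python
import bisect
from collections import defaultdict

import numpy as np
import pandas as pd
from sklearn import preprocessing

data_cat = pd.DataFrame({'ORG': ['A', 'B', 'C', 'D'],
                         'DEST': ['A', 'E', 'F', 'G'],
                         'OP': ['F1', 'F1', 'F1', 'F2']})

# one LabelEncoder per column, created on first access
label_encoder_dict = defaultdict(preprocessing.LabelEncoder)
for col in data_cat.columns:
    label_encoder_dict[col].fit(data_cat[col])

# insert 'UNK' in sorted position, then store classes_ back as an ndarray
for le in label_encoder_dict.values():
    le_classes = le.classes_.tolist()
    bisect.insort_left(le_classes, 'UNK')
    le.classes_ = np.asarray(le_classes)

# encode only after 'UNK' is in place, so the codes match the final classes_
integer_encoded = pd.DataFrame(
    {col: label_encoder_dict[col].transform(data_cat[col]) for col in data_cat.columns}
)

label_encoder = label_encoder_dict['DEST']
print(label_encoder.classes_)  # ['A' 'E' 'F' 'G' 'UNK']
inverted = label_encoder.inverse_transform(integer_encoded['DEST'])
print(inverted)  # ['A' 'E' 'F' 'G']
```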