sklearn LabelEncoder inverse_transform TypeError: only integer scalar arrays can be converted to a scalar index

Time: 2018-10-10 16:46:27

Tags: scikit-learn python-3.6

Calling inverse_transform on a LabelEncoder raises the following error:

Traceback (most recent call last):
  File "Test.py", line 31, in <module>
    inverted = label_encoder.inverse_transform(integer_encoded['DEST'])
  File "...\Python\Python36\lib\site-packages\sklearn\preprocessing\label.py", line 283, in inverse_transform
    return self.classes_[y]
TypeError: only integer scalar arrays can be converted to a scalar index

The code that produces this error is:

import pandas as pd
import numpy as np
from collections import defaultdict
from sklearn import preprocessing
import bisect
data_cat = {'ORG': ['A', 'B', 'C', 'D'],
            'DEST': ['A', 'E', 'F', 'G'],
            'OP': ['F1', 'F1', 'F1', 'F2']}
data_cat = pd.DataFrame(data_cat)

# Keep one LabelEncoder per column in a dictionary keyed by column name.
label_encoder_dict = defaultdict(preprocessing.LabelEncoder)
integer_encoded = data_cat.apply(lambda x: label_encoder_dict[x.name].fit_transform(x))
print("Integer encoded: ")
print(integer_encoded)

# Add an 'UNK' class to use for values in the test dataset that were not seen during fitting.
for key, le in label_encoder_dict.items():
    le_classes = np.array(le.classes_).tolist()
    bisect.insort_left(le_classes, 'UNK')
    le.classes_ = le_classes

label_encoder = label_encoder_dict['DEST']
print(label_encoder.classes_)
print(integer_encoded['DEST'])
print(type(integer_encoded['DEST']))
inverted = label_encoder.inverse_transform(integer_encoded['DEST'])
print(inverted)

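If I remove the for loop that adds the UNK class to each LabelEncoder, everything works fine. I don't understand why adding a new class affects the call to inverse_transform.

Thanks for any help or guidance.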

1 answer:

Answer 0 (score: 1)

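LabelEncoder.inverse_transform is actually quite simple. A LabelEncoder object stores the array of original values in its classes_ attribute, and each encoded integer is the index of that value in classes_. Normally classes_ is a NumPy array (np.ndarray), which supports passing an array of indices to get the values at those indices. In your for loop, however, you convert it to a plain Python list, which does not support that behavior.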
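
To illustrate the difference, here is a minimal sketch that uses only NumPy. The names classes_array, classes_list and codes are illustrative and not part of scikit-learn; the values mirror the DEST column from the question.

```python
import numpy as np

# Stand-ins for le.classes_ as an ndarray and as a plain list (illustrative names).
classes_array = np.array(['A', 'E', 'F', 'G', 'UNK'])
classes_list = classes_array.tolist()

# Stand-in for the integer-encoded column passed to inverse_transform.
codes = np.array([0, 1, 2, 3])

# An ndarray accepts an array of integers as an index and returns one value per index.
decoded = classes_array[codes]
print(decoded)

# A list accepts only a single integer (or slice), so the same lookup fails.
try:
    classes_list[codes]
    list_error = None
except TypeError as exc:
    list_error = str(exc)
print(list_error)
```

The second lookup is the same operation that fails in the traceback at `return self.classes_[y]`.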

If you change the for loop so that le.classes_ remains an ndarray, it should work:

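for key, le in label_encoder_dict.items():
    # Work on a list copy so that bisect can insert 'UNK' in sorted position.
    le_classes = np.array(le.classes_).tolist()
    bisect.insort_left(le_classes, 'UNK')
    # Convert back to an ndarray so that inverse_transform can index it with an array.
    le.classes_ = np.asarray(le_classes)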
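
As a quick check, the sketch below applies the same idea to a single LabelEncoder fitted on the DEST values from the question, without pandas. It is a simplified illustration rather than a drop-in replacement for the original script, and it assumes that 'UNK' sorts after every existing label, as it does for this data.

```python
import bisect

import numpy as np
from sklearn.preprocessing import LabelEncoder

# Fit on the DEST values from the question and keep the integer codes.
le = LabelEncoder()
codes = le.fit_transform(['A', 'E', 'F', 'G'])

# Insert 'UNK' in sorted position, then store classes_ as an ndarray again.
classes = le.classes_.tolist()
bisect.insort_left(classes, 'UNK')
le.classes_ = np.asarray(classes)

# With classes_ as an ndarray, the array of codes can be decoded in one call.
inverted = le.inverse_transform(codes)
print(le.classes_)
print(inverted)
```

Because the codes are positions in classes_, they stay valid here only because 'UNK' lands at the end of the sorted classes. If it were inserted before an existing label, previously encoded integers would decode to the wrong values.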