sklearn - Cannot call MultiLabelBinarizer's inverse_transform right after instantiation

Date: 2016-06-09 09:37:49

Tags: python python-2.7 scikit-learn

Right after instantiating a MultiLabelBinarizer, I need its inverse_transform method to decode a matrix that I built elsewhere. Unfortunately,

import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer(classes=['a', 'b', 'c'])

A = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]])
y = mlb.inverse_transform(A)

yields AttributeError: 'MultiLabelBinarizer' object has no attribute 'classes_'

I noticed that if I add this line after instantiating mlb:
mlb.fit_transform([(c,) for c in ['a', 'b', 'c']])

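the error goes away. I guess this is because fit_transform sets the classes_ attribute, but I expected that to happen at instantiation, since I supplied the classes parameter.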

I am using sklearn 0.17.1 and Python 2.7.6. Am I doing something wrong?
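
The behaviour described above can be reproduced with a minimal sketch. It uses only the calls shown in the question. The exact exception class may vary between scikit-learn versions (0.17.1 reports a plain AttributeError; the sketch assumes later releases raise an error that still derives from AttributeError), so it catches AttributeError instead of matching the message.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

A = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]])

# Passing `classes` to the constructor only stores the parameter;
# the fitted attribute `classes_` does not exist yet.
mlb = MultiLabelBinarizer(classes=['a', 'b', 'c'])
try:
    mlb.inverse_transform(A)
    raised = False
except AttributeError:
    raised = True

# Workaround from the question: fit on one singleton tuple per class,
# which creates `classes_` and makes inverse_transform usable.
mlb.fit_transform([(c,) for c in ['a', 'b', 'c']])
y = mlb.inverse_transform(A)
print(raised, y)
```

Each row of A is decoded into a tuple holding the labels whose columns are set to 1.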

2 answers:

Answer 0 (score: 3)

If you want the classes_ attribute to be set on a MultiLabelBinarizer instance, you can also use a quick hack like this:

mlb = MultiLabelBinarizer().fit(['a', 'b', 'c'])

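because, as marmouset said, only fit and fit_transform appear to set the classes_ attribute. In addition, the scikit-learn documentation at http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html explicitly states that fit returns the MultiLabelBinarizer instance itself.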

def fit(self, y):
    """Fit the label sets binarizer, storing `classes_`
    Parameters
    ----------
    y : iterable of iterables
        A set of labels (any orderable and hashable object) for each
        sample. If the `classes` parameter is set, `y` will not be
        iterated.
    Returns
    -------
    self : returns this MultiLabelBinarizer instance
    """

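Since the docstring says y is not iterated when classes is set, one variant of the one-liner above is to keep classes in the constructor and call fit with a placeholder. This is only a sketch, not the library's recommended usage: the empty list passed to fit is an arbitrary placeholder chosen for illustration. It also shows why passing a flat list of strings to fit only works for single-character labels, because each string is itself treated as an iterable of labels.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# With `classes` given, fit() takes the classes from the parameter,
# so the argument (an empty placeholder here) is never iterated.
mlb = MultiLabelBinarizer(classes=['a', 'b', 'c']).fit([])
A = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1]])
y = mlb.inverse_transform(A)

# Without `classes`, every element passed to fit() is one sample,
# i.e. an iterable of labels. A bare string is split into characters:
split_labels = list(MultiLabelBinarizer().fit(['ab', 'cd']).classes_)

# Wrapping the labels in a single sample keeps them intact:
whole_labels = list(MultiLabelBinarizer().fit([['ab', 'cd']]).classes_)

print(y)
print(split_labels, whole_labels)
```
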
Answer 1 (score: 2)

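It seems that, as implemented at https://github.com/scikit-learn/scikit-learn/blob/51a765a/sklearn/preprocessing/label.py#L636, .fit is the only method that defines the classes_ attribute. classes_ is not set to a copy of classes in the constructor, which is not how it should behave given the definition in the docstring; you could report this to the authors.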

class MultiLabelBinarizer(BaseEstimator, TransformerMixin):
    """Transform between iterable of iterables and a multilabel format
    Although a list of sets or tuples is a very intuitive format for multilabel
    data, it is unwieldy to process. This transformer converts between this
    intuitive format and the supported multilabel format: a (samples x classes)
    binary matrix indicating the presence of a class label.
    Parameters
    ----------
    classes : array-like of shape [n_classes] (optional)
        Indicates an ordering for the class labels
    sparse_output : boolean (default: False),
        Set to true if output binary array is desired in CSR sparse format
    Attributes
    ----------
    classes_ : array of labels
        A copy of the `classes` parameter where provided,
        or otherwise, the sorted set of classes found when fitting.
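
The difference between the classes parameter and the fitted classes_ attribute can be checked directly. The sketch below is illustrative only: it relies on the docstring's statement that classes_ is a copy of classes where provided, and the empty list passed to fit is again just a placeholder.

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(classes=['c', 'a', 'b'])

# The constructor only records the parameter.
has_param = hasattr(mlb, 'classes')
fitted_before = hasattr(mlb, 'classes_')

mlb.fit([])  # placeholder argument; not iterated because `classes` is set
fitted_after = hasattr(mlb, 'classes_')

# The column order follows `classes` as given, not sorted order.
order = list(mlb.classes_)
row = mlb.transform([('a',)]).tolist()

print(has_param, fitted_before, fitted_after)
print(order, row)
```

Here the label 'a' maps to the second column, because the ordering supplied through classes is kept as-is.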