Question

The data section looks like this: {60 1,248 1,279 1,316 1}. When I use the Python LIAC-ARFF library, I get the following error: ValueError: {60 1 value not in ('0', '1')

When I use a regular (non-sparse) ARFF file, it works fine.
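For reference, each row in the data section above uses sparse ARFF notation: {index value,...} with 0-based attribute indices, where every attribute that is not listed is 0. The error text shows the string "{60 1" being checked against the nominal values ('0', '1'), which suggests the row was split on commas as if it were dense data. Below is a minimal sketch of what a sparse row actually encodes. parse_sparse_row is a hypothetical helper, not part of LIAC-ARFF, and it assumes values contain no quoted commas or spaces.

```python
from typing import List


def parse_sparse_row(row: str, n_attributes: int) -> List[str]:
    """Expand one sparse ARFF data row, e.g. '{60 1,248 1}', into a dense list.

    Hypothetical helper: assumes 0-based indices and values without quoted
    commas or spaces. Attributes that are not listed default to "0".
    """
    body = row.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError(f"not a sparse ARFF row: {row!r}")
    dense = ["0"] * n_attributes
    inner = body[1:-1].strip()
    if inner:  # '{}' is a valid row in which every attribute is 0
        for entry in inner.split(","):
            index, value = entry.split()  # each entry is '<index> <value>'
            dense[int(index)] = value
    return dense


# 1483 is an assumption: 500 features plus 983 labels for delicious.
dense = parse_sparse_row("{60 1,248 1,279 1,316 1}", 1483)
print([i for i, v in enumerate(dense) if v != "0"])  # [60, 248, 279, 316]
```

Only four of the 1483 positions carry a value; a dense reader that expects 1483 comma-separated fields sees just four fields, the first of which is "{60 1".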

I am using the well-known delicious.arff dataset from the MULAN website.

Do I need to use a different method? Can anyone help?

Answer 1

You can use the load_from_arff function that scikit-multilearn provides for loading ARFF data.

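Here is an example of how to use it. The first argument is the ARFF file. It is in MULAN format, so the labels are at the end of each row (little endian). The delicious dataset has 983 labels and its input features are integers; the input data is already nominal, because delicious's input space is a bag of words. Keep in mind that you should always read about a dataset's contents in the paper it comes from (source information for each dataset is provided on the MULAN website):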

from skmultilearn.dataset import load_from_arff

X, y = load_from_arff("/home/user/data/delicious-train.arff", 
    # number of labels
    labelcount=983, 
    # MULAN format, labels at the end of rows in arff data
    endian='little', 
    # bag of words
    input_feature_type='int', encode_nominal=False, 
    # sometimes the sparse ARFF loader is borked, like in delicious,
    # scikit-multilearn converts the loaded data to sparse representations, 
    # so disabling the liac-arff sparse loader
    load_sparse=False, 
    # this decides whether to return attribute names or not, usually 
    # you don't need this
    return_attribute_definitions=False)
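The labelcount=983 and endian='little' arguments describe the MULAN layout: the last 983 attributes of each row are labels, and everything before them is a feature. To see what that amounts to without going through any sparse loader, here is a minimal hand-rolled sketch. sparse_arff_to_xy is a hypothetical helper, not a scikit-multilearn or LIAC-ARFF function; it assumes every data row is sparse, all values are integers, and there are no quoted strings. It uses NumPy and SciPy to build CSR matrices.

```python
import numpy as np
from scipy import sparse


def sparse_arff_to_xy(text, label_count):
    """Hypothetical helper: parse sparse ARFF text into (X, y) CSR matrices.

    Assumes MULAN layout (labels are the last `label_count` attributes),
    sparse data rows, integer values, and no quoted strings.
    """
    n_attributes = 0
    rows, cols, vals = [], [], []
    in_data = False
    n_rows = 0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()  # ARFF keywords are case-insensitive
        if not in_data:
            if lower.startswith("@attribute"):
                n_attributes += 1
            elif lower.startswith("@data"):
                in_data = True
            continue
        # Sparse row: '{index value,index value,...}', indices are 0-based
        inner = line.strip("{}").strip()
        if inner:
            for entry in inner.split(","):
                index, value = entry.split()
                rows.append(n_rows)
                cols.append(int(index))
                vals.append(int(value))
        n_rows += 1
    full = sparse.csr_matrix(
        (
            np.array(vals, dtype=np.int64),
            (np.array(rows, dtype=np.int64), np.array(cols, dtype=np.int64)),
        ),
        shape=(n_rows, n_attributes),
    )
    n_features = n_attributes - label_count
    # MULAN layout: features first, labels last
    return full[:, :n_features], full[:, n_features:]


ARFF = """@relation toy
@attribute f0 {0,1}
@attribute f1 {0,1}
@attribute f2 {0,1}
@attribute l0 {0,1}
@attribute l1 {0,1}
@data
{0 1,3 1}
{1 1,2 1,4 1}
"""
X, y = sparse_arff_to_xy(ARFF, label_count=2)
print(X.toarray().tolist())  # [[1, 0, 0], [0, 1, 1]]
print(y.toarray().tolist())  # [[1, 0], [0, 1]]
```

Building one matrix from (row, column, value) triplets and slicing it afterwards keeps the feature/label split in a single place; for delicious, label_count would be 983.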

What does it return?

>>> print(X, y)
(<12920x500 sparse matrix of type '<type 'numpy.int64'>' with 6460000 stored elements in LInked List format>,
<12920x983 sparse matrix of type '<type 'numpy.int64'>' with 12700360 stored elements in LInked List format>)
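In the output above, the stored-element counts equal rows × columns (12920 × 500 = 6,460,000 and 12920 × 983 = 12,700,360), so every cell is stored, which for mostly-zero bag-of-words data suggests zeros are being stored explicitly. In SciPy, nnz counts stored values including explicit zeros, and converting to CSR and calling eliminate_zeros() keeps only the nonzero entries. A small sketch of that check; compact is a hypothetical helper name:

```python
import numpy as np
from scipy import sparse


def compact(matrix):
    """Return a CSR copy of a SciPy sparse matrix without explicitly stored zeros."""
    csr = sparse.csr_matrix(matrix, copy=True)  # copy so the input is not modified
    csr.eliminate_zeros()
    return csr


# A 2x2 matrix that explicitly stores all four cells, two of them zero.
data = np.array([1, 0, 0, 1], dtype=np.int64)
indices = np.array([0, 1, 0, 1])
indptr = np.array([0, 2, 4])
stored = sparse.csr_matrix((data, indices, indptr), shape=(2, 2))

print(stored.nnz)           # 4: nnz includes explicit zeros
print(compact(stored).nnz)  # 2
```

The same call accepts the LIL matrices shown above, for example compact(X), and returns a CSR matrix.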

How can I read sparse ARFF data with a Python library?
