Using a numpy.ndarray (multi-label) as the labels for the SageMaker RecordIO format?

Asked: 2018-08-06 14:43:45

Tags: python mxnet amazon-sagemaker

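I am trying to pass a numpy.ndarray as the labels to Amazon SageMaker's conversion utility, write_numpy_to_dense_tensor(). It converts numpy arrays of features and labels to RecordIO so that they work better with the SageMaker algorithms.

However, if I pass a multi-label output as the labels, I get an error saying that the labels can only be a vector (i.e. one scalar per feature row).

Is there any way to have multiple values in the label? This would be useful for multi-dimensional regression, which can be done with XGBoost, Random Forests, Neural Networks, etc.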

Code

import io  # needed for io.BytesIO below
import sagemaker.amazon.common as smac
print("Types: {}, {}".format(type(X_train), type(y_train)))
print("X_train shape: {}".format(X_train.shape))
print("y_train shape: {}".format(y_train.shape))
f = io.BytesIO()
smac.write_numpy_to_dense_tensor(f, X_train.astype('float32'), y_train.astype('float32'))

Output:

Types: <class 'numpy.ndarray'>, <class 'numpy.ndarray'>
X_train shape: (9919, 2684)
y_train shape: (9919, 20)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-fc1033b7e309> in <module>()
      3 print("y_train shape: {}".format(y_train.shape))
      4 f = io.BytesIO()
----> 5 smac.write_numpy_to_dense_tensor(f, X_train.astype('float32'), y_train.astype('float32'))

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/amazon/common.py in write_numpy_to_dense_tensor(file, array, labels)
     94     if labels is not None:
     95         if not len(labels.shape) == 1:
---> 96             raise ValueError("Labels must be a Vector")
     97         if labels.shape[0] not in array.shape:
     98             raise ValueError("Label shape {} not compatible with array shape {}".format(

ValueError: Labels must be a Vector

1 answer:

Answer 0 (score: 1)

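Tom, XGBoost does not support the RecordIO format; it supports only csv and libsvm. The algorithm itself also does not natively support multi-label output. There are a few workarounds, though; see "Xg boost for multilabel classification?"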
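One common workaround is to train a separate single-output model per label column. The following is only a minimal sketch of the data-preparation side, assuming the CSV convention in which the target is the first field of every row and there is no header line. The helper name label_column_to_csv and the tiny demo arrays are made up for illustration; they stand in for X_train and y_train from the question.

```python
import io

import numpy as np


def label_column_to_csv(features, labels, column):
    """Return CSV text with labels[:, column] as the first field of each row.

    Hypothetical helper for illustration, not part of any SageMaker API.
    """
    target = labels[:, column].reshape(-1, 1)  # (n,) -> (n, 1)
    table = np.hstack([target, features])      # target first, then features
    buf = io.StringIO()
    np.savetxt(buf, table, delimiter=",", fmt="%g")  # no header row
    return buf.getvalue()


# Made-up data standing in for X_train / y_train.
X_demo = np.array([[1.0, 2.0], [3.0, 4.0]], dtype="float32")
y_demo = np.array([[0.0, 1.0], [1.0, 0.0]], dtype="float32")

# One CSV payload per label column, each usable for its own training job.
csv_per_label = [
    label_column_to_csv(X_demo, y_demo, j) for j in range(y_demo.shape[1])
]
```

Each entry of csv_per_label would then be uploaded and trained on separately, which gives one model per output rather than a single multi-output model.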

Random Cut Forest does not support multiple labels either. If multiple labels are provided, it picks up only the first one.
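For algorithms that do read RecordIO, one way to stay within the "Labels must be a Vector" check from the traceback is to write one buffer per label column, so that every call receives a 1-D label vector. This is a rough sketch rather than a recommended pipeline: the function name write_one_buffer_per_label is hypothetical, and the writer is passed in as an argument and assumed to have the same (file, array, labels) signature as write_numpy_to_dense_tensor shown in the traceback.

```python
import io

import numpy as np


def write_one_buffer_per_label(writer, features, labels):
    """Call writer(file, array, labels) once per column of a 2-D label array.

    Hypothetical helper: `writer` is assumed to accept (file, array, labels),
    like write_numpy_to_dense_tensor in the traceback above.
    """
    buffers = []
    for j in range(labels.shape[1]):
        buf = io.BytesIO()
        writer(buf, features, labels[:, j])  # labels[:, j] is 1-D
        buf.seek(0)  # rewind so the buffer can be read or uploaded
        buffers.append(buf)
    return buffers


# Intended usage with the names from the question (requires the sagemaker SDK):
#   import sagemaker.amazon.common as smac
#   buffers = write_one_buffer_per_label(
#       smac.write_numpy_to_dense_tensor,
#       X_train.astype('float32'),
#       y_train.astype('float32'),
#   )
```

With the (9919, 20) label array from the question this would produce 20 buffers, one per output, each of which passes the vector check.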