How to use H2O on a feature-hashed matrix in R

Asked: 2016-08-10 10:00:03

Tags: r h2o

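I am working with a medium-sized dataset (train_data). It has 124 variables and 5,000,000 (50,00,000) observations. I applied feature hashing to the categorical variables with the hashed.model.matrix function in R.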

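## feature hashing
library(FeatureHashing)   # provides hashed.model.matrix()
b <- 2 ^ 22               # hash size (number of hashed columns)
f <- ~ . - 1              # all variables, no intercept
X_train <- hashed.model.matrix(f, train_data, hash.size = b)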

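As a result, I get a large dgCMatrix (a sparse matrix) as output (X_train). How can I use the H2O wrapper on this matrix and run the different algorithms available in H2O? Does the H2O wrapper accept a sparse matrix (dgCMatrix)? Any links or examples of this kind of usage would be helpful. Thanks in advance.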

I would like to import X_train into the H2O environment and then run steps like the following:

# initialize connection to H2O server
h2o.init(nthreads = -1)
train.hex <- h2o.uploadFile('./X_train', destination_frame = 'train')

# list of features for training (everything except the response column)
feature.names <- setdiff(names(train.hex), 'outcome')

# train random forest model, use ntrees = 500
drf <- h2o.randomForest(x = feature.names, y = 'outcome', training_frame = train.hex, ntrees = 500)

1 Answer:

Answer 0: (score: 2)

You can save the sparse matrix in the SVMLight sparse format and then use

train.hex <- h2o.uploadFile('./X_train', parse_type = "SVMLight", destination_frame='train')
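
The answer does not show how to write the file itself. As a minimal sketch, assuming the labels are available as a separate vector y (for example train_data$outcome, which is an assumption about the asker's data), a dgCMatrix can be written out by hand using only the Matrix package. The helper name write_svmlight is hypothetical and not part of h2o or Matrix. For millions of rows this simple version may be slow, so writing in row chunks or using a dedicated package may be preferable.

```r
library(Matrix)

# Hypothetical helper (not part of h2o or Matrix): write a sparse matrix X and
# a label vector y in SVMLight format, one row per line:
#   <label> <col>:<value> <col>:<value> ...
# Column indices are 1-based and ascending within each line.
write_svmlight <- function(X, y, path) {
  stopifnot(nrow(X) == length(y))
  # Triplet view of the stored entries: i = row, j = column, x = value
  trip <- summary(as(X, "CsparseMatrix"))
  trip <- trip[order(trip$i, trip$j), ]
  pairs <- paste0(trip$j, ":", trip$x)
  # Collapse the "col:value" pairs of each row; rows with no entries give NA
  rows <- factor(trip$i, levels = seq_len(nrow(X)))
  feats <- tapply(pairs, rows, paste, collapse = " ")
  feats[is.na(feats)] <- ""
  writeLines(trimws(paste(y, feats)), path)
}

# Usage on a tiny 3 x 4 example matrix
X_small <- sparseMatrix(i = c(1, 1, 3), j = c(1, 3, 2),
                        x = c(1, 2.5, 4), dims = c(3, 4))
y_small <- c(1, 0, 1)
out_file <- tempfile(fileext = ".svm")
write_svmlight(X_small, y_small, out_file)
```

The label goes first on each line because SVMLight files carry the target in the leading position, so the response ends up in the first column of the parsed frame rather than in a column named 'outcome'.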

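The SVMLight format is also detected automatically by h2o.importFile(). Unlike h2o.uploadFile(), which pushes the file from the R client to the server, train.hex <- h2o.importFile('./X_train', destination_frame='train') is a parallelized reader: the server pulls the data from the location specified by the client, so the path must be readable from the H2O server.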
