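I am working with a medium-sized dataset (train_data). It has 124 variables and 5,000,000 observations. For the categorical variables, I applied feature hashing using the hashed.model.matrix function in R.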
## feature hashing
b <- 2 ^ 22
f <- ~ .-1
X_train <- hashed.model.matrix(f, train_data, hash.size=b)
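As a result, I get a large dgCMatrix (a sparse matrix) as output (X_train). How can I use the H2O wrapper on this matrix and apply the different algorithms available in H2O? Does the H2O wrapper accept sparse matrices (dgCMatrix)? Any links or examples of this kind of usage would be helpful. Thanks in advance.
This is how I was hoping to import X_train into the H2O environment for the subsequent steps: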
# initialize connection to H2O server
h2o.init(nthreads = -1)
train.hex <- h2o.uploadFile('./X_train', destination_frame='train')
# list of features for training
feature.names <- names(train.hex)
# train random forest model, use ntrees = 500
drf <- h2o.randomForest(x=feature.names, y='outcome', training_frame=train.hex, ntrees=500)
Answer 0 (score: 2)
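You can save the sparse matrix in SVMLight sparse format and then load it with
train.hex <- h2o.uploadFile('./X_train', parse_type = "SVMLight", destination_frame='train')
The SVMLight format is also detected by h2o.importFile():
train.hex <- h2o.importFile('./X_train', destination_frame='train')
h2o.importFile() is a parallel reader: the server "pulls" the data from the location specified by the client.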
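The answer does not show how to get from a dgCMatrix to an SVMLight file, so here is a minimal sketch of one way to do it using only the Matrix package. The helper name `write_svmlight`, the toy matrix, and the label vector `y` are all hypothetical and not part of h2o or FeatureHashing; the sketch assumes numeric labels and writes 1-based feature indices, which is the usual SVMLight convention.

```r
library(Matrix)

# Hypothetical helper (not part of h2o or FeatureHashing): writes a sparse
# matrix X and a label vector y as SVMLight text, one observation per line:
#   <label> <col>:<value> <col>:<value> ...
write_svmlight <- function(X, y, path) {
  stopifnot(nrow(X) == length(y))
  s <- summary(X)                  # 1-based (i, j, x) triplets of the non-zeros
  o <- order(s$i, s$j)             # ascending feature index within each row
  i <- s$i[o]
  tokens <- paste0(s$j[o], ":", s$x[o])
  # Join the tokens of each row; rows without non-zeros keep an empty list.
  feats <- character(nrow(X))
  per_row <- tapply(tokens, i, paste, collapse = " ")
  feats[as.integer(names(per_row))] <- per_row
  writeLines(trimws(paste(y, feats)), path)
  invisible(path)
}

# Toy data standing in for X_train and the outcome column (assumed values).
X <- sparseMatrix(i = c(1, 1, 3), j = c(1, 3, 2), x = c(1, 2, 0.5),
                  dims = c(3, 3))
y <- c(1, 0, 1)
path <- tempfile(fileext = ".svmlight")
write_svmlight(X, y, path)

# With a running H2O cluster, the file could then be loaded as in the answer:
# train.hex <- h2o.uploadFile(path, parse_type = "SVMLight", destination_frame = "train")
```

For a matrix with millions of rows the per-row `tapply` will be slow, so a chunked writer or a dedicated package may be a better fit; the sketch is only meant to make the file layout concrete.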
using System;
using System.Reflection;

public static class Reflection
{
    /// <summary>
    /// Extension for 'Object' that copies the properties to a destination object.
    /// </summary>
    /// <param name="source">The source.</param>
    /// <param name="destination">The destination.</param>
    public static void CopyProperties(this object source, object destination)
    {
        // If either argument is null, throw an exception
        if (source == null || destination == null)
            throw new Exception("Source and/or destination objects are null");

        // Get the types of both objects
        Type typeDest = destination.GetType();
        Type typeSrc = source.GetType();

        // Iterate over the properties of the source instance and
        // copy each value to its destination counterpart
        PropertyInfo[] srcProps = typeSrc.GetProperties();
        foreach (PropertyInfo srcProp in srcProps)
        {
            if (!srcProp.CanRead)
            {
                continue;
            }
            PropertyInfo targetProperty = typeDest.GetProperty(srcProp.Name);
            if (targetProperty == null)
            {
                continue;
            }
            if (!targetProperty.CanWrite)
            {
                continue;
            }
            // Look up the setter once, including non-public ones, so the
            // checks below never dereference a null MethodInfo
            MethodInfo setter = targetProperty.GetSetMethod(true);
            if (setter == null || setter.IsPrivate)
            {
                continue;
            }
            if ((setter.Attributes & MethodAttributes.Static) != 0)
            {
                continue;
            }
            if (!targetProperty.PropertyType.IsAssignableFrom(srcProp.PropertyType))
            {
                continue;
            }
            // Passed all tests, so set the value
            targetProperty.SetValue(destination, srcProp.GetValue(source, null), null);
        }
    }
}