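I have a dataset of 11784 records split into test (2946) and train (8838) to run an h2o algorithm, but I get an error from the data frame I am trying to create as the final output, which links the predictions to the IDs they were predicted for.
The error is on this line: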
df_y_test <- data.frame(ID = df_labels, Status = df_y_test$predict)
Error in data.frame(ID = df_labels, Status = df_y_test$predict) : arguments imply differing number of rows: 2946, 2950
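From reading other forum posts I understand that df_y_test having 2950 rows is what causes this, but I cannot figure out why, since df_y_test is also derived from the same 'test' variable, which has only 2946 rows in total. I would be grateful for any guidance. The full script is posted below for reference.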
data: 11784 obs. of 117 variables
test: 2946 obs. of 45 variables
train: 8838 obs. of 46 variables
df_labels: 2946 obs. of 1 variable
df_y_test: 2950 obs. of 4 variables
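For reference, the error itself can be reproduced in base R without h2o, which suggests it comes purely from the length mismatch rather than from anything specific to the model. This is a minimal sketch using stand-in vectors (`ids` and `preds` are made-up names, not objects from my script) with the same row counts as above:

```r
# Stand-in vectors with the row counts reported above (not the real data)
ids <- seq_len(2946)
preds <- rep("yes", 2950)

# tryCatch captures the error message so the snippet runs to completion
msg <- tryCatch(
  data.frame(ID = ids, Status = preds),
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the open question is only where the 4 extra prediction rows come from.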
# Load Data
data <- read.csv('Data.csv')
# Partition Data
library(caTools)
set.seed(75)
split <- sample.split(data$Status, SplitRatio = 0.75)
train <- subset(data, split == TRUE)
test <- subset(data, split == FALSE)
# Dropping the column to be predicted from Test
test <- subset(test[,-c(2)])
library(readr)
library(h2o)
# Init h2o
localh2o <- h2o.init(max_mem_size = '2g', nthreads = -1)
# convert status values (to be predicted) in second column to factors in h2o
train[,2] <- as.factor(train[,2])
train_h2o <- as.h2o(train)
test_h2o <- as.h2o(test)
# Running H2O
model <- h2o.deeplearning(x = c(1, 3:46),
                          y = 2,
                          training_frame = train_h2o,
                          activation = "RectifierWithDropout",
                          input_dropout_ratio = 0.2,
                          hidden_dropout_ratios = c(0.5, 0.5),
                          balance_classes = TRUE,
                          hidden = c(100, 100),
                          nesterov_accelerated_gradient = TRUE,
                          epochs = 15)
h2o_y_test <- h2o.predict(model, test_h2o)
# Converting to data frames from h2o
df_y_test <- as.data.frame(h2o_y_test)
df_labels <- as.data.frame(test[,1])
df_y_test <- data.frame(ID = df_labels, Status = df_y_test$predict)
write.csv(df_y_test, file="predictionsH2o.csv", row.names = FALSE)
h2o.shutdown(prompt = FALSE)
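
One way to narrow down where the extra rows appear might be to compare row counts at each stage of the script. The sketch below is only an illustration: `compare_row_counts` is a hypothetical helper I made up, and the two stand-in data frames simply mimic the sizes reported above.

```r
# Hypothetical helper, not part of the original script: report the row count
# of each object in a named list and flag whether they all agree.
compare_row_counts <- function(objects) {
  counts <- vapply(objects, function(x) as.integer(NROW(x)), integer(1))
  list(counts = counts, all_equal = length(unique(counts)) == 1)
}

# Stand-in objects with the sizes reported in the question (not the real data)
test_standin  <- data.frame(ID = seq_len(2946))
preds_standin <- data.frame(predict = rep("yes", 2950))

result <- compare_row_counts(list(test = test_standin,
                                  predictions = preds_standin))
print(result$counts)
print(result$all_equal)
```

In the real script the same check could be run on `test`, `test_h2o` and `h2o_y_test` right after the `as.h2o()` and `h2o.predict()` calls, assuming `dim()` works on H2OFrame objects as it does on data frames. If `test_h2o` already shows 2950 rows, the rows are being added during the conversion to h2o rather than during prediction.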