sparklyr: training a model

Posted: 2017-09-22 14:50:54

Tags: r sparklyr

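I have a Spark DataFrame, train_tbl, with 48 different columns. I want to train a random forest model with the sparklyr package, using one of the 48 columns as the response variable and all the other columns as features. Is there a way to specify that every field except loan_status should be used as a feature, without having to type out all 47 field names?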

ml_random_forest(x = train_tbl,
                 response = "loan_status", 
                 features = call all fields EXCEPT "loan_status",
                 num.trees = 10L,
                 type = "classification")

1 Answer:

Answer 0 (score: 1)

This should work:

library(sparklyr)

ml_random_forest(x = train_tbl,
                 response = "loan_status",
                 # every column except the response; colnames() returns
                 # the column names of a Spark table
                 features = setdiff(colnames(train_tbl), "loan_status"),
                 num.trees = 10L,
                 type = "classification")
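The column-selection step can be tried without a Spark connection. Below is a minimal base-R sketch, assuming a small hypothetical local data.frame (train_df, with made-up columns) standing in for train_tbl. setdiff() drops the response from the column names, and reformulate() turns the remaining names into a formula, which may be handy for modelling functions that take a formula instead of separate response/features arguments.

```r
# Hypothetical local stand-in for train_tbl: a response column
# "loan_status" plus a few made-up feature columns.
train_df <- data.frame(
  loan_status = c("paid", "default", "paid"),
  loan_amnt   = c(1000, 2500, 1800),
  int_rate    = c(0.11, 0.19, 0.09),
  term        = c(36, 60, 36)
)

response <- "loan_status"

# All column names except the response, in their original order.
# colnames() also returns the columns of a sparklyr tbl_spark.
features <- setdiff(colnames(train_df), response)
print(features)

# Build "loan_status ~ loan_amnt + int_rate + term" without typing
# the feature names by hand.
model_formula <- reformulate(features, response = response)
print(model_formula)
```

With 48 columns, the same two calls produce the 47 feature names (and the matching formula) automatically.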