Question

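I have a Spark DataFrame, train_tbl, with 48 columns. I want to train a random forest model with the sparklyr package, using one of the 48 columns as the response variable and all the other columns as features. Is there a way to specify that I want to use every field except loan_status as a feature, without typing out all 47 field names?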

ml_random_forest(x = train_tbl,
                 response = "loan_status", 
                 features = call all fields EXCEPT "loan_status",
                 num.trees = 10L,
                 type = "classification")

Answer 1

This should work.

ml_random_forest(x = train_tbl,
                 response = "loan_status",
                 # keep every name except the response column
                 features = names(train_tbl)[names(train_tbl) != "loan_status"],
                 num.trees = 10L,
                 type = "classification")
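
The selection itself is just "all names minus one", so it can be tried out without a Spark connection. Below is a minimal sketch on a plain character vector; the column names other than loan_status are made up for illustration. One caveat, stated as an assumption rather than a tested fact: on a remote Spark table, names() may return the object's internal element names instead of the column names, depending on the sparklyr/dplyr version, in which case colnames(train_tbl) is the usual substitute.

```r
# Hypothetical column names standing in for the real 48 columns;
# only "loan_status" comes from the question.
all_cols <- c("loan_status", "loan_amnt", "int_rate", "annual_inc")

# setdiff() drops the response and keeps the remaining names in order.
feature_cols <- setdiff(all_cols, "loan_status")
print(feature_cols)

# With the real table this would become (assumption, not run here):
# feature_cols <- setdiff(colnames(train_tbl), "loan_status")
# and then: features = feature_cols in the ml_random_forest() call.
```

setdiff() also works if the response name happens to be missing from the table: it returns all columns unchanged rather than raising an error.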
