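I have a Spark DataFrame, train_tbl, with 48 columns. I want to train a random forest model with the sparklyr package, using one of the 48 columns as the response variable and all the others as features. Is there a way to say that every field except loan_status should be used as a feature, without typing out all 47 field names?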
ml_random_forest(x = train_tbl,
response = "loan_status",
features = call all fields EXCEPT "loan_status",
num.trees = 10L,
type = "classification")
Answer 0 (score: 1)
This should work.
ml_random_forest(x = train_tbl,
response = "loan_status",
features = names(train_tbl)[which(names(train_tbl)!="loan_status")],
num.trees = 10L,
type = "classification")
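The same "everything except the response" selection can also be written with setdiff(), which reads a little more directly than indexing with which(). The sketch below is only an illustration: it uses a small local data frame (train_df, with made-up columns loan_amnt and int_rate) as a stand-in for the Spark table, so it runs without a Spark connection.

```r
# Stand-in for the real table: a local data frame with hypothetical columns.
# Only loan_status comes from the question; the other names are made up.
train_df <- data.frame(
  loan_status = c("paid", "default"),
  loan_amnt   = c(1000, 2500),
  int_rate    = c(0.07, 0.12)
)

# Keep every column name except the response column.
response_col <- "loan_status"
feature_cols <- setdiff(colnames(train_df), response_col)

print(feature_cols)
```

Assuming colnames(train_tbl) returns the column names of the Spark table in your sparklyr version, the resulting feature_cols vector could be passed as the features argument in the call above.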