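I am writing a function that takes a data frame and spreads a factor variable into new dummy variables, because some machine learning algorithms cannot handle factors. To do this, I call spread() inside my cleaning function.
When I try to pass in the name of the column I need to spread, it throws an error:
Error: Invalid column specification
Here is the code: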
library(tidyr)
library(dplyr)
library(C50) # this is one source for the churn data
data(churn)
f <- function(df, name) {
  df$dummy <- c(1:nrow(df)) # create dummy variable with unique values
  df <- spread(df, key <- as.character(substitute(name)), "dummy", fill = 0 )
}
churnTrain = f(churnTrain, name = "state")
str(churnTrain)
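Of course, if I replace key = as.character(substitute(name)) with key = "state", it works fine, but then the function is no longer reusable.
How can I pass the column name to the inner function without getting an error?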
Answer 0 (score: 0)
Do you need to use the tidyverse? If not, you could try the older reshape2 package:
library(reshape2)
library(C50) # this is one source for the churn data
data(churn)
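f <- function(df1, name) {
  df1$dummy <- seq_len(nrow(df1)) # create dummy variable with unique values
  # build the formula "dummy ~ <name>" from the column name passed as a string
  dcast(df1, as.formula(paste0("dummy ~ ", name)), value.var = "dummy")
}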
ct1 <- f(churnTrain, name = "state")
If you absolutely need to stay in the tidyverse, you can try following the tutorial at http://dplyr.tidyverse.org/articles/programming.html. Unfortunately, the examples there did not work on my machine.
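For reference, here is a minimal sketch of the tidy evaluation approach that tutorial describes. It assumes a tidyr version whose spread() supports unquoting with !!, and that rlang is installed. The helper name spread_factor and the toy data frame are hypothetical and only stand in for the churn data.

```r
library(tidyr)
library(rlang)

# Hypothetical helper (not from the original post): spreads the column whose
# name is given as a string in `name` into one new column per level.
spread_factor <- function(df, name) {
  df$dummy <- seq_len(nrow(df)) # unique value per row, as in the question
  # sym() turns the string into a symbol; !! unquotes it so that spread()
  # sees the column itself rather than the variable `name`
  spread(df, key = !!sym(name), value = dummy, fill = 0)
}

# Toy data standing in for churnTrain (assumption, for illustration only)
toy <- data.frame(
  id    = 1:4,
  state = factor(c("CA", "NY", "CA", "TX")),
  calls = c(3, 1, 4, 1)
)

wide <- spread_factor(toy, "state")
str(wide)
```

Because the column name arrives as an ordinary string, the same helper can be reused for any factor column, for example spread_factor(toy, "some_other_column").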
Answer 1 (score: 0)
library(tidyr)
library(dplyr)
library(C50) # this is one source for the churn data
data(churn)
f <- function(df, name) {
  df$dummy <- seq_len(nrow(df)) # create dummy variable with unique values
  # spread_() is the standard-evaluation variant: it takes column names as strings
  spread_(df, key_col = name, value_col = "dummy", fill = 0)
}
churnTrain = f(churnTrain, name = "state")
str(churnTrain)
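Note that spread_() is deprecated in current tidyr releases, and spread() itself has been superseded by pivot_wider(). A rough equivalent might look like the sketch below, assuming tidyr 1.0.0 or later and the tidyselect package. The helper name widen_factor and the toy data frame are hypothetical, not part of the original answer.

```r
library(tidyr)

# Hypothetical helper (not from the original answer): same idea as f() above,
# written with pivot_wider() instead of spread_().
widen_factor <- function(df, name) {
  df$dummy <- seq_len(nrow(df)) # unique value per row
  pivot_wider(
    df,
    names_from  = tidyselect::all_of(name), # column name passed as a string
    values_from = dummy,
    values_fill = list(dummy = 0L)          # fill missing combinations with 0
  )
}

# Toy data standing in for churnTrain (assumption, for illustration only)
toy <- data.frame(
  id    = 1:4,
  state = factor(c("CA", "NY", "CA", "TX")),
  calls = c(3, 1, 4, 1)
)

wide <- widen_factor(toy, "state")
str(wide)
```

all_of() makes it explicit that name holds a column name as a string, so no quoting or unquoting is needed inside the function.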