I have a question about mutate and a self-written function. My data basically looks like this:
license_sets <- list(x = c("A", "B"), y = c("C", "D", "E"))
license_data <- data.frame(license = c("A","B","C","D","E"), bidder = c("x","x","y","y","y"))
source_data <- expand.grid(license_i = c("A","B","C","D","E"), license_j = c("A","B","C","D","E"))
source_data$value <- c(1:25)
The function I want to apply is the following:
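library(dplyr)  # provides %>%, filter() and mutate()
compute_set <- function(i, J) {
  # sum of value over all rows with license_i equal to i and license_j in the set J
  tmp <- source_data %>%
    filter(license_i == i, license_j %in% J)
  return(sum(tmp$value))
}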
I now want to apply this function via mutate:
license_data %>% mutate(z = compute_set(license, license_sets[[bidder]]))
I get the following error and warning messages:
Error in mutate_impl(.data, dots) :
Evaluation error: Evaluation error: recursive indexing failed at level 2
..
In addition: Warning messages:
1: In is.na(e1) | is.na(e2) :
longer object length is not a multiple of shorter object length
2: In `==.default`(license_i, i) :
longer object length is not a multiple of shorter object length
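If I run the same function in a simple for loop, it works perfectly fine. Does anyone know what the problem is here? It must have something to do with mutate, right? I have also tried as.character(bidder) and other things I found here, but nothing has worked so far. I should add that the data frame I am actually working with is much larger than the one shown here, so a for loop is not feasible (which is why I would also appreciate hints for simplifying the function ;)).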
Answer 0 (score: 0)
The problem is that inside mutate, arguments are always passed as whole vectors, as you can see here:
license_data %>% mutate(z = {print(list(bidder, license));
compute_set(license, license_sets[[bidder]])})
# [[1]]
# [1] x x y y y
# Levels: x y
# [[2]]
# [1] A B C D E
# Levels: A B C D E
# Error in license_sets[[bidder]] : recursive indexing failed at level 2
Indexing a list with [[ and a whole vector like this does not work:
license_sets[[license_data$bidder]]
# Error in license_sets[[license_data$bidder]] :
# recursive indexing failed at level 2
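To illustrate why this fails, here is a small sketch of how [[ treats an index of length greater than one: it indexes recursively, so x[[c("a", "b")]] means x[["a"]][["b"]]. The nested list is a made-up example and not part of the question's data.

```r
license_sets <- list(x = c("A", "B"), y = c("C", "D", "E"))

# Hypothetical nested list, only to show what recursive indexing does
nested <- list(a = list(b = 42))
deep <- nested[[c("a", "b")]]        # same as nested[["a"]][["b"]]

# A single name with [[ returns one element of the list
one <- license_sets[["x"]]

# Single brackets accept a whole vector of names and return a list
many <- license_sets[c("x", "x", "y")]

# [[ with a longer vector tries to drill into the character vector and errors
failed <- tryCatch(license_sets[[c("x", "x", "y")]], error = function(e) e)
```

So for a lookup per row you need either single brackets (which give a list) or one [[ call per element.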
So you want to map over the vectors element by element, for example with map2 from purrr:
license_data %>%
mutate(z = map2(bidder, license, ~ compute_set(.y, license_sets[[.x]])))
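Note that map2 returns a list column. If you would rather have a plain numeric column, a variant with map2_dbl could look like the following self-contained sketch. It rebuilds the question's data with stringsAsFactors = FALSE and wraps bidder in as.character() as a safeguard in case it is a factor; both choices are assumptions of this sketch rather than part of the original answer.

```r
library(dplyr)
library(purrr)

license_sets <- list(x = c("A", "B"), y = c("C", "D", "E"))
license_data <- data.frame(license = c("A", "B", "C", "D", "E"),
                           bidder  = c("x", "x", "y", "y", "y"),
                           stringsAsFactors = FALSE)
source_data <- expand.grid(license_i = c("A", "B", "C", "D", "E"),
                           license_j = c("A", "B", "C", "D", "E"))
source_data$value <- 1:25

compute_set <- function(i, J) {
  tmp <- source_data %>%
    filter(license_i == i, license_j %in% J)
  sum(tmp$value)
}

# One call of compute_set per row; map2_dbl collects the results in a numeric vector
result <- license_data %>%
  mutate(z = map2_dbl(as.character(bidder), license,
                      ~ compute_set(.y, license_sets[[.x]])))

result$z
# [1]  7  9 54 57 60
```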
Vectorization
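As @[docendo discimus] pointed out, the problem with your function is that it is not vectorized, i.e. it only handles scalars (in the case of i). You can vectorize the function so that it can be used as intended: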
compute_set_v <- Vectorize(compute_set)
license_data %>%
## add the list content directly to the data frame
mutate(bidder_set = map(bidder, ~ license_sets[[.]]),
z = compute_set_v(license, bidder_set))
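Both map2 and Vectorize still call compute_set once per row, which can be slow on a large data frame. As a possible alternative (not part of the original answer), the same sums can be computed with joins. In this sketch sets_long is a hypothetical helper table, and all columns are kept as character so the join keys have matching types.

```r
library(dplyr)

license_sets <- list(x = c("A", "B"), y = c("C", "D", "E"))
license_data <- data.frame(license = c("A", "B", "C", "D", "E"),
                           bidder  = c("x", "x", "y", "y", "y"),
                           stringsAsFactors = FALSE)
source_data <- expand.grid(license_i = c("A", "B", "C", "D", "E"),
                           license_j = c("A", "B", "C", "D", "E"),
                           stringsAsFactors = FALSE)
source_data$value <- 1:25

# Hypothetical helper: one row per (bidder, license_j) pair in license_sets
sets_long <- data.frame(bidder    = rep(names(license_sets), lengths(license_sets)),
                        license_j = unlist(license_sets, use.names = FALSE),
                        stringsAsFactors = FALSE)

result <- license_data %>%
  # attach every source row whose license_i matches the license
  inner_join(source_data, by = c("license" = "license_i")) %>%
  # keep only rows whose license_j is in the bidder's set
  semi_join(sets_long, by = c("bidder", "license_j")) %>%
  group_by(license, bidder) %>%
  summarise(z = sum(value)) %>%
  ungroup()
```

One difference to be aware of: a license with no matching rows is dropped from the result here, whereas compute_set would return 0 for it.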
Note
data.frame has a nasty habit of treating strings as factors (this was the default before R 4.0.0), so you may want to add stringsAsFactors = FALSE to the data.frame call.
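A minimal sketch of that call, using the data from the question:

```r
license_data <- data.frame(license = c("A", "B", "C", "D", "E"),
                           bidder  = c("x", "x", "y", "y", "y"),
                           stringsAsFactors = FALSE)

# Both columns are now character vectors rather than factors
col_classes <- sapply(license_data, class)
```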