I am trying to use purrr to apply a filter and then a mutate to a variable, both based on the values in another data frame.
library(data.table)
library(purrr)
# This is the original table
set.seed(100)
dfOriginal <- data.table(age = sample(10:60, 10))
# Following is the second data frame containing one variable which
# I would like to filter by - age criterion
# and then to mutate with - age band
dfAgeBands <- data.table(ageCriterion = c("age > 0 & age <= 20", "age > 20 & age <= 30"),
ageBand = c("Young Adults", "Adults"))
finalDf <- map2(dfAgeBands$ageCriterion, dfAgeBands$ageBand, function(x,y){dfOriginal[.x, ageBands := .y]})
Edit: I just corrected the code (it was built for a different dataset!), but it still doesn't work.
Based on the rules defined in ageCriterion in the dfAgeBands data frame, the expected output would look something like this.
age ageBand
1: 56 <NA>
2: 51 <NA>
3: 41 <NA>
4: 36 <NA>
5: 44 <NA>
6: 32 <NA>
7: 19 Young Adults
8: 53 <NA>
9: 28 Adults
10: 29 Adults
Answer 0: (score: 2)
A solution using a non-equi join in data.table.
First, get the minimum and maximum age of each group by extracting them from the criterion text.
library(dplyr)
library(stringr)
# get the minimum and maximum age of each group
dfAgebands <- dfAgeBands %>% mutate( minAge = stringr::str_extract( ageCriterion, "(?<=\\> )[0-9]+(?= &)") %>% as.numeric(),
maxAge = stringr::str_extract( ageCriterion, "(?<=\\<= )[0-9]+(?=$)") %>% as.numeric() )
#            ageCriterion      ageBand minAge maxAge
# 1   age > 0 & age <= 20 Young Adults      0     20
# 2  age > 20 & age <= 30       Adults     20     30
Now you can easily perform the non-equi join.
library(data.table)
dfOriginal[ dfAgebands, ageBand := i.ageBand, on = c("age > minAge", "age <= maxAge")]
# age ageBand
# 1: 55 <NA>
# 2: 40 <NA>
# 3: 41 <NA>
# 4: 33 <NA>
# 5: 56 <NA>
# 6: 25 Adults
# 7: 11 Young Adults
# 8: 13 Young Adults
# 9: 28 Adults
# 10: 27 Adults
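As a self-contained illustration of the same non-equi update join, here is a minimal sketch that skips the regex step. It assumes the numeric bounds are typed in directly, and it uses a small hand-picked vector of ages instead of the sampled data. The names `ages` and `bands` are hypothetical.

```r
library(data.table)

# Illustrative ages, hand-picked (an assumption, not the question's sample)
ages <- data.table(age = c(56, 19, 28, 29, 20, 31))

# Hypothetical lookup table with the bounds entered as numbers
bands <- data.table(
  ageBand = c("Young Adults", "Adults"),
  minAge  = c(0, 20),
  maxAge  = c(20, 30)
)

# Update join: rows of `ages` with minAge < age <= maxAge receive the band;
# rows that match no band keep NA
ages[bands, ageBand := i.ageBand, on = .(age > minAge, age <= maxAge)]
ages[]
```

Because the lower bound is strict and the upper bound is inclusive, an age of exactly 20 falls into "Young Adults" rather than "Adults".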
Answer 1: (score: 1)
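It is generally better to avoid eval(parse, but here the expressions are simple enough to use it. One option is to loop over each element of "ageCriterion", eval the expression in i, and assign (:=) the corresponding "ageBand" value to the rows that satisfy i.
library(data.table)
for(i in seq_len(nrow(dfAgeBands))) {
  dfOriginal[eval(parse(text = dfAgeBands$ageCriterion[i])),
             ageBand := dfAgeBands$ageBand[i]]
}
dfOriginal[]
The same can be done with purrr.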
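A minimal sketch of what a purrr version of the loop might look like. This is an illustration rather than the answerer's own code: the choice of `walk2` (used because only the by-reference side effect matters), the argument names `crit` and `band`, and the hand-picked ages are all assumptions.

```r
library(data.table)
library(purrr)

# Illustrative ages, hand-picked (an assumption, not the question's sample)
dfOriginal <- data.table(age = c(56, 19, 28, 29))

dfAgeBands <- data.table(
  ageCriterion = c("age > 0 & age <= 20", "age > 20 & age <= 30"),
  ageBand      = c("Young Adults", "Adults")
)

# For each (criterion, band) pair, parse the criterion text into an
# expression, evaluate it in i, and assign the band by reference
walk2(dfAgeBands$ageCriterion, dfAgeBands$ageBand, function(crit, band) {
  dfOriginal[eval(parse(text = crit)), ageBand := band]
})
dfOriginal[]
```

The function arguments are deliberately not named after any column, so that `band` in j resolves to the function argument and not to the `ageBand` column created on the first pass.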
Answer 2: (score: 1)
For what it's worth (that is, my solution next to those of giants like akrun and other talents like Wimpel), here is a solution with map2:
library(purrr)
library(rlang)  # provides parse_expr()
map2(dfAgeBands$ageCriterion, dfAgeBands$ageBand,
     function(x, y) {dfOriginal[eval(parse_expr(x)), ageBand := y]})