Using purrr to apply filters and mutations based on another dataset

Time: 2019-03-18 13:19:28

Tags: r dataframe purrr

I am trying to use purrr to filter rows and mutate a variable, both based on the values in another data frame.

library(data.table)
library(purrr)

# This is the original table
set.seed(100)
dfOriginal <- data.table(age = sample(10:60, 10))

# Second data frame: ageCriterion holds the condition to filter by,
# and ageBand holds the value to assign (mutate) to the matching rows
dfAgeBands <- data.table(ageCriterion = c("age > 0 & age <= 20", "age > 20 & age <= 30"),
              ageBand = c("Young Adults", "Adults"))

# Attempt: map over each criterion/band pair (this still fails)
finalDf <- map2(dfAgeBands$ageCriterion, dfAgeBands$ageBand, function(x,y){dfOriginal[.x, ageBands := .y]})

Edit: I just corrected the code (it was originally written for a different dataset!), but it still does not work.

Based on the rules defined in the ageCriterion column of the dfAgeBands data frame, the expected output would look like this:

    age      ageBand
 1:  56         <NA>
 2:  51         <NA>
 3:  41         <NA>
 4:  36         <NA>
 5:  44         <NA>
 6:  32         <NA>
 7:  19 Young Adults
 8:  53         <NA>
 9:  28       Adults
10:  29       Adults
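Apart from the `.x`/`.y` mismatch (the anonymous function's arguments are named `x` and `y`), each `ageCriterion` is a character string rather than an R expression, and a character string in data.table's `i` is not treated as a logical condition. A minimal base-R sketch of the missing step, assuming a few ages taken from the expected output: `parse()` turns the string into an expression and `eval()` evaluates it with the data frame's columns in scope.

```r
# One criterion string, as stored in dfAgeBands$ageCriterion
crit <- "age > 0 & age <= 20"

# A few ages that appear in the expected output (illustrative data)
ages <- data.frame(age = c(56, 19, 28))

# parse() converts the text into an unevaluated call
expr <- parse(text = crit)[[1]]

# eval() with the data frame as environment resolves `age` to the column
keep <- eval(expr, envir = ages)
keep
#> [1] FALSE  TRUE FALSE
```

The resulting logical vector is what can be used as a row filter; the answers below apply this idea (or avoid it) in different ways.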

3 answers:

Answer 0 (score: 2)

A solution using a non-equi join from data.table.

First, get the minimum and maximum age of each group by extracting them from the criterion text:

library(dplyr)
library(stringr)
#get minimum and maximum age from each group's criterion
dfAgebands <- dfAgeBands %>% mutate( minAge = stringr::str_extract( ageCriterion, "(?<=\\> )[0-9]+(?= &)") %>% as.numeric(),
                                     maxAge = stringr::str_extract( ageCriterion, "(?<=\\<= )[0-9]+(?=$)") %>% as.numeric() )
#           ageCriterion      ageBand minAge maxAge
# 1  age > 0 & age <= 20 Young Adults      0     20
# 2 age > 20 & age <= 30       Adults     20     30
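If you would rather not depend on dplyr and stringr for this step, the same extraction can be sketched in base R with `sub()` and capture groups. This assumes every criterion has exactly the form `age > <min> & age <= <max>`; a string that does not match is returned unchanged by `sub()` and becomes `NA` (with a warning) in `as.numeric()`.

```r
# Criterion strings as given in dfAgeBands$ageCriterion
crit <- c("age > 0 & age <= 20", "age > 20 & age <= 30")

# Capture the number after "age > " and the number after "age <= "
# (assumes the fixed "age > a & age <= b" layout)
minAge <- as.numeric(sub("^age > ([0-9]+) & .*$", "\\1", crit))
maxAge <- as.numeric(sub("^.*age <= ([0-9]+)$", "\\1", crit))

data.frame(ageCriterion = crit, minAge, maxAge)
```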

Now you can easily perform a non-equi join:

library(data.table)
dfOriginal[ dfAgebands, ageBand := i.ageBand, on = c("age > minAge", "age <= maxAge")]

#     age      ageBand
#  1:  55         <NA>
#  2:  40         <NA>
#  3:  41         <NA>
#  4:  33         <NA>
#  5:  56         <NA>
#  6:  25       Adults
#  7:  11 Young Adults
#  8:  13 Young Adults
#  9:  28       Adults
# 10:  27       Adults
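As a cross-check of the interval semantics, the join condition `age > minAge & age <= maxAge` describes left-open, right-closed intervals, which is exactly what base R's `cut()` produces with its default `right = TRUE`. This is a different technique, sketched here only for comparison: it assumes the bands are contiguous and non-overlapping (true here, since the first band's `maxAge` equals the second band's `minAge`), whereas the non-equi join does not need that. The ages are fixed to those in the question's expected output to avoid depending on the random seed.

```r
# Ages from the question's expected output (fixed, not sampled)
ages <- c(56, 51, 41, 36, 44, 32, 19, 53, 28, 29)

# Breaks built from minAge/maxAge: (0, 20] and (20, 30]
breaks <- c(0, 20, 30)
bands  <- c("Young Adults", "Adults")

# right = TRUE gives min < age <= max; ages outside (0, 30] become NA
ageBand <- as.character(cut(ages, breaks = breaks, labels = bands, right = TRUE))
data.frame(age = ages, ageBand)
```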

Answer 1 (score: 1)

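It is generally better not to use eval(parse(...)), but here the expressions are simple enough that it is easy to use. One option is to loop over each element of 'ageCriterion', evaluate the expression in data.table's `i` argument, and assign the corresponding 'ageBand' value with `:=` to the rows that satisfy the condition:

library(data.table)
for (i in seq_len(nrow(dfAgeBands))) {
  dfOriginal[eval(parse(text = dfAgeBands$ageCriterion[i])), ageBand := dfAgeBands$ageBand[i]]
}
dfOriginal[]

Or, alternatively, use purrr to do the same mapping.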
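Whether you use a `for` loop or a purrr mapping function, the mechanics are the same: evaluate each criterion to a logical vector, then assign the band to those rows. A minimal base-R sketch of that loop on a plain `data.frame` (no data.table, so nothing is modified by reference), assuming the ages from the question's expected output:

```r
# Ages from the question's expected output (fixed, not sampled)
df <- data.frame(age = c(56, 51, 41, 36, 44, 32, 19, 53, 28, 29))
bands <- data.frame(
  ageCriterion = c("age > 0 & age <= 20", "age > 20 & age <= 30"),
  ageBand = c("Young Adults", "Adults"),
  stringsAsFactors = FALSE
)

df$ageBand <- NA_character_
for (i in seq_len(nrow(bands))) {
  # Evaluate the criterion with df's columns in scope -> logical vector
  rows <- eval(parse(text = bands$ageCriterion[i]), envir = df)
  # which() drops NA, so rows with a missing age are left untouched
  df$ageBand[which(rows)] <- bands$ageBand[i]
}
df
```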

Answer 2 (score: 1)

For what it's worth (my own solution, alongside those of giants like akrun and other geniuses like Wimpel), here is a solution with map2:

library(purrr); library(rlang)  # map2(), parse_expr()
map2(dfAgeBands$ageCriterion, dfAgeBands$ageBand,
     function(x, y) { dfOriginal[eval(parse_expr(x)), ageBand := y] })
dfOriginal[]
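`map2()` walks the criterion and band vectors in parallel; because `:=` modifies `dfOriginal` by reference, the list it returns is mostly a side effect. For comparison, here is a purely functional base-R analogue, a sketch assuming R >= 3.6 for `str2lang()` (which, like `rlang::parse_expr()`, parses a single string into an expression). `Map()` plays the role of `map2()`, and `Reduce()` combines the per-criterion results; on overlapping criteria the earlier one wins, which is an assumption of this sketch rather than something the question specifies.

```r
# Ages from the question's expected output (fixed, not sampled)
df <- data.frame(age = c(56, 51, 41, 36, 44, 32, 19, 53, 28, 29))
crit <- c("age > 0 & age <= 20", "age > 20 & age <= 30")
band <- c("Young Adults", "Adults")

# Map() pairs crit[i] with band[i], like purrr::map2(); each call returns
# the band label where the criterion holds and NA elsewhere
labelled <- Map(function(x, y) ifelse(eval(str2lang(x), df), y, NA_character_),
                crit, band)

# Keep the first non-NA label for each row
df$ageBand <- Reduce(function(a, b) ifelse(is.na(a), b, a), labelled)
df
```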