How to use purrr::map on a list of data frames to modify column values in specific data frames without changing the other data frames in the list?

Date: 2019-08-25 00:03:18

Tags: r dplyr purrr

  • I have a list of many data frames (survey08, survey09, survey10, etc.) called df_list.

  • Each data frame contains two columns, named year and employed.

# create 3 dataframes with identical column names
survey08 <- data.frame(year = 2008, employed = c(1, 2, 2, 1, 2))
survey09 <- data.frame(year = 2009, employed = c(1, 1, 1, 2, 1))
survey10 <- data.frame(year = 2010, employed = c(2, 1, 1, 1, 1))

# put dataframes into a list
df_list <- list(survey08, survey09, survey10)

# add names for dataframes in list
# names correspond to survey year ('year' column)
names(df_list) <- c("survey08", "survey09", "survey10")

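I want to recode the values in the employed column (1 = yes, 2 = no), but only in the survey08 and survey09 data frames. For the other data frames in the list, I want to keep the original column values (i.e., modify only specific data frames in the list).

I tried the following code, using the year column as a filter: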

library(tidyverse)

# modify only values in 'employed' column for DFs 'survey08' and 'survey09' 
# use 'year' column as filter

df_list %>% 
  map(~filter(.x, year %in% 2008:2009)) %>% 
  map(~ .x %>% mutate_at(vars(employed), ~recode_factor(.,`1` = "yes", `2` = "no")))

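Although this correctly recodes the two data frames survey08 and survey09, it does not preserve the other data frames in the list: survey10 comes back with zero rows.

Current output: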

#> $survey08
#>   year employed
#> 1 2008      yes
#> 2 2008       no
#> 3 2008       no
#> 4 2008      yes
#> 5 2008       no
#> 
#> $survey09
#>   year employed
#> 1 2009      yes
#> 2 2009      yes
#> 3 2009      yes
#> 4 2009       no
#> 5 2009      yes
#> 
#> $survey10
#> [1] year     employed
#> <0 rows> (or 0-length row.names)
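The empty survey10 in the current output comes from filter() working on rows, not on list elements: when no row of a data frame satisfies the condition, filter() returns that data frame with zero rows instead of skipping it. A minimal sketch reproducing this on survey10 alone (assuming dplyr is installed):

```r
library(dplyr)

survey10 <- data.frame(year = 2010, employed = c(2, 1, 1, 1, 1))

# filter() keeps or drops rows inside a single data frame;
# it has no way to leave a list element untouched
filtered <- filter(survey10, year %in% 2008:2009)

nrow(filtered)   # 0: no row has year 2008 or 2009
names(filtered)  # "year" "employed": the columns survive, only the rows are gone
```
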

Desired output:

$survey08
  year employed
1 2008      yes
2 2008       no
3 2008       no
4 2008      yes
5 2008       no

$survey09
  year employed
1 2009      yes
2 2009      yes
3 2009      yes
4 2009       no
5 2009      yes

$survey10
  year employed
1 2010      2
2 2010      1
3 2010      1
4 2010      1
5 2010      1

Created on 2019-08-24 by the reprex package (v0.3.0)

4 Answers:

Answer 0 (score: 2)

You can use purrr::map_at to modify only the elements specified by name or position.
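A minimal sketch of the map_at() approach, assuming the purrr and dplyr packages and the same recode_factor() call as in the question. Here .at is given as a character vector of list names, so only survey08 and survey09 are passed to the function and survey10 is returned unchanged:

```r
library(purrr)
library(dplyr)

df_list <- list(
  survey08 = data.frame(year = 2008, employed = c(1, 2, 2, 1, 2)),
  survey09 = data.frame(year = 2009, employed = c(1, 1, 1, 2, 1)),
  survey10 = data.frame(year = 2010, employed = c(2, 1, 1, 1, 1))
)

# Apply the recode only to the elements named in .at;
# every other element is copied through as-is
df_recoded <- map_at(
  df_list,
  c("survey08", "survey09"),
  ~ mutate(.x, employed = recode_factor(employed, `1` = "yes", `2` = "no"))
)

df_recoded$survey10  # unchanged: employed is still numeric
```

Selecting by name keeps working if the list is later reordered, which selecting by position would not.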

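Answer 1 (score: 1)

Using filter removes the rows of the other data frames you want to keep, so survey10 ends up empty. Use map_if instead of map; its .p argument identifies the elements the function should be applied to.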

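# Option 1: select the elements with a logical vector (one value per list element)
df_list %>% 
   map_if(., 
      .f = ~ .x %>% mutate_at(vars(employed), ~recode_factor(.,`1` = "yes", `2` = "no")), 
      .p = c(TRUE, TRUE, FALSE))

# Option 2: select the elements with a predicate on the year column
df_list %>% 
   map_if(., 
       .f = ~ .x %>% mutate_at(vars(employed), ~recode_factor(.,`1` = "yes", `2` = "no")), 
       .p = ~ .x %>% pull(year) %>% unique(.) %in% 2008:2009)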
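A hard-coded logical vector for .p depends on the order of the list. A sketch that builds the same logical vector from the list names instead, assuming the list is named as in the question (targets and keep are hypothetical helper variables):

```r
library(purrr)
library(dplyr)

df_list <- list(
  survey08 = data.frame(year = 2008, employed = c(1, 2, 2, 1, 2)),
  survey09 = data.frame(year = 2009, employed = c(1, 1, 1, 2, 1)),
  survey10 = data.frame(year = 2010, employed = c(2, 1, 1, 1, 1))
)

targets <- c("survey08", "survey09")  # hypothetical: the surveys to recode

# One TRUE/FALSE per list element, computed from the names
# rather than typed by hand
keep <- names(df_list) %in% targets

df_recoded <- map_if(
  df_list,
  .p = keep,
  .f = ~ mutate(.x, employed = recode_factor(employed, `1` = "yes", `2` = "no"))
)
```
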

Answer 2 (score: 1)

A base R solution using lapply and a user-defined function that checks whether year is earlier than 2010:

df_list2 <- lapply(df_list, function(x){
  if (unique(x$year) < 2010){
    x$employed <- as.character(factor(x$employed, levels = c(1, 2), labels = c("yes", "no")))
  }
  return(x)
})

df_list2
# $survey08
#   year employed
# 1 2008      yes
# 2 2008       no
# 3 2008       no
# 4 2008      yes
# 5 2008       no
# 
# $survey09
#   year employed
# 1 2009      yes
# 2 2009      yes
# 3 2009      yes
# 4 2009       no
# 5 2009      yes
# 
# $survey10
#   year employed
# 1 2010        2
# 2 2010        1
# 3 2010        1
# 4 2010        1
# 5 2010        1
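The if (unique(x$year) < 2010) test assumes each data frame holds exactly one year. If a data frame ever contained several years, the condition would have length greater than one, which is an error in R 4.2.0 and later. A base R sketch that collapses the check to a single TRUE/FALSE with all() (the helper name recode_employed is hypothetical):

```r
# Hypothetical helper: recode employed only when every row predates 2010
recode_employed <- function(x) {
  if (all(x$year < 2010)) {
    x$employed <- as.character(factor(x$employed, levels = c(1, 2),
                                      labels = c("yes", "no")))
  }
  x
}

df_list <- list(
  survey08 = data.frame(year = 2008, employed = c(1, 2, 2, 1, 2)),
  survey09 = data.frame(year = 2009, employed = c(1, 1, 1, 2, 1)),
  survey10 = data.frame(year = 2010, employed = c(2, 1, 1, 1, 1))
)

# lapply() keeps the list names, so df_list2$survey08 etc. still work
df_list2 <- lapply(df_list, recode_employed)
```
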

Answer 3 (score: 0)

If you already know which elements of the list you want to modify, why not just subset them by name and recode them?

library(tidyverse)

df_list[c("survey08", "survey09")] <- df_list[c("survey08", "survey09")] %>%
  map(~ .x %>% mutate_at(vars(employed), ~recode_factor(.,`1` = "yes", `2` = "no")))


df_list
#$survey08
#  year employed
#1 2008      yes
#2 2008       no
#3 2008       no
#4 2008      yes
#5 2008       no

#$survey09
#  year employed
#1 2009      yes
#2 2009      yes
#3 2009      yes
#4 2009       no
#5 2009      yes

#$survey10
#  year employed
#1 2010        2
#2 2010        1
#3 2010        1
#4 2010        1
#5 2010        1
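mutate_at() has been superseded since dplyr 1.0.0 in favour of across(). Assuming dplyr 1.0.0 or later, the same subset-and-assign approach could be written as follows (targets is a hypothetical helper variable):

```r
library(purrr)
library(dplyr)

df_list <- list(
  survey08 = data.frame(year = 2008, employed = c(1, 2, 2, 1, 2)),
  survey09 = data.frame(year = 2009, employed = c(1, 1, 1, 2, 1)),
  survey10 = data.frame(year = 2010, employed = c(2, 1, 1, 1, 1))
)

targets <- c("survey08", "survey09")  # hypothetical: names of the surveys to recode

# Subset by name, recode with across(), and assign back into the same slots;
# survey10 is never touched
df_list[targets] <- map(df_list[targets], function(df) {
  mutate(df, across(employed, function(v) recode_factor(v, `1` = "yes", `2` = "no")))
})
```
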