Replace values in ordered, year-named columns using a range given by other columns

Time: 2018-09-07 05:25:15

Tags: r dplyr data-munging

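I want to recode several year columns using the range specified by 'StartYear' and 'CloseYear': a year column should be 1 when the year falls within that row's range (inclusive) and 0 otherwise.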

What is an elegant way to get from here:

library(tibble); library(dplyr)

(df <- tibble(id = c(1,2,3, 4),
              `1997` = c(1,0,0, 1), 
              `1998` = c(0,1,0, 0), 
              `1999` = c(0,0,1, 0),
              `2000` = c(0, 0, 1, 1),
              StartYear = c(1998, 1997, 1998, 1998),
              CloseYear = c(1999, 1997, 2000, 1999)))
#> # A tibble: 4 x 7
#>      id `1997` `1998` `1999` `2000` StartYear CloseYear
#>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>     <dbl>     <dbl>
#> 1     1      1      0      0      0      1998      1999
#> 2     2      0      1      0      0      1997      1997
#> 3     3      0      0      1      1      1998      2000
#> 4     4      1      0      0      1      1998      1999

to here:

(tibble(id = c(1,2,3, 4),
              `1997` = c(0, 1, 0, 0), 
              `1998` = c(1, 0, 1, 1), 
              `1999` = c(1, 0, 1, 1),
              `2000` = c(0, 0, 1, 0),
              StartYear = c(1998, 1997, 1998, 1998),
              CloseYear = c(1999, 1997, 2000, 1999)))
#> # A tibble: 4 x 7
#>      id `1997` `1998` `1999` `2000` StartYear CloseYear
#>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>     <dbl>     <dbl>
#> 1     1      0      1      1      0      1998      1999
#> 2     2      1      0      0      0      1997      1997
#> 3     3      0      1      1      1      1998      2000
#> 4     4      0      1      1      0      1998      1999

Is there a nice way to do this with dplyr / dplyr::mutate?

2 answers:

Answer 0: (score: 3)

One possible tidyverse approach: gather, mutate, then spread...

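library(tidyverse)
df %>% 
  # Long form: one row per id/year; convert = TRUE turns the year names into integers
  gather(year, value, -id, -StartYear, -CloseYear, convert = TRUE) %>%
  # 1 if the year lies within [StartYear, CloseYear], otherwise 0
  mutate(value = as.integer(StartYear <= year & year <= CloseYear)) %>% 
  # Back to wide form: one column per year
  spread(year, value)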
#> # A tibble: 4 x 7
#>      id StartYear CloseYear `1997` `1998` `1999` `2000`
#>   <dbl>     <dbl>     <dbl>  <int>  <int>  <int>  <int>
#> 1     1      1998      1999      0      1      1      0
#> 2     2      1997      1997      1      0      0      0
#> 3     3      1998      2000      0      1      1      1
#> 4     4      1998      1999      0      1      1      0
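
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same gather/mutate/spread idea with the newer functions, assuming tidyr >= 1.1.0 (for the names_transform argument). The name `result` is only illustrative and does not appear in the original answer.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Same example data as in the question.
df <- tibble(id = c(1, 2, 3, 4),
             `1997` = c(1, 0, 0, 1),
             `1998` = c(0, 1, 0, 0),
             `1999` = c(0, 0, 1, 0),
             `2000` = c(0, 0, 1, 1),
             StartYear = c(1998, 1997, 1998, 1998),
             CloseYear = c(1999, 1997, 2000, 1999))

result <- df %>%
  # Long form: one row per id/year; names_transform converts "1997" to 1997L.
  pivot_longer(cols = -c(id, StartYear, CloseYear),
               names_to = "year", values_to = "value",
               names_transform = list(year = as.integer)) %>%
  # 1 when the year lies within [StartYear, CloseYear], otherwise 0.
  mutate(value = as.integer(StartYear <= year & year <= CloseYear)) %>%
  # Back to wide form: one column per year.
  pivot_wider(names_from = year, values_from = value)

result
```

As with the gather/spread version, the year columns end up after id, StartYear and CloseYear rather than in their original position.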

Answer 1: (score: 0)

If you are also open to data.table:

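library(data.table)
# Expand each id into one row per year in its StartYear..CloseYear range,
# then cast wide, counting rows per year (1 if in range, 0 otherwise).
# Note: setDT() converts df to a data.table by reference.
dcast(
    setDT(df)[, .(StartYear, CloseYear, flag = seq(StartYear, CloseYear)), by = .(id)],
    id + StartYear + CloseYear ~ flag, fun.aggregate = length)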

#    id StartYear CloseYear 1997 1998 1999 2000
# 1:  1      1998      1999    0    1    1    0
# 2:  2      1997      1997    1    0    0    0
# 3:  3      1998      2000    0    1    1    1
# 4:  4      1998      1999    0    1    1    0
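
Both answers above rebuild the year columns by reshaping, which moves them after StartYear and CloseYear. The dcast() version also only produces columns for years that fall inside at least one row's range. Since the question asks about dplyr::mutate specifically, here is a hedged sketch that overwrites the year columns in place. It assumes dplyr >= 1.0.0 (for across() and cur_column()) and that every year column is named with exactly four digits. The name `result` is illustrative only.

```r
library(tibble)
library(dplyr)

# Same example data as in the question.
df <- tibble(id = c(1, 2, 3, 4),
             `1997` = c(1, 0, 0, 1),
             `1998` = c(0, 1, 0, 0),
             `1999` = c(0, 0, 1, 0),
             `2000` = c(0, 0, 1, 1),
             StartYear = c(1998, 1997, 1998, 1998),
             CloseYear = c(1999, 1997, 2000, 1999))

result <- df %>%
  mutate(across(
    # Assumption: year columns are exactly the four-digit column names.
    matches("^[0-9]{4}$"),
    # The old values are ignored; each column is recomputed from its own name:
    # 1 when that year lies within [StartYear, CloseYear], otherwise 0.
    ~ as.integer(StartYear <= as.integer(cur_column()) &
                   as.integer(cur_column()) <= CloseYear)
  ))

result
```

This keeps the original column order and every existing year column, at the cost of relying on the column-name pattern to identify the years.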