I want to recode several year columns using the range given by 'StartYear' and 'CloseYear'.
What is an elegant way to get from this:
library(tibble); library(dplyr)
(df <- tibble(id = c(1,2,3, 4),
`1997` = c(1,0,0, 1),
`1998` = c(0,1,0, 0),
`1999` = c(0,0,1, 0),
`2000` = c(0, 0, 1, 1),
StartYear = c(1998, 1997, 1998, 1998),
CloseYear = c(1999, 1997, 2000, 1999)))
#> # A tibble: 4 x 7
#> id `1997` `1998` `1999` `2000` StartYear CloseYear
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0 0 0 1998 1999
#> 2 2 0 1 0 0 1997 1997
#> 3 3 0 0 1 1 1998 2000
#> 4 4 1 0 0 1 1998 1999
to this:
(tibble(id = c(1,2,3, 4),
`1997` = c(0, 1, 0, 0),
`1998` = c(1, 0, 1, 1),
`1999` = c(1, 0, 1, 1),
`2000` = c(0, 0, 1, 0),
StartYear = c(1998, 1997, 1998, 1998),
CloseYear = c(1999, 1997, 2000, 1999)))
#> # A tibble: 4 x 7
#> id `1997` `1998` `1999` `2000` StartYear CloseYear
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 1 1 0 1998 1999
#> 2 2 1 0 0 0 1997 1997
#> 3 3 0 1 1 1 1998 2000
#> 4 4 0 1 1 0 1998 1999
Is there a nice way to do this with dplyr / `dplyr::mutate` functions?
Answer 0 (score: 3)
One possible tidyverse approach: gather, mutate, then spread...
library(tidyverse)
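df %>%
  # wide -> long: one row per id/year; convert = TRUE makes `year` numeric
  gather(year, value, -id, -StartYear, -CloseYear, convert = TRUE) %>%
  # 1 if the year lies within [StartYear, CloseYear], else 0
  mutate(value = as.integer(StartYear <= year & year <= CloseYear)) %>%
  # long -> wide again, one column per year
  spread(year, value)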
#> # A tibble: 4 x 7
#> id StartYear CloseYear `1997` `1998` `1999` `2000`
#> <dbl> <dbl> <dbl> <int> <int> <int> <int>
#> 1 1 1998 1999 0 1 1 0
#> 2 2 1997 1997 1 0 0 0
#> 3 3 1998 2000 0 1 1 1
#> 4 4 1998 1999 0 1 1 0
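`gather()` and `spread()` still work, but tidyr 1.0.0 and later mark them as superseded by `pivot_longer()` and `pivot_wider()`. A minimal sketch of the same reshape/mutate/reshape idea with the newer verbs, assuming a recent tidyr (1.1 or later, for `names_transform`) and that the year columns run contiguously from `1997` to `2000`:

```r
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(id = c(1, 2, 3, 4),
             `1997` = c(1, 0, 0, 1),
             `1998` = c(0, 1, 0, 0),
             `1999` = c(0, 0, 1, 0),
             `2000` = c(0, 0, 1, 1),
             StartYear = c(1998, 1997, 1998, 1998),
             CloseYear = c(1999, 1997, 2000, 1999))

res <- df %>%
  # wide -> long; turn the column names "1997", ... into integer years
  pivot_longer(`1997`:`2000`, names_to = "year", values_to = "value",
               names_transform = list(year = as.integer)) %>%
  # flag years inside the closed interval [StartYear, CloseYear]
  mutate(value = as.integer(StartYear <= year & year <= CloseYear)) %>%
  # long -> wide; id, StartYear and CloseYear act as the id columns
  pivot_wider(names_from = year, values_from = value)

res
```

As with `spread()`, the year columns end up after `StartYear` and `CloseYear` and hold integers rather than doubles.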
Answer 1 (score: 0)
If you are also open to data.table:
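library(data.table)
# setDT() converts df to a data.table in place; expand each id to one row per covered year,
# then cast back to wide, counting rows per year (1 if covered, 0 otherwise)
dcast(
  setDT(df)[, .(StartYear, CloseYear, flag = seq(StartYear, CloseYear)), by = .(id)],
  id + StartYear + CloseYear ~ flag, fun.aggregate = length)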
# id StartYear CloseYear 1997 1998 1999 2000
# 1: 1 1998 1999 0 1 1 0
# 2: 2 1997 1997 1 0 0 0
# 3: 3 1998 2000 0 1 1 1
# 4: 4 1998 1999 0 1 1 0
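Neither answer above stays inside `mutate()` itself. With dplyr 1.0.0 or later, `across()` combined with `cur_column()` (which returns the name of the column currently being processed) can recode the year columns in place, keeping the original column order and double type. A minimal sketch, assuming the year columns are contiguous from `1997` to `2000` and their names parse as numbers:

```r
library(tibble)
library(dplyr)

df <- tibble(id = c(1, 2, 3, 4),
             `1997` = c(1, 0, 0, 1),
             `1998` = c(0, 1, 0, 0),
             `1999` = c(0, 0, 1, 0),
             `2000` = c(0, 0, 1, 1),
             StartYear = c(1998, 1997, 1998, 1998),
             CloseYear = c(1999, 1997, 2000, 1999))

recoded <- df %>%
  mutate(across(`1997`:`2000`,
                # cur_column() is e.g. "1998"; compare that year with each row's range
                ~ as.numeric(StartYear <= as.numeric(cur_column()) &
                               as.numeric(cur_column()) <= CloseYear)))

recoded
```

Because no reshaping happens, the result has exactly the layout of the desired output in the question.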