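I want to convert a data frame containing start-year and end-year variables into a complete time series that (1) includes every year between the start year and the end year and (2) fills in the values of all other variables for the years in between.
This is what the original data looks like: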
data_original <- data.frame(name = c("peter", "peter", "eric", "denisse"), lastname = c("smith", "smith", "jordan", "williams"), age = c(54, 54, 48, 40), start_year = c(1980,1986, 1990, 2000), end_year = c(1984, 1988, 1993, 2001))
data_original
#> name lastname age start_year end_year
#> 1 peter smith 54 1980 1984
#> 2 peter smith 54 1986 1988
#> 3 eric jordan 48 1990 1993
#> 4 denisse williams 40 2000 2001
This is what I would like the data to look like:
data_final <- data.frame(name = c("peter", "peter", "peter", "peter", "peter", "peter", "peter", "peter", "eric", "eric", "eric", "eric", "denisse", "denisse"), lastname = c("smith", "smith", "smith", "smith", "smith", "smith", "smith", "smith", "jordan", "jordan", "jordan", "jordan", "williams", "williams"), age = c(54, 54, 54, 54, 54, 54, 54, 54, 48, 48, 48, 48, 40, 40), year = c(1980, 1981, 1982, 1983, 1984, 1986, 1987, 1988, 1990, 1991, 1992, 1993, 2000, 2001))
data_final
#> name lastname age year
#> 1 peter smith 54 1980
#> 2 peter smith 54 1981
#> 3 peter smith 54 1982
#> 4 peter smith 54 1983
#> 5 peter smith 54 1984
#> 6 peter smith 54 1986
#> 7 peter smith 54 1987
#> 8 peter smith 54 1988
#> 9 eric jordan 48 1990
#> 10 eric jordan 48 1991
#> 11 eric jordan 48 1992
#> 12 eric jordan 48 1993
#> 13 denisse williams 40 2000
#> 14 denisse williams 40 2001
Thank you very much for this and for your continued help!
Answer 0 (score: 3)
Here is one option with tidyverse. Create 'year' by taking the sequence from 'start_year' to 'end_year' with map2, select the relevant columns, and unnest:
library(tidyverse)
data_original %>%
mutate(year = map2(start_year, end_year, `:`)) %>%
select(-start_year, -end_year) %>%
unnest(year)  # name the list column; a bare unnest() warns in tidyr >= 1.0
# name lastname age year
#1 peter smith 54 1980
#2 peter smith 54 1981
#3 peter smith 54 1982
#4 peter smith 54 1983
#5 peter smith 54 1984
#6 peter smith 54 1986
#7 peter smith 54 1987
#8 peter smith 54 1988
#9 eric jordan 48 1990
#10 eric jordan 48 1991
#11 eric jordan 48 1992
#12 eric jordan 48 1993
#13 denisse williams 40 2000
#14 denisse williams 40 2001
Another option is data.table:
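library(data.table)
# as.data.table() works on a copy, so data_original stays a plain data.frame;
# age is included in j so that it is carried into the result
as.data.table(data_original)[, .(name, lastname, age, year = seq(start_year, end_year, by = 1)),
.(grp = seq_len(nrow(data_original)))][, grp := NULL][]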
Or we can do it in base R using Map:
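# one vector of years per row, built from columns 4:5 (start_year, end_year)
lst <- do.call(Map, c(f = `:`, data_original[4:5]))
# repeat each row of the first three columns once per year it covers
out <- data_original[1:3][rep(seq_len(nrow(data_original)), lengths(lst)),]
# attach the years themselves
out$year <- unlist(lst)
row.names(out) <- NULL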
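The row-repetition step in the base R approach can be easier to follow on a tiny example. The following is only a sketch on made-up data (the `df`, `id`, `years`, and `idx` names are illustrative and not from the question); it shows how `rep()` combined with `lengths()` turns one row per interval into one row per year.

```r
# Toy data (made-up values): row 1 should expand to 3 rows, row 2 to 2 rows
df <- data.frame(id = c("a", "b"),
                 start_year = c(2000, 2010),
                 end_year = c(2002, 2011))

# One vector of years per row
years <- Map(`:`, df$start_year, df$end_year)

# Repeat each row number once per year that row covers
idx <- rep(seq_len(nrow(df)), lengths(years))

# Index the non-year column by the repeated row numbers, then attach the years
out <- df["id"][idx, , drop = FALSE]
out$year <- unlist(years)
row.names(out) <- NULL
out
```

Here `idx` holds the row numbers 1, 1, 1, 2, 2, so `out` has five rows: three for "a" and two for "b".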
Answer 1 (score: 2)
Here is another tidyverse approach, using seq and unnest:
data_original %>%
rowwise() %>%
mutate(year = list(seq(start_year, end_year, 1))) %>%
ungroup() %>%
select(-start_year, -end_year) %>%
unnest(year)  # name the list column; a bare unnest() warns in tidyr >= 1.0
## A tibble: 14 x 4
# name lastname age year
# <fct> <fct> <dbl> <dbl>
# 1 peter smith 54. 1980.
# 2 peter smith 54. 1981.
# 3 peter smith 54. 1982.
# 4 peter smith 54. 1983.
# 5 peter smith 54. 1984.
# 6 peter smith 54. 1986.
# 7 peter smith 54. 1987.
# 8 peter smith 54. 1988.
# 9 eric jordan 48. 1990.
#10 eric jordan 48. 1991.
#11 eric jordan 48. 1992.
#12 eric jordan 48. 1993.
#13 denisse williams 40. 2000.
#14 denisse williams 40. 2001.
P.S. In hindsight, @akrun's approach using purrr::map2 is cleaner; it avoids the need for explicit row-wise grouping and ungrouping.
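For readers wondering why either rowwise() or map2 is needed at all: seq() takes a single start and a single end, so it cannot be applied to whole columns directly. The sketch below uses base R only (Map stands in for purrr::map2 here, which is a substitution for illustration, not the code from either answer), with the first two intervals from the question.

```r
start_year <- c(1980, 1986)
end_year <- c(1984, 1988)

# Passing whole vectors to seq() fails, because 'from' and 'to' must have length 1
res <- tryCatch(seq(start_year, end_year, 1), error = function(e) e)
inherits(res, "error")

# Applying seq() pair by pair yields one vector of years per row
years <- Map(function(from, to) seq(from, to, 1), start_year, end_year)
lengths(years)
```

rowwise() gets the same effect by making each row its own group, while map2 iterates over the two columns in parallel without any grouping.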