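I need to reshape df, complete it with the missing years, and create a variable that tracks status changes. The catch is that some values are missing, and the code I wrote chokes on them.
Toy example: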
library(data.table)
# dplyr and tidyr are needed for %>%, complete(), full_seq() and fill()
library(dplyr)
library(tidyr)
df <- data.frame(id = c(1, 2), phase_1 = c(1994, 1994), phase_2 = c(1996, 1996), phase_3 = c(1997, NA))
df1 = melt(df,
           id.vars = "id",
           measure.vars = c("phase_1", "phase_2", "phase_3"),
           variable.name = "status",
           value.name = "year",
           na.rm = FALSE)
df2 <- df1 %>%
  complete(id, year = full_seq(year, 1)) %>%
  fill(status)
Desired output:
  id year   phase change
1  1 1994 phase_1      0
2  1 1995 phase_1      0
3  1 1996 phase_2      1
4  1 1997 phase_3      1
5  2 1994 phase_1      0
6  2 1995 phase_1      0
7  2 1996 phase_2      1
8  2 1997 phase_2      0
Answer 0 (score: 2)
With dplyr and tidyr, you can also do:
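library(dplyr)
library(tidyr)
df %>%
  gather(phase, year, -id, na.rm = TRUE) %>%   # wide to long, dropping the NA year
  complete(id, year = full_seq(year, 1)) %>%   # add the missing years for every id
  fill(phase) %>%                              # carry the last known phase forward
  group_by(id) %>%
  # 1 when the phase differs from the previous row of the same id, otherwise 0
  mutate(change = as.numeric(phase != lag(phase, default = first(phase))))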
     id  year phase   change
  <dbl> <dbl> <chr>    <dbl>
1     1  1994 phase_1      0
2     1  1995 phase_1      0
3     1  1996 phase_2      1
4     1  1997 phase_3      1
5     2  1994 phase_1      0
6     2  1995 phase_1      0
7     2  1996 phase_2      1
8     2  1997 phase_2      0
Or:
df %>%
  gather(phase, year, -id, na.rm = TRUE) %>%
  complete(id, year = full_seq(year, 1)) %>%
  fill(phase) %>%
  group_by(id) %>%
  mutate(change = (phase != lag(phase, default = first(phase))) * 1)
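To see what the change column is doing, here is a minimal sketch on a single id's phases. The vector below is just id 1 from the toy data typed out by hand, and the variable names are made up for illustration.

```r
library(dplyr)

# Phases of id 1 after the missing year 1995 has been filled in
phase <- c("phase_1", "phase_1", "phase_2", "phase_3")

# lag() shifts the vector down by one position; default = first(phase)
# makes the first row compare against itself, so it never counts as a change
previous <- lag(phase, default = first(phase))

# TRUE wherever the phase differs from the previous row, converted to 0/1
change <- as.numeric(phase != previous)
change
# 0 0 1 1
```

Multiplying the logical vector by 1, as in the second variant, gives the same numbers; as.numeric() just states the intent more directly.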
Answer 1 (score: 1)
You can use dplyr and tidyr as:
library(dplyr)
library(tidyr)
df %>%
  gather(phase, year, phase_1:phase_3) %>%
  filter(!is.na(year)) %>%
  complete(id, year = full_seq(year, 1)) %>%
  # takes the phase from the row above, so this only fills gaps of a single year
  mutate(phase = ifelse(is.na(phase), lag(phase, 1), phase)) %>%
  group_by(id) %>%
  # the first row of each id is never a change
  mutate(change = ifelse(phase == lag(phase, 1) | row_number() == 1, 0, 1))
# A tibble: 8 x 4
# Groups:   id [2]
     id  year phase   change
  <dbl> <dbl> <chr>    <dbl>
1     1  1994 phase_1      0
2     1  1995 phase_1      0
3     1  1996 phase_2      1
4     1  1997 phase_3      1
5     2  1994 phase_1      0
6     2  1995 phase_1      0
7     2  1996 phase_2      1
8     2  1997 phase_2      0
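For what it's worth, gather() is marked as superseded in current tidyr in favor of pivot_longer(). Below is a rough sketch of the same pipeline written that way. Moving group_by(id) ahead of fill() is my own variation and not something either answer does: it keeps the fill from carrying a phase over from the previous id, and it also handles gaps longer than one year. The name res is just a placeholder.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 2),
                 phase_1 = c(1994, 1994),
                 phase_2 = c(1996, 1996),
                 phase_3 = c(1997, NA))

res <- df %>%
  # wide to long; values_drop_na removes the phase with no year
  pivot_longer(-id, names_to = "phase", values_to = "year",
               values_drop_na = TRUE) %>%
  # every id gets every year in the overall range (ungrouped on purpose)
  complete(id, year = full_seq(year, 1)) %>%
  group_by(id) %>%
  # fill() respects the grouping, so phases are carried forward within an id only
  fill(phase) %>%
  mutate(change = as.numeric(phase != lag(phase, default = first(phase)))) %>%
  ungroup()

res
```

On the toy data this should give the same eight rows as the desired output. Note that complete() has to stay before group_by(): on a grouped data frame the year range would be computed per id, and id 2 would lose its 1997 row.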