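I am working with a longitudinal dataset in which the same person can have several records for the same year, and some of those records are missing a value. For example, take this data: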
id <- c(rep("1", 5), rep("2", 5), rep("3", 5))
year <- c(1999, 1999, 2000, 2001, 2001, 1999, 2000, 2001, 2001, 2001, 1999, 2000,
2001, 2002, 2003)
marstat <- c("married", NA, "married", "married", "divorced", "single", "single", "single", NA, NA, "married", NA, "married", "divorced", "divorced")
df <- data.frame(id , year , marstat)
   id year  marstat
1   1 1999  married
2   1 1999       NA
3   1 2000  married
4   1 2001  married
5   1 2001 divorced
6   2 1999   single
7   2 2000   single
8   2 2001   single
9   2 2001       NA
10  2 2001       NA
11  3 1999  married
12  3 2000       NA
13  3 2001  married
14  3 2002 divorced
15  3 2003 divorced
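If marital status information exists for that year, I want to fill the NAs with that person's existing data. For ID 1, row 2 is NA, but there is data for that person in the same year, so I want it to say "married". Similarly, for ID 2, rows 9 and 10 should say "single", because according to row 8 that person was single in 2001.
I don't want to simply drop the rows with missing values, because my actual data has many more columns.
I also don't want to fill based on the previous or next value in general. A value should be filled only when the year is the same.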
Answer 0 (score: 0)
You can try
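library(tidyverse)
df %>%
  # work within each person-year
  group_by(id, year) %>%
  # marstat2: all non-missing values seen in that person-year, comma-separated
  mutate(marstat2 = paste(na.omit(marstat), collapse = ","),
         # marstat3: keep the original value; use marstat2 only where it is NA
         marstat3 = case_when(is.na(marstat) ~ marstat2,
                              TRUE ~ as.character(marstat)))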
# A tibble: 15 x 5
# Groups:   id, year [11]
   id     year marstat  marstat2         marstat3
   <fct> <dbl> <fct>    <chr>            <chr>
 1 1     1999. married  married          married
 2 1     1999. NA       married          married
 3 1     2000. married  married          married
 4 1     2001. married  married,divorced married
 5 1     2001. divorced married,divorced divorced
 6 2     1999. single   single           single
 7 2     2000. single   single           single
 8 2     2001. single   single           single
 9 2     2001. NA       single           single
10 2     2001. NA       single           single
11 3     1999. married  married          married
12 3     2000. NA       ""               ""
13 3     2001. married  married          married
14 3     2002. divorced divorced         divorced
15 3     2003. divorced divorced         divorced
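I added separate columns to show the advantage of this approach. Note that row 12, where the person has no non-missing value for that year, ends up as an empty string rather than NA.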
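If you would rather keep NA where a person-year has no information at all, one possible alternative is `tidyr::fill()` applied within each `id`/`year` group. This is only a sketch and not part of the answer above: it assumes a reasonably recent tidyr that supports `.direction = "downup"`, and the name `filled` is just an illustrative choice. The data frame is rebuilt here so the example runs on its own.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question (kept as character, not factor).
df <- data.frame(
  id = c(rep("1", 5), rep("2", 5), rep("3", 5)),
  year = c(1999, 1999, 2000, 2001, 2001, 1999, 2000, 2001, 2001, 2001,
           1999, 2000, 2001, 2002, 2003),
  marstat = c("married", NA, "married", "married", "divorced",
              "single", "single", "single", NA, NA,
              "married", NA, "married", "divorced", "divorced"),
  stringsAsFactors = FALSE
)

filled <- df %>%
  # Restrict filling to rows of the same person in the same year.
  group_by(id, year) %>%
  # Fill NAs from the previous row first, then from the next row,
  # never crossing a group boundary.
  fill(marstat, .direction = "downup") %>%
  ungroup()

filled
```

With this data, rows 2, 9 and 10 are filled from their own person-year, and row 12 stays NA because ID 3 has no other record for 2000. One caveat: if a person-year contained an NA together with two different statuses, `fill()` would silently take the nearest one, whereas the `marstat2` column above makes such conflicts visible.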