Fill in missing values based on a condition

Time: 2018-04-30 11:02:14

Tags: r missing-data

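I am working with a longitudinal dataset in which the same person can have several records in the same year, but some of those records are missing a value. Take this data: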

id <- c(rep("1", 5), rep("2", 5), rep("3", 5))
year <- c(1999, 1999, 2000, 2001, 2001, 1999, 2000, 2001, 2001, 2001, 1999, 2000, 
2001, 2002, 2003)
marstat <- c("married", NA, "married", "married", "divorced", "single", "single", "single", NA, NA, "married", NA, "married", "divorced", "divorced")
df <- data.frame(id , year , marstat)

   id year  marstat
1   1 1999  married
2   1 1999     NA
3   1 2000  married
4   1 2001  married
5   1 2001 divorced
6   2 1999   single
7   2 2000   single
8   2 2001   single
9   2 2001     NA
10  2 2001     NA
11  3 1999  married
12  3 2000     NA
13  3 2001  married
14  3 2002 divorced
15  3 2003 divorced

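If there is information about marital status for that year, I want to fill the NAs with that person's existing data. For ID 1, row 2 has an NA, but there is data for that person in the same year, so I want it to say "married". Similarly, for ID 2, rows 9 and 10 should say "single", because according to row 8 that person was single in 2001.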

I don't want to simply drop the rows with missing values, because my actual data has many more columns.

I also don't want to fill it from the previous or next value. It should be filled only when the year is the same.

1 answer:

Answer 0: (score: 0)

You can try

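library(tidyverse)
df %>% 
  # treat each person-year as its own group
  group_by(id, year) %>% 
  # marstat2: all non-missing values in the group, comma-separated
  # marstat3: the original value, or marstat2 where the original is NA
  mutate(marstat2=paste(na.omit(marstat), collapse = ","),
         marstat3=case_when(is.na(marstat) ~  marstat2, 
                            TRUE ~ as.character(marstat)))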
# A tibble: 15 x 5
# Groups:   id, year [11]
   id     year marstat  marstat2         marstat3
   <fct> <dbl> <fct>    <chr>            <chr>   
 1 1     1999. married  married          married 
 2 1     1999. NA       married          married 
 3 1     2000. married  married          married 
 4 1     2001. married  married,divorced married 
 5 1     2001. divorced married,divorced divorced
 6 2     1999. single   single           single  
 7 2     2000. single   single           single  
 8 2     2001. single   single           single  
 9 2     2001. NA       single           single  
10 2     2001. NA       single           single  
11 3     1999. married  married          married 
12 3     2000. NA       ""               ""      
13 3     2001. married  married          married 
14 3     2002. divorced divorced         divorced
15 3     2003. divorced divorced         divorced

I added the separate columns to show the advantages of this approach.
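In the output above, row 12 ends up as an empty string rather than NA, because that person-year has no observed value to paste. An NA in a group with conflicting values would likewise receive the whole comma-separated string. As a hedged alternative, here is a minimal base R sketch that fills an NA only when its id/year group has exactly one distinct observed value and leaves it as NA otherwise. The helper name `fill_within_group` and the column `marstat_filled` are made up for illustration, and the sketch assumes `marstat` is stored as character (hence `stringsAsFactors = FALSE`).

```r
# Rebuild the example data from the question.
# stringsAsFactors = FALSE keeps marstat as character (an assumption of this sketch).
id <- c(rep("1", 5), rep("2", 5), rep("3", 5))
year <- c(1999, 1999, 2000, 2001, 2001, 1999, 2000, 2001, 2001, 2001,
          1999, 2000, 2001, 2002, 2003)
marstat <- c("married", NA, "married", "married", "divorced",
             "single", "single", "single", NA, NA,
             "married", NA, "married", "divorced", "divorced")
df <- data.frame(id, year, marstat, stringsAsFactors = FALSE)

# Hypothetical helper (not from any package): fill NAs in one group
# only when the group has exactly one distinct non-missing value.
fill_within_group <- function(x) {
  observed <- unique(x[!is.na(x)])
  if (length(observed) == 1) {
    x[is.na(x)] <- observed
  }
  x  # groups with zero or several distinct values are returned unchanged
}

# ave() applies the helper within each id/year combination
# and returns the results in the original row order.
df$marstat_filled <- ave(df$marstat, df$id, df$year, FUN = fill_within_group)

print(df)
```

With the example data, rows 2, 9 and 10 are filled from the same person-year, while row 12 stays NA because ID 3 has no other record for 2000. Whether to leave ambiguous groups untouched or to flag them is a design choice that depends on the real data.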