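I previously asked how to fill every row of a group with the same value in R. I now have exactly the same problem, but the data contains some missing values (NA). Here is the data: an empty string "" means the person was not exposed in that window, NA means the value is missing, and "1st" means the person was exposed in the first window.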
ID <- c(1,1,2,2,2,3,3,4,4,4)
x <- c("1st","","1st","1st","","",NA,"1st",NA,"1st")
y <- c("2nd","2nd","","","","2nd","2nd","","",NA)
z <- c("","","3rd","3rd","",NA,"3rd","",NA,"")
m <- c(10:19)
n <- c(20:29)
df <- data.frame(ID,x,y,z,m,n)
library(data.table)
setDT(df)[, c("x1", "y1", "z1") := lapply(.SD, function(x) x[which.max(x != "")]), by = ID]
I get the following output, which is almost what I want, except for the NAs:
    ID   x   y   z  m  n  x1  y1  z1
 1:  1 1st 2nd     10 20 1st 2nd
 2:  1     2nd     11 21 1st 2nd
 3:  2 1st     3rd 12 22 1st     3rd
 4:  2 1st     3rd 13 23 1st     3rd
 5:  2             14 24 1st     3rd
 6:  3     2nd  NA 15 25     2nd 3rd
 7:  3  NA 2nd 3rd 16 26     2nd 3rd
 8:  4 1st         17 27 1st
 9:  4  NA      NA 18 28 1st
10:  4 1st  NA     19 29 1st
As you can see, rows 6 and 7 (ID 3) should be filled with x1 = NA, and rows 8, 9 and 10 (ID 4) should have y1 and z1 equal to NA. This is the output I want:
    ID   x   y   z  m  n  x1  y1  z1
 1:  1 1st 2nd     10 20 1st 2nd
 2:  1     2nd     11 21 1st 2nd
 3:  2 1st     3rd 12 22 1st     3rd
 4:  2 1st     3rd 13 23 1st     3rd
 5:  2             14 24 1st     3rd
 6:  3     2nd  NA 15 25  NA 2nd 3rd
 7:  3  NA 2nd 3rd 16 26  NA 2nd 3rd
 8:  4 1st         17 27 1st  NA  NA
 9:  4  NA      NA 18 28 1st  NA  NA
10:  4 1st  NA     19 29 1st  NA  NA
Answer 0 (score: 4)
Just change:
x[which.max(x != "")]
to:
x[!x %in% c("", NA)][1L]
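To see what this expression does within a single group, here is a small sketch in base R applied to a few of the question's groups. The helper name `first_value` is hypothetical and exists only for illustration. One thing worth checking against your own requirements: a group that contains nothing but empty strings also yields NA here, because indexing an empty vector with `[1L]` returns NA.

```r
# Hypothetical helper wrapping the expression from this answer:
# drop "" and NA, then take the first remaining element.
first_value <- function(x) x[!x %in% c("", NA)][1L]

first_value(c("1st", NA, "1st"))  # ID 4, column x: the first real value
first_value(c("", NA))            # ID 3, column x: nothing left, so NA
first_value(c("", "", NA))        # ID 4, column y: nothing left, so NA
first_value(c("", ""))            # ID 1, column z: also NA, not ""
```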
Answer 1 (score: 1)
How about recoding the condition for NA as 0.5, so that NA takes priority over an empty string but ranks below any other string:
df[, c("x1", "y1", "z1") := lapply(.SD, function(x) x[which.max(ifelse(is.na(x), 0.5, x != ""))]), by = ID]
df
#    ID   x   y   z  m  n  x1  y1  z1
# 1:  1 1st 2nd     10 20 1st 2nd
# 2:  1     2nd     11 21 1st 2nd
# 3:  2 1st     3rd 12 22 1st     3rd
# 4:  2 1st     3rd 13 23 1st     3rd
# 5:  2             14 24 1st     3rd
# 6:  3     2nd  NA 15 25  NA 2nd 3rd
# 7:  3  NA 2nd 3rd 16 26  NA 2nd 3rd
# 8:  4 1st         17 27 1st  NA  NA
# 9:  4  NA      NA 18 28 1st  NA  NA
#10:  4 1st  NA     19 29 1st  NA  NA
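The scoring behind this can be checked on single vectors. The sketch below uses base R only; the helper names `score` and `pick` are hypothetical and simply unpack the expression above. `which.max` returns the index of the first maximum, so a real value (score 1) beats NA (score 0.5), which in turn beats an empty string (score 0).

```r
# Hypothetical helpers that unpack the expression from this answer.
# score(): 1 for a real value, 0.5 for NA, 0 for an empty string.
score <- function(x) ifelse(is.na(x), 0.5, x != "")
# pick(): the element with the highest score (the first one on ties).
pick <- function(x) x[which.max(score(x))]

score(c("", NA, "1st"))     # scores are 0, 0.5 and 1
pick(c("1st", NA, "1st"))   # a real value wins over NA
pick(c("", NA))             # NA wins over the empty string
pick(c("", ""))             # only blanks, so the result stays ""
```

Groups that contain only empty strings keep "" under this scheme, which matches the blank cells in the desired output.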