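In R

Time: 2016-08-19 21:35:23

Tags: r dplyr data.table

I previously asked a question about filling in the same value for each row by group in R. Now I am dealing with exactly the same problem, but with some missing values (NA). Here is the data: a blank "" means the person was not exposed in that window, NA is treated as missing, and "1st" means the person was exposed in the first window, and so on.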

ID <- c(1,1,2,2,2,3,3,4,4,4)
x <- c("1st","","1st","1st","","",NA,"1st",NA,"1st")
y <- c("2nd","2nd","","","","2nd","2nd","","",NA)
z <- c("","","3rd","3rd","",NA,"3rd","",NA,"")
m <- c(10:19)
n <- c(20:29)
df <- data.frame(ID,x,y,z,m,n)
library(data.table)
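# .SDcols limits .SD to x, y, z so it lines up with the three new columns
setDT(df)[, c("x1", "y1", "z1") := lapply(.SD, function(x) x[which.max(x != "")]),
          by = ID, .SDcols = c("x", "y", "z")]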

I get this output, which is almost what I want, except for the NAs:

    ID   x   y   z  m  n  x1  y1  z1
 1:  1 1st 2nd     10 20 1st 2nd    
 2:  1     2nd     11 21 1st 2nd    
 3:  2 1st     3rd 12 22 1st     3rd
 4:  2 1st     3rd 13 23 1st     3rd
 5:  2             14 24 1st     3rd
 6:  3     2nd  NA 15 25     2nd 3rd
 7:  3  NA 2nd 3rd 16 26     2nd 3rd
 8:  4 1st         17 27 1st        
 9:  4  NA      NA 18 28 1st        
10:  4 1st  NA     19 29 1st 

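As you can see, in rows 6 and 7, where ID is 3, x1 should be filled with NA, and in rows 8, 9 and 10, where ID is 4, y1 and z1 should be NA. This is the output I want: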

    ID   x   y   z  m  n  x1  y1  z1
 1:  1 1st 2nd     10 20 1st 2nd    
 2:  1     2nd     11 21 1st 2nd    
 3:  2 1st     3rd 12 22 1st     3rd
 4:  2 1st     3rd 13 23 1st     3rd
 5:  2             14 24 1st     3rd
 6:  3     2nd  NA 15 25  NA 2nd 3rd
 7:  3  NA 2nd 3rd 16 26  NA 2nd 3rd
 8:  4 1st         17 27 1st  NA  NA
 9:  4  NA      NA 18 28 1st  NA  NA
10:  4 1st  NA     19 29 1st  NA  NA
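
For reference, a small base-R sketch of why the original expression returns "" rather than NA for such groups: comparing NA with "" yields NA, and which.max() discards missing values, so the first FALSE is picked. The vector below is the x column for ID 3; the variable names are only for illustration.

```r
# x values for ID 3 in the example data
x_id3 <- c("", NA)

# Comparing NA with "" gives NA, not TRUE or FALSE
cmp <- x_id3 != ""
print(cmp)

# which.max() discards NA, so the only candidate left is the FALSE at position 1
idx <- which.max(cmp)
print(idx)

# Hence the whole group is filled with "" instead of NA
picked <- x_id3[idx]
print(picked)
```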

2 Answers:

Answer 0: (score: 4)

Just change:

x[which.max(x != "")]

to:

x[!x %in% c("", NA)][1L]
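
As a rough illustration of what this replacement does, here is a base-R sketch applied to single groups taken from the example data. The helper name `first_exposed` is made up for this example. One thing worth checking against the desired output: a group that contains only "" (such as z for ID 1) also comes back as NA with this expression, because nothing survives the filter.

```r
# Hypothetical helper wrapping the expression from the answer:
# drop "" and NA, then take the first remaining value (NA if none remain)
first_exposed <- function(x) x[!x %in% c("", NA)][1L]

res_x4 <- first_exposed(c("1st", NA, "1st"))  # x for ID 4: "1st"
res_x3 <- first_exposed(c("", NA))            # x for ID 3: NA
res_z1 <- first_exposed(c("", ""))            # z for ID 1: NA as well

print(list(res_x4, res_x3, res_z1))
```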

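Answer 1: (score: 1)

How about recoding NA as 0.5 in the condition, so that NA takes priority over the empty string but still ranks below any other string: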

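# .SDcols limits .SD to x, y, z so it lines up with the three new columns
df[, c("x1", "y1", "z1") := lapply(.SD, function(x) x[which.max(ifelse(is.na(x), 0.5, x != ""))]),
   by = ID, .SDcols = c("x", "y", "z")]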

df
#    ID   x   y   z  m  n  x1  y1  z1
# 1:  1 1st 2nd     10 20 1st 2nd    
# 2:  1     2nd     11 21 1st 2nd    
# 3:  2 1st     3rd 12 22 1st     3rd
# 4:  2 1st     3rd 13 23 1st     3rd
# 5:  2             14 24 1st     3rd
# 6:  3     2nd  NA 15 25  NA 2nd 3rd
# 7:  3  NA 2nd 3rd 16 26  NA 2nd 3rd
# 8:  4 1st         17 27 1st  NA  NA
# 9:  4  NA      NA 18 28 1st  NA  NA
#10:  4 1st  NA     19 29 1st  NA  NA
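
To make the ranking idea easier to follow, here is a minimal base-R sketch on single groups from the example data. The helper names `score` and `pick` are hypothetical and only split the expression above into two steps.

```r
# Hypothetical helpers: "" scores FALSE (0), NA scores 0.5, any other string scores TRUE (1)
score <- function(x) ifelse(is.na(x), 0.5, x != "")
# Take the first element with the highest score
pick <- function(x) x[which.max(score(x))]

s_x3 <- score(c("", NA))    # x for ID 3: scores 0 and 0.5
p_x3 <- pick(c("", NA))     # NA outranks ""
p_z3 <- pick(c(NA, "3rd"))  # "3rd" outranks NA
p_z1 <- pick(c("", ""))     # an all-blank group stays ""

print(list(s_x3, p_x3, p_z3, p_z1))
```

Unlike a plain filter on "" and NA, this keeps all-blank groups as "" and only returns NA when the group has a missing value and no real exposure.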