Replace values with NA, except in the last row for a given ID, in R

Date: 2014-08-27 11:48:49

Tags: r

Example data:

x <- data.frame(id=c(1,1,1,2,2,7,7,7,7),dna=c(232,424,5345,45345,45,345,4543,345345,4545))
y <- data.frame(id=c(1,1,1,2,2,7,7,7),year=c(2001,2002,2003,2005,2006,2000,2001,2002))
x <- transform(x, rec = ave(id, id, FUN = seq_along))
y <- transform(y, rec = ave(id, id, FUN = seq_along))

df <- merge(x, y, c("id", "rec"))
df

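I want to replace the values in the dna column with NA, except in the last row of each id (the row with the highest rec). How can I do this efficiently? A base R solution would be ideal. Thanks!

Desired output: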

   id rec   dna year
1  1   1     NA 2001
2  1   2     NA 2002
3  1   3   5345 2003
4  2   1     NA 2005
5  2   2     45 2006
...
...

4 answers:

Answer 0 (score: 3)

Try this:

df$dna <- with(df, ave(dna, id, FUN = function(x) {
  # Blank out everything except the last element of each id group
  if ((len <- length(x)) > 1)
    x[seq_len(len - 1)] <- NA
  x
}))
df
#   id rec    dna year
# 1  1   1     NA 2001
# 2  1   2     NA 2002
# 3  1   3   5345 2003
# 4  2   1     NA 2005
# 5  2   2     45 2006
# 6  7   1     NA 2000
# 7  7   2     NA 2001
# 8  7   3 345345 2002

Answer 1 (score: 2)

Although you asked for a base R solution, here is a data.table solution (in case efficiency matters):

library(data.table)
setDT(df)[, indx := .N, by = id][rec != indx, dna := NA_real_, by = id]

#    id rec    dna year indx
# 1:  1   1     NA 2001    3
# 2:  1   2     NA 2002    3
# 3:  1   3   5345 2003    3
# 4:  2   1     NA 2005    2
# 5:  2   2     45 2006    2
# 6:  7   1     NA 2000    3
# 7:  7   2     NA 2001    3
# 8:  7   3 345345 2002    3

Answer 2 (score: 2)

Another approach:

transform(df, dna = ave(dna, id, FUN = function(x) "is.na<-"(x, -length(x))))

#   id rec    dna year
# 1  1   1     NA 2001
# 2  1   2     NA 2002
# 3  1   3   5345 2003
# 4  2   1     NA 2005
# 5  2   2     45 2006
# 6  7   1     NA 2000
# 7  7   2     NA 2001
# 8  7   3 345345 2002
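
The one-liner above is compact, so a short sketch of what happens inside each group may help. It relies on the replacement function `is.na<-` being called in prefix form, with a negative index that selects every element except the last. The vectors `v` and `single` below are hypothetical stand-ins for the dna values of one id group.

```r
# Hypothetical dna values for one id group (illustration only)
v <- c(232, 424, 5345)

# Prefix call of the replacement function: sets the indexed elements to NA
# and returns the modified copy. -length(v) means "all but the last element".
res <- `is.na<-`(v, -length(v))
print(res)

# The same operation written in the usual replacement syntax
v2 <- v
is.na(v2) <- -length(v2)

# A group with a single row is left unchanged, because the negative index
# selects nothing.
single <- `is.na<-`(42, -1)
print(single)
```

Note that transform() returns a modified copy, so the result has to be assigned back (for example `df <- transform(...)`) if df itself should change.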

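Answer 3 (score: 1)

You can use the duplicated function on the id column together with its fromLast argument. Using the result to subset the dna column, we can then assign NA to the selected elements.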

df$dna[duplicated(df$id, fromLast = TRUE)] <- NA
df
#   id rec    dna year
# 1  1   1     NA 2001
# 2  1   2     NA 2002
# 3  1   3   5345 2003
# 4  2   1     NA 2005
# 5  2   2     45 2006
# 6  7   1     NA 2000
# 7  7   2     NA 2001
# 8  7   3 345345 2002
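
To see why this works, it can help to look at the logical vector that duplicated() returns. The following is a minimal sketch using a hypothetical vector `ids` that mirrors the id column of the merged data. It assumes the rows are already ordered so that the last row of each id is the one to keep, which is the case after merge() sorts on id and rec.

```r
# Hypothetical id vector mirroring the example data (illustration only)
ids <- c(1, 1, 1, 2, 2, 7, 7, 7)

# With fromLast = TRUE the scan starts from the end, so the last occurrence
# of each id is the only one that is not flagged as a duplicate.
flag <- duplicated(ids, fromLast = TRUE)
print(flag)

# Rows where flag is FALSE are the ones whose dna value is kept.
keep_rows <- which(!flag)
print(keep_rows)
```

Because the flag depends only on row position, this approach keeps whichever row of each id appears last in the data frame; if the data were not sorted by rec, it would need to be ordered first.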