Sample data:
x <- data.frame(id=c(1,1,1,2,2,7,7,7,7),dna=c(232,424,5345,45345,45,345,4543,345345,4545))
y <- data.frame(id=c(1,1,1,2,2,7,7,7),year=c(2001,2002,2003,2005,2006,2000,2001,2002))
x <- transform(x, rec = ave(id, id, FUN = seq_along))
y <- transform(y, rec = ave(id, id, FUN = seq_along))
df <- merge(x, y, c("id", "rec"))
df
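I want to replace the values in the dna column with NA, except in the last row of each id (with rows ordered by rec). How can I do this efficiently? Ideally I would like a base R solution. Thanks!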
Desired output:
  id rec  dna year
1  1   1   NA 2001
2  1   2   NA 2002
3  1   3 5345 2003
4  2   1   NA 2005
5  2   2   45 2006
...
...
Answer 0 (score: 3)
Try this:
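# Within each id group, set every dna value except the last one to NA.
df$dna <- with(df, ave(dna, id, FUN = function(x) {
  if ((len <- length(x)) > 1)
    x[1:(len - 1)] <- NA  # the guard above avoids 1:0 for single-row groups
  x
}))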
df
#   id rec    dna year
# 1  1   1     NA 2001
# 2  1   2     NA 2002
# 3  1   3   5345 2003
# 4  2   1     NA 2005
# 5  2   2     45 2006
# 6  7   1     NA 2000
# 7  7   2     NA 2001
# 8  7   3 345345 2002
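For what it's worth, the same per-group idea can probably be written without the length check by indexing with seq_len(), which yields an empty index for single-row groups. This is only a sketch: keep_last is a hypothetical helper name that does not appear in the question, and the code assumes the rows are already ordered by rec within each id, as merge() leaves them here.

```r
# Rebuild the sample data from the question.
x <- data.frame(id = c(1, 1, 1, 2, 2, 7, 7, 7, 7),
                dna = c(232, 424, 5345, 45345, 45, 345, 4543, 345345, 4545))
y <- data.frame(id = c(1, 1, 1, 2, 2, 7, 7, 7),
                year = c(2001, 2002, 2003, 2005, 2006, 2000, 2001, 2002))
x <- transform(x, rec = ave(id, id, FUN = seq_along))
y <- transform(y, rec = ave(id, id, FUN = seq_along))
df <- merge(x, y, c("id", "rec"))

# keep_last is a hypothetical helper: blank out all but the last element.
# seq_len(0) is an empty index, so one-element groups are left untouched.
keep_last <- function(v) {
  v[seq_len(length(v) - 1L)] <- NA
  v
}

# Apply the helper to dna separately within each id group.
df$dna <- ave(df$dna, df$id, FUN = keep_last)
df
```

The named helper is just for readability; an anonymous function passed to FUN should behave the same way.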
Answer 1 (score: 2)
Although you asked for a base R solution, here is a data.table solution (in case efficiency matters):
library(data.table)
# indx holds the group size; rows whose rec differs from it are not the last row.
setDT(df)[, indx := .N, by = id][rec != indx, dna := NA_real_]
df
#    id rec    dna year indx
# 1:  1   1     NA 2001    3
# 2:  1   2     NA 2002    3
# 3:  1   3   5345 2003    3
# 4:  2   1     NA 2005    2
# 5:  2   2     45 2006    2
# 6:  7   1     NA 2000    3
# 7:  7   2     NA 2001    3
# 8:  7   3 345345 2002    3
Answer 2 (score: 2)
Another approach:
transform(df, dna = ave(dna, id, FUN = function(x) "is.na<-"(x, -length(x))))
#   id rec    dna year
# 1  1   1     NA 2001
# 2  1   2     NA 2002
# 3  1   3   5345 2003
# 4  2   1     NA 2005
# 5  2   2     45 2006
# 6  7   1     NA 2000
# 7  7   2     NA 2001
# 8  7   3 345345 2002
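In case the quoted "is.na<-" call looks cryptic: it is the replacement function behind is.na(x) <- value, called directly so that it returns the modified vector. A small sketch on a plain vector (v, w, and u are made-up names used only for illustration):

```r
v <- c(10, 20, 30)

# Usual replacement form: a negative index selects everything except that
# position, so every element but the last one becomes NA.
w <- v
is.na(w) <- -length(w)

# Functional form, as used inside ave() above; it returns the modified copy
# and leaves v itself unchanged.
u <- "is.na<-"(v, -length(v))

# For a one-element vector the negative index selects nothing,
# so the single value is kept.
single <- "is.na<-"(5, -1)
```

Both forms should give the same result; the functional form is convenient inside an anonymous function because it avoids a temporary variable.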
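Answer 3 (score: 1)
On the id column, you can use the duplicated function with its fromLast argument. With fromLast = TRUE it flags every occurrence of an id except the last one, so we can use the result to subset the dna column and assign NA to the selected elements.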
df$dna[duplicated(df$id, fromLast = TRUE)] <- NA
df
#   id rec    dna year
# 1  1   1     NA 2001
# 2  1   2     NA 2002
# 3  1   3   5345 2003
# 4  2   1     NA 2005
# 5  2   2     45 2006
# 6  7   1     NA 2000
# 7  7   2     NA 2001
# 8  7   3 345345 2002
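To see what the logical index looks like, and as a hedged note on row order: duplicated() works on row position, so "last" means the last row as stored, not necessarily the highest rec. If the rows might not be sorted, ordering by id and rec first seems like a reasonable precaution. The sketch below uses a made-up data frame, df2, that is not part of the question.

```r
# Index produced for the id values in the question's merged data.
id <- c(1, 1, 1, 2, 2, 7, 7, 7)
flag <- duplicated(id, fromLast = TRUE)
flag
# [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE

# df2 is a hypothetical, deliberately unsorted example.
df2 <- data.frame(id  = c(7, 1, 7, 1),
                  rec = c(2, 1, 1, 2),
                  dna = c(40, 10, 30, 20))

# Sort so that the last row of each id is the one with the highest rec,
# then blank out every row that has a later row with the same id.
df2 <- df2[order(df2$id, df2$rec), ]
df2$dna[duplicated(df2$id, fromLast = TRUE)] <- NA
df2
```

With the question's data the sorting step is not needed, because merge() already returns the rows ordered by id and rec.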