Identify the highest `letter` within "ID" in a longitudinal dataset, ignoring B's

Time: 2013-03-29 20:27:36

Tags: r

I am trying to identify the highest score within each ID in a longitudinal dataset.

Say my data looks like this:

dfL <- data.frame(ID = c(1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 9L, 9L, 9L, 9L, 9L, 10L), week = c("baseline", 4L, 6L, "baseline", 6L, 9L, 9L, 12L, "baseline", 4L, 6L, 9L, 12L, "baseline"), score = c(NA, "A", "B", NA, "B", "E", "D", "C", NA, "B", "A", "A", "B", NA)); dfL
   ID     week score
1   1 baseline  <NA>
2   1        4     A
3   1        6     B
4   4 baseline  <NA>
5   4        6     B
6   4        9     E
7   4        9     D
8   4       12     C
9   9 baseline  <NA>
10  9        4     B
11  9        6     A
12  9        9     A
13  9       12     B
14 10 baseline  <NA>

What I want to do is find the highest score for each ID, expressed as a letter and ignoring any B's, and put that letter on the baseline row of each ID.

For anyone who knows how to solve this: can you also recommend books or web pages with good tutorials on manipulating longitudinal data?

2 answers:

Answer 0: (score: 2)

Here is a quick solution.

dfL <- data.frame(ID = c(1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 9L, 9L, 9L, 9L, 9L, 10L), week = c("baseline", 4L, 6L, "baseline", 6L, 9L, 9L, 12L, "baseline", 4L, 6L, 9L, 12L, "baseline"), score = c(NA, "A", "B", NA, "B", "E", "D", "C", NA, "B", "A", "A", "B", NA));

#find the highest score per id excluding "B"
highestScore = by(dfL$score, dfL$ID, function(ids){ 
    head(rev(sort(ids[ids != "B"])), 1) 
})

dfL$hi_score = NA
for (id in names(highestScore)){
    topScore = as.character(highestScore[[id]])
    #to account for IDs with no scores left after excluding NA and "B"
    topScore = ifelse(length(topScore)==0, NA, topScore)
    #only update the hi scores at the baseline position  
    dfL[which(dfL$ID == id & dfL$week == "baseline"), "hi_score"] = topScore
}

dfL
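
For comparison, the same per-ID lookup can be written without the explicit loop. The following is only a minimal base R sketch, assuming that "highest" means last in alphabetical order (as in the code above). The helper name `top_score` is hypothetical and not part of the original answer.

```r
dfL <- data.frame(
  ID    = c(1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 9L, 9L, 9L, 9L, 9L, 10L),
  week  = c("baseline", 4L, 6L, "baseline", 6L, 9L, 9L, 12L,
            "baseline", 4L, 6L, 9L, 12L, "baseline"),
  score = c(NA, "A", "B", NA, "B", "E", "D", "C", NA, "B", "A", "A", "B", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: highest letter in x, ignoring NA and "B".
# Returns NA when nothing is left after filtering.
top_score <- function(x) {
  keep <- x[!is.na(x) & x != "B"]
  if (length(keep) == 0) NA_character_ else sort(keep, decreasing = TRUE)[1]
}

# One value per ID, named by ID ("1", "4", "9", "10")
best <- vapply(split(dfL$score, dfL$ID), top_score, character(1))

# Fill only the baseline rows; every other row stays NA
dfL$hi_score <- NA_character_
is_base <- dfL$week == "baseline"
dfL$hi_score[is_base] <- best[as.character(dfL$ID[is_base])]

dfL
```

With the sample data, the baseline rows receive "A" for ID 1, "E" for ID 4, "A" for ID 9, and NA for ID 10, which has no scores at all.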

As for tutorials, it all comes down to practice. Reading the questions and answers on this site is a good start.

Answer 1: (score: 1)

I think this does the job.

dfL <- data.frame(ID = c(1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 9L, 9L, 9L, 9L, 9L, 10L), week = c("baseline", 4L, 6L, "baseline", 6L, 9L, 9L, 12L, "baseline", 4L, 6L, 9L, 12L, "baseline"), score = c(NA, "A", "B", NA, "B", "E", "D", "C", NA, "B", "A", "A", "B", NA)); dfL
library(plyr)

dfL$score <- as.character(dfL$score)
dfL$score <- ifelse(dfL$score!="B",dfL$score,NA)
maxdat <- ddply(dfL,.(ID),summarise,hi_score=max(score,na.rm=TRUE))
finaldat <- merge(dfL, maxdat, by="ID")  

If you really want NA in the rows other than the baseline week, you can do this:

finaldat$hi_score<- ifelse(finaldat$week=="baseline", finaldat$hi_score,NA)
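
The same summarise-then-merge idea can also be sketched in base R alone, which avoids the plyr dependency and makes the case of an ID with no usable scores (ID 10 here) explicit. This is an illustrative variant under those assumptions, not the code from the answer itself; `safe_max` is a hypothetical helper.

```r
dfL <- data.frame(
  ID    = c(1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 9L, 9L, 9L, 9L, 9L, 10L),
  week  = c("baseline", 4L, 6L, "baseline", 6L, 9L, 9L, 12L,
            "baseline", 4L, 6L, 9L, 12L, "baseline"),
  score = c(NA, "A", "B", NA, "B", "E", "D", "C", NA, "B", "A", "A", "B", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: max of a character vector, NA if nothing is non-missing
safe_max <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else max(x)
}

# Treat "B" as missing in a working copy, leaving dfL$score untouched
scores <- ifelse(dfL$score == "B", NA, dfL$score)

# One row per ID with its highest remaining score
best <- tapply(scores, dfL$ID, safe_max)
maxdat <- data.frame(
  ID = as.integer(names(best)),
  hi_score = as.vector(best),
  stringsAsFactors = FALSE
)

# Attach the per-ID maximum to every row, then keep it only at baseline
finaldat <- merge(dfL, maxdat, by = "ID")
finaldat$hi_score <- ifelse(finaldat$week == "baseline", finaldat$hi_score, NA)

finaldat
```

Working on a copy of the scores keeps the original `score` column intact, so the B's remain visible in the final table.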

If you want to learn more about data transformation, you should definitely check out Hadley's packages, such as reshape2 (http://had.co.nz/reshape/) and plyr (http://plyr.had.co.nz/).