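I have a data frame containing a set of item ratings produced by multiple coders. Not all coders rated all items. For each item, I want to compute an average based on the ratings of the two highest-ranked coders, as determined by an external ranking system. Coders are ranked from A (highest) to D (lowest). In my current code, I order the columns by coder rank (A to D) and then use a for loop: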
   CoderA CoderB CoderC CoderD
1       2      1     NA      1
2       1      3      3     NA
3      NA     NA      4      5
4       7      6      7      6
5       3      3      4      2
6       2      2     NA     NA
7       2     NA      2      1
8       5      3     NA      4
9       7      7      6     NA
10      1     NA      3      4
df <- data.frame(
  CoderA = c(2,1,NA,7,3,2,2,5,7,1),
  CoderB = c(1,3,NA,6,3,2,NA,3,7,NA),
  CoderC = c(NA,3,4,7,4,NA,2,NA,6,3),
  CoderD = c(1,NA,5,6,2,NA,1,4,NA,4))
# Names of the first and second non-NA columns in each row
df$first_sc <- apply(df, 1, function(x) names(df[which(!is.na(x))])[1])
df$sec_sc <- apply(df, 1, function(x) names(df[which(!is.na(x))])[2])
# Average the two selected ratings, row by row
for (x in seq(1,nrow(df))) {
  first_rating <- df[x,df$first_sc[x]]
  second_rating <- df[x,df$sec_sc[x]]
  df$BestAvg[x] <- (first_rating + second_rating) / 2
}
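Question 1: Any suggestions for a more concise solution to the simple case above? (A for loop is not preferred, but I got stuck trying to use apply-style functions for the indexing.)
Question 2: In a second data frame, the columns are not ordered by coder rank (for example, the columns are 'CoderD', 'CoderB', 'CoderC', 'CoderA'). Given this constraint, how can I handle the same problem?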
Answer 0 (score: 1)
Using dplyr and tidyr...
library(dplyr)
library(tidyr)
df2 <- df %>% mutate(case=1:n()) %>%           #add case numbers
  gather(key=coder,value=score,-case) %>%      #convert to long format
  filter(!is.na(score)) %>%                    #remove NA scores
  arrange(case,coder) %>%                      #order by case and coder
  group_by(case) %>%                           #group by case
  summarise(bestavg=mean(head(score,2))) %>%   #mean of top two
  right_join(df %>% mutate(case=1:n()))        #merge with original data
df2
# A tibble: 10 x 6
    case bestavg CoderA CoderB CoderC CoderD
   <int>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1     1     1.5      2      1     NA      1
 2     2     2.0      1      3      3     NA
 3     3     4.5     NA     NA      4      5
 4     4     6.5      7      6      7      6
 5     5     3.0      3      3      4      2
 6     6     2.0      2      2     NA     NA
 7     7     2.0      2     NA      2      1
 8     8     4.0      5      3     NA      4
 9     9     7.0      7      7      6     NA
10    10     2.0      1     NA      3      4
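This works as long as your coder names sort alphabetically in your desired order of preference (as you describe), because arrange(case, coder) orders the rows by coder name. The order of the columns in the data frame doesn't matter.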
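If the coder names do not sort alphabetically in rank order, one possible adaptation is to turn the coder column into a factor whose levels follow the ranking before arranging. The sketch below is an illustration rather than part of the original answer: the `rank_order` vector and the `ranked_avg` name are hypothetical, and it uses `pivot_longer()` (which assumes tidyr 1.0.0 or later) in place of `gather()`.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  CoderA = c(2, 1, NA, 7, 3, 2, 2, 5, 7, 1),
  CoderB = c(1, 3, NA, 6, 3, 2, NA, 3, 7, NA),
  CoderC = c(NA, 3, 4, 7, 4, NA, 2, NA, 6, 3),
  CoderD = c(1, NA, 5, 6, 2, NA, 1, 4, NA, 4))

# Hypothetical ranking, best coder first (an assumption for illustration)
rank_order <- c("CoderD", "CoderB", "CoderC", "CoderA")

ranked_avg <- df %>%
  mutate(case = row_number()) %>%                          # add case numbers
  pivot_longer(-case, names_to = "coder", values_to = "score") %>%
  filter(!is.na(score)) %>%                                # remove NA scores
  mutate(coder = factor(coder, levels = rank_order)) %>%   # rank order, not alphabetical
  arrange(case, coder) %>%                                 # factors sort by level order
  group_by(case) %>%
  summarise(bestavg = mean(head(score, 2)))                # mean of the top two by rank

ranked_avg
```

Because `arrange()` sorts a factor by its level order, only `rank_order` needs to change when the external ranking changes.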
Answer 1 (score: 1)
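For the first question, you can use apply to get the mean of the first 2 non-NA values in each row (this assumes df contains only the coder columns, in rank order):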
df$BestAvg = apply(df,1,function(x) mean(x[!is.na(x)][1:2]))
If the coders' ranking is actually CoderD > CoderB > CoderC > CoderA, index each row by a vector of column names in rank order:
r = c("CoderD", "CoderB", "CoderC", "CoderA")
df$BestAvg2 = apply(df,1,function(x) mean(x[r][!is.na(x[r])][1:2]))
This returns:
   CoderA CoderB CoderC CoderD BestAvg BestAvg2
1       2      1     NA      1     1.5      1.0
2       1      3      3     NA     2.0      3.0
3      NA     NA      4      5     4.5      4.5
4       7      6      7      6     6.5      6.0
5       3      3      4      2     3.0      2.5
6       2      2     NA     NA     2.0      2.0
7       2     NA      2      1     2.0      1.5
8       5      3     NA      4     4.0      3.5
9       7      7      6     NA     7.0      6.5
10      1     NA      3      4     2.0      3.5
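The two apply calls above can be folded into one small function that takes the ranking as an argument. This is a minimal sketch, not part of the original answer: the name `best_avg` and its parameters are hypothetical. It selects only the named coder columns, so extra columns in the data frame (such as an earlier BestAvg) do not interfere.

```r
# Hypothetical helper: mean of the ratings from the n highest-ranked
# coders who actually rated each item.
best_avg <- function(data, ranking, n = 2) {
  # Keep only the coder columns, reordered from best to worst rank
  m <- as.matrix(data[, ranking, drop = FALSE])
  # For each row, drop NAs and average the first n remaining ratings
  apply(m, 1, function(x) mean(head(x[!is.na(x)], n)))
}

df <- data.frame(
  CoderA = c(2, 1, NA, 7, 3, 2, 2, 5, 7, 1),
  CoderB = c(1, 3, NA, 6, 3, 2, NA, 3, 7, NA),
  CoderC = c(NA, 3, 4, 7, 4, NA, 2, NA, 6, 3),
  CoderD = c(1, NA, 5, 6, 2, NA, 1, 4, NA, 4))

avg_default <- best_avg(df, c("CoderA", "CoderB", "CoderC", "CoderD"))
avg_custom  <- best_avg(df, c("CoderD", "CoderB", "CoderC", "CoderA"))
```

Unlike indexing with [1:2], head() does not pad with NA, so a row with a single rating returns that rating, and a row with no ratings returns NaN. Whether that is the desired behavior depends on how you want to treat sparsely rated items.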