I have a problem I need to solve. I have a data frame in which each row holds 4 labels and their corresponding score values. Here is my sample data:
sample = data.frame("label1" = c("name1", "name1", "name3"), "score1" = c(0.88, 0.5, 0.4),
"label2" = c("name1", "name1", "name3"), "score2" = c(0.93, 0.6, 0.35),
"label3" = c("name2", "name1", "name4"), "score3" = c(0.49, 0.7, 0.8),
"label4" = c("name2", "name2", "name1"), "score4" = c(0.81, 0.8, 0.25), stringsAsFactors = FALSE)
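Now I want to compute a final label and score for each row according to the following rules:
I thought about iterating over the data row by row and restructuring each row so that I can use aggregate. Here is my approach for the first row: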
pairs <- as.data.frame(matrix(as.vector(sample[1,]), ncol=2, byrow = TRUE))
pairs = data.frame("label" = unlist(pairs[,1], recursive = FALSE), "score" = unlist(pairs[,2], recursive = FALSE))
pairs$label = as.character(pairs$label)
aggregate(score~label, data=pairs, FUN = function(x) c(mean = mean(x), count = length(x) ))
After this, I don't know how to implement the rules above. Is there perhaps a more efficient way to solve this? This is my desired output:
result = data.frame("label" = c("name1", "name1", NA), "score" = c(0.905, 0.6, NA))
Thanks in advance.
Answer 0 (score: 1)
Like you, I think restructuring the data and aggregating it is the way to go. Here is what I did:
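library(dplyr)

# Tag each original row so it can still be identified after reshaping
sample$row_num <- 1:nrow(sample)

# For each index 1..4, pull out the matching label/score pair of columns
new_lst <- lapply(1:4,
                  function(x){
                    cols <- names(sample)[grepl(x, names(sample))]
                    sample[, c(cols, "row_num")] %>%
                      setNames(c("label", "score", "row_num"))
                  })

# Stack the four pairs, then count and average the scores per row and label
sample_2 <- do.call(rbind, new_lst) %>%
  group_by(row_num, label) %>%
  summarise(cnt = n(),
            score_avg = mean(score))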
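If you would rather not depend on dplyr, the same count-and-mean summary can be sketched in base R. This is only a sketch under my own naming: `long`, `cnt`, `avg` and `summary_long` are illustrative names and do not appear in the code above. The sample data is copied from the question so the block runs on its own.

```r
# Sample data copied from the question
sample <- data.frame(
  label1 = c("name1", "name1", "name3"), score1 = c(0.88, 0.5, 0.4),
  label2 = c("name1", "name1", "name3"), score2 = c(0.93, 0.6, 0.35),
  label3 = c("name2", "name1", "name4"), score3 = c(0.49, 0.7, 0.8),
  label4 = c("name2", "name2", "name1"), score4 = c(0.81, 0.8, 0.25),
  stringsAsFactors = FALSE
)

label_cols <- paste0("label", 1:4)
score_cols <- paste0("score", 1:4)

# Long format: one row per (row_num, label, score) triple.
# unlist() walks the columns one after another, so the row numbers repeat per column.
long <- data.frame(
  row_num = rep(seq_len(nrow(sample)), times = length(label_cols)),
  label   = unlist(sample[label_cols], use.names = FALSE),
  score   = unlist(sample[score_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Count and mean score per (row_num, label), computed separately and then joined
cnt <- aggregate(score ~ row_num + label, data = long, FUN = length)
avg <- aggregate(score ~ row_num + label, data = long, FUN = mean)
names(cnt)[3] <- "cnt"
names(avg)[3] <- "score_avg"
summary_long <- merge(cnt, avg, by = c("row_num", "label"))
summary_long <- summary_long[order(summary_long$row_num, summary_long$label), ]
```

Running `aggregate` twice keeps the result a plain data frame with ordinary columns, which is easier to filter afterwards than the matrix column you get when `FUN` returns a vector.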
Now I go through each row and apply the rules in code using if / else if / else:
lapply(seq_len(nrow(sample)),
       function(x){
         dat <- sample_2 %>% filter(row_num == x)
         if (max(dat$cnt) > 2) {
           # one label occurs more than twice: take it and its mean score
           label <- dat$label[dat$cnt > 2]
           score <- dat$score_avg[dat$cnt > 2]
         } else if (nrow(dat) > 2) {
           # more than two distinct labels and none occurs more than twice
           label <- NA_character_
           score <- NA_real_
         } else {
           # otherwise take the label with the highest mean score
           label <- dat$label[which.max(dat$score_avg)]
           score <- max(dat$score_avg)
         }
         return(data.frame(# "row_num" = x, # you can un-comment here to have an indexed output
           "label" = label, "score" = score, stringsAsFactors = FALSE))
       }) %>%
  data.table::rbindlist()
Not very elegant, but it gets the job done.
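For comparison, here is a more compact base R sketch of the same per-row logic. It assumes the rules are exactly the three branches of the if / else if / else above (a label occurring more than twice wins; otherwise more than two distinct labels gives NA; otherwise the label with the highest mean score wins). `pick_label` is a hypothetical helper name of mine, and the sample data is copied from the question so the block is self-contained.

```r
# Sample data copied from the question
sample <- data.frame(
  label1 = c("name1", "name1", "name3"), score1 = c(0.88, 0.5, 0.4),
  label2 = c("name1", "name1", "name3"), score2 = c(0.93, 0.6, 0.35),
  label3 = c("name2", "name1", "name4"), score3 = c(0.49, 0.7, 0.8),
  label4 = c("name2", "name2", "name1"), score4 = c(0.81, 0.8, 0.25),
  stringsAsFactors = FALSE
)

# Hypothetical helper: decide the final label and score for one row,
# given its labels and scores as two parallel vectors.
pick_label <- function(labels, scores) {
  cnt <- table(labels)                                     # occurrences per label
  avg <- vapply(split(scores, labels), mean, numeric(1))   # mean score per label
  if (max(cnt) > 2) {
    # one label occurs more than twice: it wins
    win <- names(cnt)[which.max(cnt)]
  } else if (length(cnt) > 2) {
    # more than two distinct labels and no majority: undecided
    return(data.frame(label = NA_character_, score = NA_real_,
                      stringsAsFactors = FALSE))
  } else {
    # at most two distinct labels: the higher mean score wins
    win <- names(avg)[which.max(avg)]
  }
  data.frame(label = win, score = unname(avg[win]), stringsAsFactors = FALSE)
}

label_cols <- paste0("label", 1:4)
score_cols <- paste0("score", 1:4)

# Apply the helper to every row and stack the one-row results
result <- do.call(rbind, lapply(seq_len(nrow(sample)), function(i) {
  pick_label(unlist(sample[i, label_cols], use.names = FALSE),
             unlist(sample[i, score_cols], use.names = FALSE))
}))
```

Keeping the rules in a single function that only sees one row's labels and scores makes them easy to test in isolation, e.g. `pick_label(c("a", "a", "a", "b"), c(1, 2, 3, 4))`.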
Hope this helps.