I have a table that looks like this:
Worker | Score
A | 10
A | 20
A | 0
A | 0
A | 0
B | 2
B | 4
B | 0
B | 6
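Some of my scores are unavailable, and I have filled them with 0. Is there a way in R to replace those 0 values with the mean of that worker's other scores? The final table should look like this: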
Worker | Score
A | 10
A | 20
A | 15 (mean of other scores)
A | 15 (mean of other scores)
A | 15 (mean of other scores)
B | 2
B | 4
B | 4 (mean of other scores)
B | 6
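I was thinking of using a loop, but I have thousands of entries, which would make it very slow and inefficient.
Answer 0 (score: 1)
Use ave to find the mean for each Worker, then use replace to substitute the relevant values: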
replace(x = df$Score, list = df$Score == 0, values =
ave(df$Score, df$Worker, FUN = function(x) sum(x, na.rm = TRUE)/sum(x!=0))[df$Score == 0])
#[1] 10 20 15 15 15 2 4 4 6
Data
df = structure(list(Worker = c("A", "A", "A", "A", "A", "B", "B",
"B", "B"), Score = c(10L, 20L, 0L, 0L, 0L, 2L, 4L, 0L, 6L)), .Names = c("Worker",
"Score"), class = "data.frame", row.names = c(NA, -9L))
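The ave step above can also be inspected on its own before anything is replaced. The following is only a minimal sketch of that idea, not the answer's exact code: the names scores, grp_mean, and Filled are made up for illustration, and Score is stored as numeric rather than integer to keep the types simple.

```r
# Hypothetical copy of the example data (numeric Score, illustrative names).
scores <- data.frame(
  Worker = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Score  = c(10, 20, 0, 0, 0, 2, 4, 0, 6)
)

# ave() returns a vector as long as the input, holding each row's group value.
# Here the group value is the mean of the non-zero scores for that Worker.
grp_mean <- ave(scores$Score, scores$Worker,
                FUN = function(x) mean(x[x != 0]))

# Keep the original score unless it is 0, in which case use the group mean.
scores$Filled <- ifelse(scores$Score == 0, grp_mean, scores$Score)
scores
```

One caveat with this sketch: if a worker has only zeros, mean(x[x != 0]) is NaN for that group, so those rows would need separate handling.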
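Answer 1 (score: 0)
One option is na.aggregate from the zoo package. Replace the 0 values in 'Score' with NA, then, grouped by 'Worker', apply na.aggregate to 'Score' so that each NA is replaced by the mean of that group's 'Score', and assign the result to 'Score'. It can also be made more compact, as in the final line of the code below.
library(data.table)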
library(zoo)
setDT(df1)[Score ==0, Score := NA ][, .(Score = na.aggregate(Score)), by = Worker]
# Worker Score
#1: A 10
#2: A 20
#3: A 15
#4: A 15
#5: A 15
#6: B 2
#7: B 4
#8: B 4
#9: B 6
setDT(df1)[, .(Score = na.aggregate(Score*NA^!Score)), Worker]
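The compact form relies on the expression NA^!Score, which may not be obvious at first sight. As a rough illustration in base R only (the vector names below are hypothetical, and this sketch covers a single worker's scores rather than the grouped data.table call):

```r
# Scores for one worker, with 0 standing in for "unavailable".
score <- c(10, 20, 0, 0, 0)

# !score is TRUE where score is 0. NA^TRUE is NA^1 (NA), NA^FALSE is NA^0 (1),
# so the mask is NA at the zeros and 1 everywhere else.
mask <- NA^!score

# Multiplying turns the zeros into NA and leaves the other scores unchanged.
masked <- score * mask

# The mean of the remaining scores is what na.aggregate would fill in.
fill_value <- mean(masked, na.rm = TRUE)
filled <- ifelse(is.na(masked), fill_value, masked)
filled
```

This is why the compact version needs no separate Score := NA step: the masking happens inside the expression passed to na.aggregate.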
Answer 2 (score: 0)
Here is a data.table approach:
library("data.table")
df1 <- data.table(Worker = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
Score = c(10L, 20L, 0L, 0L, 0L, 2L, 4L, 0L, 6L))
m <- df1[Score!=0, mean(Score), Worker]
m[df1, on="Worker"][, `:=`(Score=ifelse(Score==0, V1, Score), V1=NULL)][]