R: fill zeros with the mean

Date: 2017-04-27 20:35:43

Tags: r loops for-loop

I currently have this table:

Worker | Score
A      | 10
A      | 20
A      | 0
A      | 0
A      | 0
B      | 2
B      | 4
B      | 0
B      | 6

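Some of my scores are unavailable, and I have filled them in with 0. Is there a way in R to replace those 0 values with the mean of that particular worker's other scores? The final table should look like this: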

Worker | Score
A      | 10
A      | 20
A      | 15 (mean of other scores)
A      | 15 (mean of other scores)
A      | 15 (mean of other scores)
B      | 2
B      | 4
B      | 4 (mean of other scores)
B      | 6

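I was thinking of using a loop, but I have thousands of entries, which would make it very slow and inefficient.

3 Answers:

Answer 0 (score: 1)

Use ave to find the mean for each Worker, then use replace to substitute the relevant values.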

replace(x = df$Score, list = df$Score == 0, values =
  ave(df$Score, df$Worker, FUN = function(x) sum(x, na.rm = TRUE)/sum(x!=0))[df$Score == 0])
#[1] 10 20 15 15 15  2  4  4  6

Data

df = structure(list(Worker = c("A", "A", "A", "A", "A", "B", "B", 
"B", "B"), Score = c(10L, 20L, 0L, 0L, 0L, 2L, 4L, 0L, 6L)), .Names = c("Worker", 
"Score"), class = "data.frame", row.names = c(NA, -9L))
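
The one-liner above can be unpacked into separate steps. The following is a minimal base R sketch of the same idea, not the answerer's exact code: the names group_mean and is_zero are introduced here for illustration only, and mean(x[x != 0]) stands in for the sum(x)/sum(x != 0) ratio.

```r
# Same data as in the question, rebuilt so the block runs on its own
df <- data.frame(
  Worker = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Score  = c(10, 20, 0, 0, 0, 2, 4, 0, 6)
)

# Per-worker mean of the non-zero scores, repeated on every row of that worker
group_mean <- ave(df$Score, df$Worker, FUN = function(x) mean(x[x != 0]))

# Overwrite only the rows whose score is 0
is_zero <- df$Score == 0
df$Score[is_zero] <- group_mean[is_zero]

df$Score
#[1] 10 20 15 15 15  2  4  4  6
```

Note that a worker whose scores are all 0 would get NaN here, since there is nothing to average.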

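Answer 1 (score: 0)

One option is na.aggregate from zoo. Replace the 0 values in 'Score' with NA, then, grouped by 'Worker', apply na.aggregate to 'Score' so that each NA is replaced by the mean of that worker's remaining scores, and assign the result to 'Score'. Here df1 is assumed to hold the table from the question.

library(data.table)
library(zoo)
setDT(df1)[Score ==0, Score := NA ][, .(Score = na.aggregate(Score)), by = Worker]
#   Worker Score
#1:      A    10
#2:      A    20
#3:      A    15
#4:      A    15
#5:      A    15
#6:      B     2
#7:      B     4
#8:      B     4
#9:      B     6

Or it can be made more compact:

setDT(df1)[, .(Score = na.aggregate(Score*NA^!Score)), Worker]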
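
The Score*NA^!Score expression in the compact version can look cryptic, so here is a small base R sketch of what it evaluates to for one worker. The names masked and filled are illustrative only, and the last step only approximates what na.aggregate does with its default FUN = mean.

```r
Score <- c(10, 20, 0, 0, 0)

# !Score is TRUE where Score is 0 and FALSE elsewhere.
# In arithmetic TRUE/FALSE act as 1/0, and in R NA^0 is 1 while NA^1 is NA,
# so the product keeps non-zero scores and turns the zeros into NA.
masked <- Score * NA^!Score
masked
#[1] 10 20 NA NA NA

# Fill each NA with the mean of the values that remain
filled <- replace(masked, is.na(masked), mean(masked, na.rm = TRUE))
filled
#[1] 10 20 15 15 15
```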

Answer 2 (score: 0)

Here is another solution using data.table:

library("data.table")
df1 <- data.table(Worker = c("A", "A", "A", "A", "A", "B", "B", "B", "B"), 
                  Score = c(10L, 20L, 0L, 0L, 0L, 2L, 4L, 0L, 6L))
m <- df1[Score!=0, mean(Score), Worker]
m[df1, on="Worker"][, `:=`(Score=ifelse(Score==0, V1, Score), V1=NULL)][]
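
For readers without data.table, the same two-step idea (compute one mean per worker, then look it up for each row) can be sketched in base R. This is an illustrative variant rather than part of the answer; the names means and lookup are made up for the example.

```r
df <- data.frame(
  Worker = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Score  = c(10, 20, 0, 0, 0, 2, 4, 0, 6)
)

# One row per worker: mean of the non-zero scores only
means <- aggregate(Score ~ Worker, data = df[df$Score != 0, ], FUN = mean)

# Look up each row's worker mean, then use it only where the score is 0
lookup <- means$Score[match(df$Worker, means$Worker)]
df$Score <- ifelse(df$Score == 0, lookup, df$Score)

df$Score
#[1] 10 20 15 15 15  2  4  4  6
```

A worker with no non-zero scores would be missing from means, so match would return NA for those rows.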