Replace missing values with the corresponding daily mean

Time: 2017-12-13 01:17:47

Tags: r dataframe replace mean missing-data

My dataset: MyData

     Day Sales
12-01-17    NA
12-01-17    NA
13-01-17    13
14-01-17    2
12-01-17    33
13-01-17    NA
13-01-17    NA
13-01-17    NA
14-01-17    11
12-01-17    23
13-01-17    21
14-01-17    NA

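I want to replace each missing Sales value with the mean sales for that day. So the NA values for 12-01-2017 should become the mean of 33 and 23, i.e. 28.

This is the R code I tried. Here MyData_NA contains only the rows where Sales is NA, and MyData_Daymean holds the mean sales grouped by day.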

for (i in 1:nrow(MyData_NA)){if (MyData_NA[i,day] == MyData_Daymean[i,1])
{ MyData_NA[i,2] <- MyData_Daymean[i,2] }}

This doesn't seem to work.
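One likely reason is that the loop compares row i of MyData_NA with row i of MyData_Daymean, although the two tables are not aligned row by row (one has a row per missing sale, the other a row per day). A minimal base R sketch of the lookup the loop seems to aim for, assuming the columns are named Day and Sales as in the table above; day_means and missing are names introduced here for illustration only:

```r
# The question's data, typed in by hand
MyData <- data.frame(
  Day = c("12-01-17", "12-01-17", "13-01-17", "14-01-17", "12-01-17", "13-01-17",
          "13-01-17", "13-01-17", "14-01-17", "12-01-17", "13-01-17", "14-01-17"),
  Sales = c(NA, NA, 13, 2, 33, NA, NA, NA, 11, 23, 21, NA)
)

# Mean of the non-missing sales per day, named by day (illustrative name)
day_means <- tapply(MyData$Sales, MyData$Day, mean, na.rm = TRUE)

# Look up each missing row's day by name rather than by row position
missing <- is.na(MyData$Sales)
MyData$Sales[missing] <- day_means[as.character(MyData$Day[missing])]
MyData
```

Indexing day_means by the day label removes the need for the two tables to have matching row order or row counts.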

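2 Answers:

Answer 0: (score: 3)

A solution using dplyr. We can use mutate together with ifelse to replace the missing values (NA). The key is to call group_by on Day first, so that the mean is computed only within each day's group.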

library(dplyr)

dt2 <- dt %>%
  group_by(Day) %>%
  mutate(Sales = ifelse(is.na(Sales), mean(Sales, na.rm = TRUE), Sales)) %>%
  ungroup()
dt2
# # A tibble: 9 x 2
#        Day Sales
#     <fctr> <dbl>
# 1 12-01-17  28.0
# 2 13-01-17  13.0
# 3 14-01-17   2.0
# 4 12-01-17  33.0
# 5 13-01-17  17.0
# 6 14-01-17  11.0
# 7 12-01-17  23.0
# 8 13-01-17  21.0
# 9 14-01-17   6.5
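For readers without dplyr, the same group-then-fill idea can be sketched in base R. This is only an illustration under the assumption that the data matches the nine rows used in this answer; group_mean is a name introduced here, and ave stands in for group_by plus mutate:

```r
# Same nine rows as the answer's example data
dt <- data.frame(
  Day = c("12-01-17", "13-01-17", "14-01-17", "12-01-17", "13-01-17",
          "14-01-17", "12-01-17", "13-01-17", "14-01-17"),
  Sales = c(NA, 13, 2, 33, NA, 11, 23, 21, NA)
)

# ave() returns, for every row, the mean of the non-missing Sales of that row's Day
group_mean <- ave(dt$Sales, dt$Day, FUN = function(x) mean(x, na.rm = TRUE))

# Keep observed values, use the day's mean only where Sales is missing
dt$Sales <- ifelse(is.na(dt$Sales), group_mean, dt$Sales)
dt
```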

Data

dt <- read.table(text = "     Day Sales
12-01-17    NA
                 13-01-17    13
                 14-01-17    2
                 12-01-17    33
                 13-01-17    NA
                 14-01-17    11
                 12-01-17    23
                 13-01-17    21
                 14-01-17    NA",
                 header = TRUE)

Answer 1: (score: 3)

We can also use na.aggregate from zoo

library(zoo)
dt$Sales <-  with(dt, ave(Sales, Day, FUN = na.aggregate))
dt$Sales
#[1] 28.0 13.0  2.0 33.0 17.0 11.0 23.0 21.0  6.5
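With its default arguments, na.aggregate replaces each NA with the mean of the non-missing values of the vector it receives, which is why wrapping it in ave gives a per-day fill. A rough base R stand-in for that default behaviour, shown on one day's values; fill_with_mean is a hypothetical helper written for illustration, not zoo's implementation:

```r
# Hypothetical helper: replace NAs in x with the mean of the non-missing values
fill_with_mean <- function(x) {
  replace(x, is.na(x), mean(x, na.rm = TRUE))
}

# Sales for 12-01-17 in the example data
sales_12 <- c(NA, 33, 23)
filled_12 <- fill_with_mean(sales_12)
filled_12
```

If a day has no observed sales at all, the mean of an empty set is NaN, so this helper would fill that group with NaN rather than a number.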

Or with data.table, assuming 'Sales' is of numeric type

library(data.table)
setDT(dt)[, Sales := na.aggregate(Sales), Day]
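The numeric-type assumption matters because read.table parses a column of whole numbers as integer, while a group mean such as 6.5 is a double, and a grouped assignment into an integer column may not be accepted as-is. A small base R check, assuming the data is read as in the first answer (shortened to three rows here); was_integer is a name introduced for illustration:

```r
# A shortened copy of the example data, read the same way as in the first answer
dt <- read.table(text = "Day Sales
12-01-17 NA
13-01-17 13
14-01-17 2", header = TRUE)

# Whole numbers (with NA) are parsed as integer, not double
was_integer <- is.integer(dt$Sales)

# Convert up front so fractional means can be stored without a type mismatch
dt$Sales <- as.numeric(dt$Sales)
class(dt$Sales)
```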