My dataset: MyData
Day Sales
12-01-17 NA
12-01-17 NA
13-01-17 13
14-01-17 2
12-01-17 33
13-01-17 NA
13-01-17 NA
13-01-17 NA
14-01-17 11
12-01-17 23
13-01-17 21
14-01-17 NA
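I want to replace each missing Sales value with the mean sales for that day. For example, the NA values for 12-01-17 should be replaced by the mean of 33 and 23, which is 28.
This is the R code I tried. Here MyData_NA contains only the rows where Sales is NA, and MyData_Daymean holds the mean sales grouped by day.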
for (i in 1:nrow(MyData_NA)){if (MyData_NA[i,day] == MyData_Daymean[i,1])
{ MyData_NA[i,2] <- MyData_Daymean[i,2] }}
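This doesn't seem to work.
Answer 0 (score: 3)
A solution using dplyr. We can use mutate together with ifelse to replace the NA values. The key is to group_by Day first, so that the mean is computed within each day only.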
library(dplyr)
dt2 <- dt %>%
group_by(Day) %>%
mutate(Sales = ifelse(is.na(Sales), mean(Sales, na.rm = TRUE), Sales)) %>%
ungroup()
dt2
# # A tibble: 9 x 2
# Day Sales
# <fctr> <dbl>
# 1 12-01-17 28.0
# 2 13-01-17 13.0
# 3 14-01-17 2.0
# 4 12-01-17 33.0
# 5 13-01-17 17.0
# 6 14-01-17 11.0
# 7 12-01-17 23.0
# 8 13-01-17 21.0
# 9 14-01-17 6.5
Data
dt <- read.table(text = " Day Sales
12-01-17 NA
13-01-17 13
14-01-17 2
12-01-17 33
13-01-17 NA
14-01-17 11
12-01-17 23
13-01-17 21
14-01-17 NA",
header = TRUE)
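For comparison with the loop in the question, here is a minimal base-R sketch of the same per-day lookup. The loop compares row i of MyData_NA with row i of MyData_Daymean, but the two tables do not line up row by row, so the mean has to be looked up by Day instead. The names day_mean, na_rows and filled are introduced here for illustration only.

```r
# Same example data as above; Day is kept as character for the lookup.
dt <- read.table(text = " Day Sales
12-01-17 NA
13-01-17 13
14-01-17 2
12-01-17 33
13-01-17 NA
14-01-17 11
12-01-17 23
13-01-17 21
14-01-17 NA",
header = TRUE, stringsAsFactors = FALSE)

# Mean Sales per Day, ignoring NA; the result is named by Day.
day_mean <- tapply(dt$Sales, dt$Day, mean, na.rm = TRUE)

# Logical index of the rows whose Sales value is missing.
na_rows <- is.na(dt$Sales)

# Fill each missing value with the mean of its own Day (lookup by name).
filled <- dt$Sales
filled[na_rows] <- unname(day_mean[as.character(dt$Day[na_rows])])
filled
```

This should give the same values as the Sales column of dt2 above, without any package dependency.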
Answer 1 (score: 3)
We can also use na.aggregate from the zoo package:
library(zoo)
dt$Sales <- with(dt, ave(Sales, Day, FUN = na.aggregate))
dt$Sales
#[1] 28.0 13.0 2.0 33.0 17.0 11.0 23.0 21.0 6.5
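To see what na.aggregate contributes here, it may help to run it on a single group. The sketch below assumes the zoo package is installed; the vector x (the three Sales values for 12-01-17 in the example data) and the name filled_x are only for illustration.

```r
library(zoo)

# Sales values for one day, with one missing entry.
x <- c(NA, 33, 23)

# By default na.aggregate replaces each NA with the mean of the
# non-missing values; the other entries are left unchanged.
filled_x <- na.aggregate(x)
filled_x
```

ave then applies this function separately to the Sales values of each Day.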
Or with data.table, assuming 'Sales' is of type numeric:
library(data.table)
setDT(dt)[, Sales := na.aggregate(Sales), Day]
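If Sales was read in as integer (as read.table tends to do when every value is a whole number), one way to satisfy the numeric assumption is to convert the column first. This is a sketch under that assumption, with the example data rebuilt inline as an integer column; it needs both data.table and zoo installed.

```r
library(data.table)
library(zoo)

# Example data with Sales stored as integer on purpose.
dt <- data.table(
  Day   = rep(c("12-01-17", "13-01-17", "14-01-17"), times = 3),
  Sales = c(NA, 13L, 2L, 33L, NA, 11L, 23L, 21L, NA)
)

# Convert the whole column to double so that non-integer means such as 6.5 fit.
dt[, Sales := as.numeric(Sales)]

# Replace NA with the per-Day mean, updating the column by reference.
dt[, Sales := na.aggregate(Sales), by = Day]
dt$Sales
```

The conversion is done as a separate whole-column assignment because the grouped update that follows writes into the existing column rather than replacing it.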