Take a look at this table:
  Industry City Week Year Budget
1   Hotel.    1   34 2005     10
2   Trans.    1   34 2005     10
3   Hotel.    1   34 2006     20
4   Trans.    2   35 2005     10
5   Hotel.    1   34 2007     NA
6   Trans.    3   34 2005     10
7   Hotel.    3   35 2005     10
8   Trans.    3   36 2005     10
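I want to replace every NA with the mean Budget of the rows that share the same Industry, City and Week (ignoring Year).
In this case the NA would become 15 (the mean of the budgets in rows 1 and 3).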
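As a quick check of that arithmetic, here is a minimal sketch. The vector name is made up for illustration; it simply holds the budgets of rows 1, 3 and 5, which share Industry "Hotel.", City 1 and Week 34.

```r
# Budgets of the rows with Industry "Hotel.", City 1, Week 34 (rows 1, 3, 5)
hotel_city1_week34 <- c(10, 20, NA)

# Average the observed values only; na.rm = TRUE skips the missing one
expected_fill <- mean(hotel_city1_week34, na.rm = TRUE)
expected_fill  # 15
```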
Answer 0 (score: 2)
A simple approach that works only for your example (with a single NA):
ds<-read.table(textConnection('1 Hotel. 1 34 2005 10
2 Trans. 1 34 2005 10
3 Hotel. 1 34 2006 20
4 Trans. 2 35 2005 10
5 Hotel. 1 34 2007 NA
6 Trans. 3 34 2005 10
7 Hotel. 3 35 2005 10
8 Trans. 3 36 2005 10'))
colnames(ds)<-c('lp','Industry','City','Week','Year','Budget')
ds$Budget[is.na(ds$Budget)]<-15
If your dataset contains more observations, this can be generalized:
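# Mean Budget per Industry/City/Week; the formula interface drops NA rows by default
means <- aggregate(Budget ~ Industry + City + Week, data = ds, FUN = mean)
names(means)[ncol(means)] <- 'BudgetMean'
# Left join the group means back onto every row (note that merge() re-sorts the rows)
ds.merged <- merge(x = ds, y = means, all.x = TRUE)
# Keep the observed Budget and use the group mean only where Budget is NA
ds.merged <- transform(ds.merged,
  BudgetImp = ifelse(is.na(Budget), BudgetMean, Budget))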
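The merge-based approach above can also be sketched in base R with ave(), which returns the group means already aligned with the original rows, so no join is needed and the row order is preserved. This is only an illustrative alternative, not part of the original answer. The data frame is rebuilt by hand from the question's table, and the names group_mean and BudgetImp are chosen here for the example.

```r
# The question's table, typed in by hand
ds <- data.frame(
  Industry = c("Hotel.", "Trans.", "Hotel.", "Trans.",
               "Hotel.", "Trans.", "Hotel.", "Trans."),
  City   = c(1, 1, 1, 2, 1, 3, 3, 3),
  Week   = c(34, 34, 34, 35, 34, 34, 35, 36),
  Year   = c(2005, 2005, 2006, 2005, 2007, 2005, 2005, 2005),
  Budget = c(10, 10, 20, 10, NA, 10, 10, 10)
)

# Per-row mean of Budget within each Industry/City/Week group, ignoring NAs
group_mean <- ave(ds$Budget, ds$Industry, ds$City, ds$Week,
                  FUN = function(x) mean(x, na.rm = TRUE))

# Use the group mean only where Budget is missing
ds$BudgetImp <- ifelse(is.na(ds$Budget), group_mean, ds$Budget)
ds$BudgetImp[5]  # 15
```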
Edit: with the plyr package:
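library(plyr)
# Within each Industry/City/Week group, add the group mean and fill the NAs with it
ds <- ddply(ds, .(Industry, City, Week), mutate,
  BudgetMean = mean(Budget, na.rm = TRUE),
  Budget = ifelse(is.na(Budget), BudgetMean, Budget))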
or even:
ds <- ddply(ds, .(Industry, City, Week), mutate,
  Budget = ifelse(is.na(Budget), mean(Budget, na.rm = TRUE), Budget))
Answer 1 (score: 2)
Here is a data.table solution.
DF <- read.table(header=TRUE, stringsAsFactors = FALSE, text=' Industry City Week Year Budget
Hotel. 1 34 2005 10
Trans. 1 34 2005 10
Hotel. 1 34 2006 20
Trans. 2 35 2005 10
Hotel. 1 34 2007 NA
Trans. 3 34 2005 10
Hotel. 3 35 2005 10
Trans. 3 34 2005 NA ', colClasses = c("character", rep("double", 4)))
Note that I changed the last row (it now has Week 34 and an NA Budget) to further illustrate the example.
require(data.table)
DT <- data.table(DF)
DT[, Budget := ifelse(is.na(Budget), mean(Budget, na.rm=TRUE), Budget), by = list(Industry, City, Week)]
DT
## Industry City Week Year Budget
## 1: Hotel. 1 34 2005 10
## 2: Trans. 1 34 2005 10
## 3: Hotel. 1 34 2006 20
## 4: Trans. 2 35 2005 10
## 5: Hotel. 1 34 2007 15
## 6: Trans. 3 34 2005 10
## 7: Hotel. 3 35 2005 10
## 8: Trans. 3 34 2005 10
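One edge case the group-mean idiom does not cover: if every Budget in a group is NA, mean(..., na.rm = TRUE) returns NaN, so those rows would be filled with NaN rather than left as NA. The sketch below shows one possible way to guard against that in base R. The helper fill_with_group_mean and the toy vectors are hypothetical and do not come from either answer.

```r
# Hypothetical helper: fill NAs with the group mean, but leave the group
# untouched when it has no observed value at all
fill_with_group_mean <- function(x) {
  if (all(is.na(x))) return(x)          # nothing to average, keep the NAs
  x[is.na(x)] <- mean(x, na.rm = TRUE)  # otherwise impute the group mean
  x
}

# mean() over an all-NA group is NaN, not NA
all_missing_mean <- mean(c(NA_real_, NA_real_), na.rm = TRUE)
all_missing_mean  # NaN

# Made-up data: group "a" has observed values, group "b" has none
budget <- c(10, NA, 20, NA, NA)
group  <- c("a", "a", "a", "b", "b")
filled <- ave(budget, group, FUN = fill_with_group_mean)
filled  # 10 15 20 NA NA
```

The same helper could presumably be passed per group in the plyr or data.table calls in place of the ifelse() expression, but that is untested here.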