Replace unknown items with similar items in R

Time: 2014-03-18 07:06:18

Tags: r

Take a look at this table:

  Industry City Week  Year  Budget
1 Hotel.    1    34   2005   10
2 Trans.    1    34   2005   10
3 Hotel.    1    34   2006   20
4 Trans.    2    35   2005   10
5 Hotel.    1    34   2007   NA
6 Trans.    3    34   2005   10
7 Hotel.    3    35   2005   10
8 Trans.    3    36   2005   10 

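I want to replace all NAs with the mean Budget of the rows that have exactly the same Industry, City and Week (ignoring Year).

So in this case, the NA would become 15 (the mean of rows 1 and 3).

2 Answers:

Answer 0 (score: 2)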
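The arithmetic above can be sketched in base R. This is only an illustrative sketch, not one of the posted answers: it assumes the table is held in a data frame named `ds` (rebuilt here by hand) and uses `ave()` to compute a per-group mean aligned with the original rows.

```r
# Example data from the question (typed in by hand for this sketch)
ds <- data.frame(
  Industry = c("Hotel.", "Trans.", "Hotel.", "Trans.",
               "Hotel.", "Trans.", "Hotel.", "Trans."),
  City     = c(1, 1, 1, 2, 1, 3, 3, 3),
  Week     = c(34, 34, 34, 35, 34, 34, 35, 36),
  Year     = c(2005, 2005, 2006, 2005, 2007, 2005, 2005, 2005),
  Budget   = c(10, 10, 20, 10, NA, 10, 10, 10)
)

# Mean of the non-missing budgets within each Industry/City/Week group,
# returned as a vector with one entry per row of ds
group_mean <- ave(ds$Budget, ds$Industry, ds$City, ds$Week,
                  FUN = function(x) mean(x, na.rm = TRUE))

# Overwrite only the missing entries; Year plays no part in the grouping
missing <- is.na(ds$Budget)
ds$Budget[missing] <- group_mean[missing]
```

Row 5 shares its group with rows 1 and 3 (budgets 10 and 20), so it is filled with 15.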

Simple, and works only for your example (with a single NA):

ds<-read.table(textConnection('1 Hotel.    1    34   2005   10
    2 Trans.    1    34   2005   10
    3 Hotel.    1    34   2006   20
    4 Trans.    2    35   2005   10
    5 Hotel.    1    34   2007   NA
    6 Trans.    3    34   2005   10
    7 Hotel.    3    35   2005   10
    8 Trans.    3    36   2005   10'))

colnames(ds)<-c('lp','Industry','City','Week','Year','Budget')

ds$Budget[is.na(ds$Budget)]<-15

But if your dataset contains more observations, this can be generalized:

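# Mean Budget per Industry/City/Week; the formula interface drops NA rows by default
means <- aggregate(Budget ~ Industry + City + Week, data = ds, FUN = mean)
names(means)[ncol(means)] <- 'BudgetMean'
# Left join the group means back onto the data (joins on the shared columns)
ds.merged <- merge(x = ds, y = means, all.x = TRUE)
# BudgetImp keeps the original Budget, or the group mean where Budget is NA
ds.merged <- transform(ds.merged,
                       BudgetImp = ifelse(is.na(Budget), BudgetMean, Budget))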
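
One thing to keep in mind with the `merge()` approach is that, by default, `merge()` sorts the result on the join columns, so the rows may come back in a different order. A minimal sketch of restoring the original order, assuming the row id column is called `lp` as in the `colnames()` call above (the data frame is rebuilt by hand here so the snippet runs on its own):

```r
# Example data with an explicit row id column `lp`
ds <- data.frame(
  lp       = 1:8,
  Industry = c("Hotel.", "Trans.", "Hotel.", "Trans.",
               "Hotel.", "Trans.", "Hotel.", "Trans."),
  City     = c(1, 1, 1, 2, 1, 3, 3, 3),
  Week     = c(34, 34, 34, 35, 34, 34, 35, 36),
  Year     = c(2005, 2005, 2006, 2005, 2007, 2005, 2005, 2005),
  Budget   = c(10, 10, 20, 10, NA, 10, 10, 10)
)

# Group means (NA rows are dropped by the formula interface)
means <- aggregate(Budget ~ Industry + City + Week, data = ds, FUN = mean)
names(means)[ncol(means)] <- "BudgetMean"

# Left join, then put the rows back in their original order
ds.merged <- merge(ds, means, all.x = TRUE)
ds.merged <- ds.merged[order(ds.merged$lp), ]

# Fill the gaps in Budget itself and drop the helper column
ds.merged$Budget <- ifelse(is.na(ds.merged$Budget),
                           ds.merged$BudgetMean, ds.merged$Budget)
ds.merged$BudgetMean <- NULL
```

The join columns (Industry, City, Week) are moved to the front of `ds.merged`; only the row order is restored here, not the column order.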

Edit: the same thing with plyr

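library(plyr)
# plyr's mutate evaluates its arguments in order, so BudgetMean can be reused below
ds <- ddply(ds, .(Industry, City, Week), mutate,
            BudgetMean = mean(Budget, na.rm = TRUE),
            Budget = ifelse(is.na(Budget), BudgetMean, Budget))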

or even

ds<-ddply(ds,.(Industry,City,Week),mutate,
          Budget=ifelse(is.na(Budget),mean(Budget,na.rm=T),Budget))
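
A caveat that applies to any of these group-mean fills: if every Budget in a group is NA, `mean(..., na.rm = TRUE)` returns `NaN`, and that `NaN` is what gets written into the data. The sketch below shows one possible way to guard against this in base R. The helper name `impute_group_mean` and the toy vectors are made up for illustration and are not part of the answer.

```r
# Hypothetical helper: fill NAs with the group mean, but leave the values
# untouched when the group has no observed values at all
impute_group_mean <- function(x) {
  if (all(is.na(x))) {
    return(x)
  }
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Toy data: group "a" has two observed values, group "b" has none
budget <- c(10, 20, NA, NA)
group  <- c("a", "a", "a", "b")

# What the unguarded version would produce for group "b"
unguarded <- mean(budget[group == "b"], na.rm = TRUE)

# Guarded fill, applied per group
filled <- ave(budget, group, FUN = impute_group_mean)
```

Here `unguarded` is `NaN`, while `filled` keeps the entry for group "b" as `NA`. Assuming plyr is loaded, the same helper could be used inside the `ddply` call as `Budget = impute_group_mean(Budget)`.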

Answer 1 (score: 2)

Here is a data.table solution.

DF <- read.table(header=TRUE, stringsAsFactors = FALSE, text='  Industry City Week  Year  Budget
Hotel.    1    34   2005   10
Trans.    1    34   2005   10
Hotel.    1    34   2006   20
Trans.    2    35   2005   10
Hotel.    1    34   2007   NA
Trans.    3    34   2005   10
Hotel.    3    35   2005   10
Trans.    3    34   2005   NA ', colClasses = c("character", rep("double", 4)))

Note that I changed the last row to further illustrate the example.

library(data.table)
DT <- data.table(DF)
# := updates Budget in place; the mean is computed separately within each group
DT[, Budget := ifelse(is.na(Budget), mean(Budget, na.rm=TRUE), Budget), by = list(Industry, City, Week)]

DT
##    Industry City Week Year Budget
## 1:   Hotel.    1   34 2005     10
## 2:   Trans.    1   34 2005     10
## 3:   Hotel.    1   34 2006     20
## 4:   Trans.    2   35 2005     10
## 5:   Hotel.    1   34 2007     15
## 6:   Trans.    3   34 2005     10
## 7:   Hotel.    3   35 2005     10
## 8:   Trans.    3   34 2005     10