Merging data by group in R

Time: 2015-01-13 03:22:21

Tags: r merge aggregate

I constructed the following data.frame object:

name <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
incidents <- c(133, 36, 1242, 2, NA)
gender <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
data <- data.frame(name, incidents, gender)

which produces data =

    name incidents gender
1  Homer       133   MALE
2  Marge        36 FEMALE
3   Bart      1242   MALE
4   Lisa         2 FEMALE
5 Maggie        NA FEMALE

First I clean the data with
clean_data <- data[!is.na(incidents), ]

so that clean_data =

   name incidents gender
1 Homer       133   MALE
2 Marge        36 FEMALE
3  Bart      1242   MALE
4  Lisa         2 FEMALE

Now I aggregate by gender

agg <- aggregate(incidents ~ gender, clean_data, mean)

which produces

  gender incidents
1 FEMALE      19.0
2   MALE     687.5

Now I would like to "fill in" the NA values in incidents with the data from agg, so that data =

    name incidents gender
1  Homer       133   MALE
2  Marge        36 FEMALE
3   Bart      1242   MALE
4   Lisa         2 FEMALE
5 Maggie      19.0 FEMALE

What is the simplest way to do this using base R?

2 answers:

Answer 0: (score: 4)

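You could use ave. It returns the group "mean" values as a vector of the same length as the original dataset ("vals"). Then check for the "NA" elements in the "incidents" column and replace them with the elements of "vals" at the same positions.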

 vals <- with(data, ave(incidents, gender, FUN= function(x)
                                         mean(x, na.rm=TRUE)))
 indx1 <- is.na(data$incidents)
 data$incidents[indx1] <- vals[indx1]
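
To make the intermediate step visible, here is a small self-contained sketch that rebuilds the data from the question and prints "vals" before the replacement. It only uses the objects already named above; nothing new is assumed beyond re-creating the data frame.

```r
# Rebuild the example data from the question
data <- data.frame(
  name = c("Homer", "Marge", "Bart", "Lisa", "Maggie"),
  incidents = c(133, 36, 1242, 2, NA),
  gender = c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
)

# ave() returns one value per row: the mean of that row's gender group
vals <- with(data, ave(incidents, gender,
                       FUN = function(x) mean(x, na.rm = TRUE)))
vals
# [1] 687.5  19.0 687.5  19.0  19.0

# Only the NA positions are overwritten with the matching group mean
indx1 <- is.na(data$incidents)
data$incidents[indx1] <- vals[indx1]
data$incidents
# [1]  133   36 1242    2   19
```

Because "vals" is aligned row by row with "data", the same logical index can be used on both sides of the assignment.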

A shorter version, shown by @MrFlick in the comments. It uses "ifelse" to replace the "NA" elements with the group "mean" value.

 data$incidents<-with(data, ave(incidents, gender,
          FUN=function(x) ifelse(is.na(x), mean(x, na.rm=T), x)))

Instead of "ifelse", "replace" can also be used, as @Ananda Mahto shows with "data.table".
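
For reference, a base R sketch of that "replace" variant, combined with ave, might look like the following. It is my adaptation of the two snippets above, not code from either answer.

```r
# Rebuild the example data from the question
data <- data.frame(
  name = c("Homer", "Marge", "Bart", "Lisa", "Maggie"),
  incidents = c(133, 36, 1242, 2, NA),
  gender = c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
)

# Within each gender group, overwrite the NA positions with the group mean
data$incidents <- with(data, ave(incidents, gender,
  FUN = function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))))

data$incidents
# [1]  133   36 1242    2   19
```

One caveat: if every value in a group is NA, mean(x, na.rm = TRUE) returns NaN, so that group ends up filled with NaN rather than a number.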

Answer 1: (score: 4)

For variety, here is an approach using "data.table", which also demonstrates the replace function.

library(data.table)
as.data.table(data)[
  , incidents := replace(incidents, is.na(incidents), 
                         mean(incidents, na.rm = TRUE)), 
  by = gender][]
#      name incidents gender
# 1:  Homer       133   MALE
# 2:  Marge        36 FEMALE
# 3:   Bart      1242   MALE
# 4:   Lisa         2 FEMALE
# 5: Maggie        19 FEMALE
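
Neither answer uses the agg table from the question directly. If you do want to fill the gaps from agg, one possible base R sketch is a lookup with match. The name "idx" is introduced here for illustration only.

```r
# Rebuild the example data and the aggregate from the question
data <- data.frame(
  name = c("Homer", "Marge", "Bart", "Lisa", "Maggie"),
  incidents = c(133, 36, 1242, 2, NA),
  gender = c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
)
clean_data <- data[!is.na(data$incidents), ]
agg <- aggregate(incidents ~ gender, clean_data, mean)

# idx (hypothetical name) marks the rows whose incidents value is missing
idx <- is.na(data$incidents)

# For each missing row, look up its gender in agg and take that group's mean
data$incidents[idx] <- agg$incidents[match(data$gender[idx], agg$gender)]

data$incidents
# [1]  133   36 1242    2   19
```

This keeps the aggregate as a separate table, which may be convenient when the group means are also needed elsewhere. The ave-based versions avoid the extra object.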