Aggregated values for all combinations of factor levels, including missing ones

Time: 2017-08-14 18:41:12

Tags: r

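I am trying to find the minimum value in a data frame, grouped by several columns. I can do this successfully with the aggregate function below. However, the result does not include the factor combinations that have no data in the input data frame.

What I have: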

# all possibilities of fruits, cities, and vegetables:
fruits<-c('apple','banana','grape')
cities<-c('new york','chicago','los angeles')
vegetables<-c('cucumber','mushroom')

# my input (i.e., a sample from a test):
inputdf<-data.frame(
  fruit=c('apple','apple','apple','banana','banana','banana','grape','grape','grape'),
  city=c('new york','new york','new york','new york','chicago','los angeles','chicago','chicago','chicago'),
  vegetable=c('cucumber','cucumber','mushroom','cucumber','mushroom','mushroom','cucumber','cucumber','cucumber'),
  value=c(5,3,4,6,5,7,2,7,4))

#my aggregation:
outdf<-aggregate(value ~ fruit + city + vegetable,inputdf,function(x) min(x))

The output I get is:

fruit   city        vegetable   value
grape   chicago     cucumber    2
apple   new york    cucumber    3
banana  new york    cucumber    6
banana  chicago     mushroom    5
banana  los angeles mushroom    7
apple   new york    mushroom    4

This is correct; however, I also want rows for the column combinations that do not appear in the input df at all:

fruit   city        vegetable   value
apple   new york    cucumber    3
apple   new york    mushroom    4
apple   chicago     cucumber    NA
apple   chicago     mushroom    NA
apple   los angeles cucumber    NA
apple   los angeles mushroom    NA
banana  new york    cucumber    6
banana  new york    mushroom    NA
banana  chicago     cucumber    NA
banana  chicago     mushroom    5
banana  los angeles cucumber    NA
banana  los angeles mushroom    7
grape   new york    cucumber    NA
grape   new york    mushroom    NA
grape   chicago     cucumber    2
grape   chicago     mushroom    NA
grape   los angeles cucumber    NA
grape   los angeles mushroom    NA

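I would like to be able to do this for any number of columns being combined. Is there a simple way? The reason I want this output is that I need to convert the NAs to a specific value and then average those values again over the same subsets. Thanks!

1 Answer:

Answer 0: (score: 3)

You can use expand.grid to generate all combinations and then merge them with the aggregated result: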

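# aggregate the combinations that actually occur in the data
outdf<-aggregate(value ~ fruit + city + vegetable,inputdf,function(x) min(x))
# build every fruit/city/vegetable combination (columns are named Var1, Var2, Var3)
DF=expand.grid(fruits, cities, vegetables)
# keep every row of DF; combinations missing from outdf get value NA
outdf=merge(outdf,DF,by.x=c('fruit','city','vegetable'),by.y=c('Var1','Var2','Var3'),all.y=TRUE)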
> outdf
    fruit        city vegetable value
1   apple     chicago  cucumber    NA
2   apple     chicago  mushroom    NA
3   apple los angeles  cucumber    NA
4   apple los angeles  mushroom    NA
5   apple    new york  cucumber     3
6   apple    new york  mushroom     4
7  banana     chicago  cucumber    NA
8  banana     chicago  mushroom     5
9  banana los angeles  cucumber    NA
10 banana los angeles  mushroom     7
11 banana    new york  cucumber     6
12 banana    new york  mushroom    NA
13  grape     chicago  cucumber     2
14  grape     chicago  mushroom    NA
15  grape los angeles  cucumber    NA
16  grape los angeles  mushroom    NA
17  grape    new york  cucumber    NA
18  grape    new york  mushroom    NA
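
The question also asks for a version that works with any number of columns. One possible way to generalize the approach above is sketched here; it is an illustration, not part of the original answer. The helper name aggregate_all_combos, its arguments, and the fill value 0 are hypothetical. It assumes the full set of levels for each grouping column is supplied as a named list, so that expand.grid produces columns with the same names as the data and merge can join on them directly.

```r
# Hypothetical helper (not from the original answer): aggregate `value_col`
# over every combination of `group_cols`, keeping combinations that have
# no data as NA.
aggregate_all_combos <- function(df, group_cols, value_col, levels_list, fun = min) {
  # Aggregate only the combinations that actually occur in the data.
  agg <- aggregate(df[value_col], by = df[group_cols], FUN = fun)
  # Build every possible combination. The list is named after the grouping
  # columns, so the grid columns match `df` and no by.x / by.y is needed.
  grid <- expand.grid(levels_list[group_cols], stringsAsFactors = FALSE)
  # Left join: keep every row of the grid, filling unmatched rows with NA.
  merge(grid, agg, by = group_cols, all.x = TRUE)
}

# Same data as in the question.
inputdf <- data.frame(
  fruit = c('apple','apple','apple','banana','banana','banana','grape','grape','grape'),
  city = c('new york','new york','new york','new york','chicago','los angeles','chicago','chicago','chicago'),
  vegetable = c('cucumber','cucumber','mushroom','cucumber','mushroom','mushroom','cucumber','cucumber','cucumber'),
  value = c(5,3,4,6,5,7,2,7,4),
  stringsAsFactors = FALSE
)

# All possible levels, named after the columns they belong to.
all_levels <- list(
  fruit = c('apple','banana','grape'),
  city = c('new york','chicago','los angeles'),
  vegetable = c('cucumber','mushroom')
)

result <- aggregate_all_combos(inputdf, c('fruit','city','vegetable'), 'value', all_levels)

# Replace the NAs with a specific value; 0 is an arbitrary placeholder.
filled <- result
filled$value[is.na(filled$value)] <- 0
```

Adding or removing a grouping column then only means changing the group_cols vector and the named list of levels; the merge call itself stays the same.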