R - Using aggregate on several variables and combining with unique values

Time: 2016-11-16 01:05:38

Tags: r dataframe aggregate unique

Suppose I have a data frame cars, whose first few rows are:

Brand         Type         Year
BMW           Compact      2009
BMW           Sedan        2010
BENZ          Sedan        2010
BENZ          Compact      2012
BMW           Compact      2008
BENZ          Sedan        2011

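I want to use aggregate to first find every combination of the variables "Brand" and "Type", and then find the number of unique years for each combination. For example, the desired output is: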

Brand        Type         num_unique_years
BMW          Compact      2 (which are 2009, 2008)
BMW          Sedan        1 (2010)
BENZ         Compact      1 (2012)
BENZ         Sedan        2 (2010, 2011)

The steps are basically like this:

# string comparison is case-sensitive, so match the values exactly as stored
x <- subset(cars, Brand == "BMW" & Type == "Compact")
length(unique(x$Year))
which gives me the output 2

However, I don't know how to combine these separate steps into a single function.

Thanks for your help.

3 Answers:

Answer 0: (score: 1)

I can do this in two steps with data.table:

library(data.table)
dt <- data.table(brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
                 type = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
                 year = c(2009, 2010, 2010, 2012, 2008, 2011))


dt[ , num_unique_years := length(unique(year)), by = .(brand, type)]
unique(dt[, .(type, brand, num_unique_years)])

Final result:

      type brand num_unique_years
1: Compact   BMW                2
2:   Sedan   BMW                1
3:   Sedan  BENZ                2
4: Compact  BENZ                1
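
As a possible one-step alternative (a sketch, not part of the original answer), data.table's uniqueN() can count distinct values directly inside a grouped summary, which avoids first adding a column and then de-duplicating. The name res is only illustrative.

```r
library(data.table)

# Same sample data as in the answer above
dt <- data.table(brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
                 type = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
                 year = c(2009, 2010, 2010, 2012, 2008, 2011))

# One row per (brand, type) group; uniqueN(year) counts the distinct years in it
res <- dt[, .(num_unique_years = uniqueN(year)), by = .(brand, type)]
print(res)
```

Unlike the := version, this leaves dt itself unchanged and returns only the summary table.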

Answer 1: (score: 1)

Just define an appropriate aggregation function and use aggregate. No packages are used.

len_years <- function(years) {
  u <- unique(sort(years))
  paste0(length(u), "(", toString(u), ")")
}
Ag <- aggregate(Year ~., cars, len_years)
names(Ag)[3] <- "num_unique_years"

which gives:

> Ag
  Brand    Type num_unique_years
1  BENZ Compact          1(2012)
2   BMW Compact    2(2008, 2009)
3  BENZ   Sedan    2(2010, 2011)
4   BMW   Sedan          1(2010)

Variations

1) If you don't need the years themselves, replace the function with:

len_years <- function(years) length(unique(years))

2) Alternatively, replace the aggregate statement and the statement after it with:

Ag <- aggregate(data.frame(num_unique_years = cars[[3]]), cars[-3], len_years)

Note: The input cars in reproducible form is:

Lines <- "Brand         Type         Year
BMW           Compact      2009
BMW           Sedan        2010
BENZ          Sedan        2010
BENZ          Compact      2012
BMW           Compact      2008
BENZ          Sedan        2011"
cars <- read.table(text = Lines, header = TRUE)
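
Putting the count-only variation together with this input, a self-contained base R sketch might look like the following. The names count_years and Ag2 are illustrative and do not come from the original answer; the last lines cross-check one group against the manual subset step from the question.

```r
# Reproducible input, as in the note above
Lines <- "Brand Type Year
BMW Compact 2009
BMW Sedan 2010
BENZ Sedan 2010
BENZ Compact 2012
BMW Compact 2008
BENZ Sedan 2011"
cars <- read.table(text = Lines, header = TRUE)

# Count only: number of distinct years within each group
count_years <- function(years) length(unique(years))

# Group by every Brand/Type combination and apply the counter to Year
Ag2 <- aggregate(Year ~ Brand + Type, data = cars, FUN = count_years)
names(Ag2)[names(Ag2) == "Year"] <- "num_unique_years"
print(Ag2)

# Cross-check one group against the manual subset approach
x <- subset(cars, Brand == "BMW" & Type == "Compact")
manual <- length(unique(x$Year))
```

Renaming the column by name rather than by position keeps the sketch working even if the grouping columns are reordered.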

Answer 2: (score: 0)

How about using dplyr:

library(dplyr)
# inner count(): one row per Brand/Type/Year; outer count(): rows per Brand/Type
count(group_by(count(group_by(cars, Brand, Type, Year)), Brand, Type))
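
A perhaps more direct way to express the same idea in dplyr is n_distinct() inside summarise(). This is a sketch rather than the answerer's code: the data frame is rebuilt here so the block runs on its own, and the name res is illustrative.

```r
library(dplyr)

# Sample data from the question, rebuilt so the example is self-contained
cars <- data.frame(
  Brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
  Type  = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
  Year  = c(2009, 2010, 2010, 2012, 2008, 2011)
)

# Group by Brand and Type, then count the distinct years in each group
res <- cars %>%
  group_by(Brand, Type) %>%
  summarise(num_unique_years = n_distinct(Year)) %>%
  ungroup()
print(res)
```

This names the result column explicitly instead of relying on the default count column produced by the nested count() calls.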