Question

Suppose I have a data frame cars, whose first few rows are:

Brand         Type         Year
BMW           Compact      2009
BMW           Sedan        2010
BENZ          Sedan        2010
BENZ          Compact      2012
BMW           Compact      2008
BENZ          Sedan        2011

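I want to use aggregate to first find every combination of the variables "Brand" and "Type", and then find the number of unique years for each combination. For example, the desired output is: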

Brand        Type         num_unique_years
BMW          Compact      2(which are 2009, 2008)
BMW          Sedan        1(2010)
BENZ         Compact      1(2012)
BENZ         Sedan        2(2010,2011)

The steps are basically these:

x <- subset(cars, Brand == "BMW" & Type == "Compact")
length(unique(x$Year))
which gives me the output 2

However, I don't know how to combine these separate steps into a single function.

Thanks for your help.

Answer 1

I can do it in two steps with data.table:

library(data.table)
dt <- data.table(brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
                 type = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
                 year = c(2009, 2010, 2010, 2012, 2008, 2011))


dt[ , num_unique_years := length(unique(year)), by = .(brand, type)]
unique(dt[, .(type, brand, num_unique_years)])

Final result:

      type brand num_unique_years
1: Compact   BMW                2
2:   Sedan   BMW                1
3:   Sedan  BENZ                2
4: Compact  BENZ                1
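The two steps above can probably be collapsed into one grouped summary. The following is a minimal sketch, not part of the original answer: it assumes data.table's `uniqueN()`, which counts distinct values, and the name `res` is only illustrative.

```r
library(data.table)

# Same toy data as in the answer above
dt <- data.table(brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
                 type  = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
                 year  = c(2009, 2010, 2010, 2012, 2008, 2011))

# One row per (brand, type) group; uniqueN(year) counts the distinct
# years in the group, like length(unique(year))
res <- dt[, .(num_unique_years = uniqueN(year)), by = .(brand, type)]
print(res)
```

Because the summary is built with `.()` rather than `:=`, `dt` itself is left unmodified and no separate `unique()` call is needed.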

Answer 2

Just define an appropriate aggregation function and use aggregate. No packages are needed.

len_years <- function(years) {
  u <- unique(sort(years))
  paste0(length(u), "(", toString(u), ")")
}
Ag <- aggregate(Year ~., cars, len_years)
names(Ag)[3] <- "num_unique_years"

giving:

> Ag
  Brand    Type num_unique_years
1  BENZ Compact          1(2012)
2   BMW Compact    2(2008, 2009)
3  BENZ   Sedan    2(2010, 2011)
4   BMW   Sedan          1(2010)

Variations

1) If you don't need the years themselves, replace the function with

len_years <- function(years) length(unique(years))

2) Alternatively, replace the aggregate statement and the statement after it with:

Ag <- aggregate(data.frame(num_unique_years = cars[[3]]), cars[-3], len_years)

Note: The input cars in reproducible form is:

Lines <- "Brand         Type         Year
BMW           Compact      2009
BMW           Sedan        2010
BENZ          Sedan        2010
BENZ          Compact      2012
BMW           Compact      2008
BENZ          Sedan        2011"
cars <- read.table(text = Lines, header = TRUE)
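For reference, here is a self-contained sketch that puts the reproducible input together with variation 1 and checks one group against the manual subset/unique steps from the question. The names `count_unique`, `res` and `x` are illustrative only; the formula is spelled out as `Year ~ Brand + Type`, which for this three-column data frame should be equivalent to `Year ~ .`.

```r
# Reproducible input, as in the note above
cars <- read.table(text = "Brand Type Year
BMW  Compact 2009
BMW  Sedan   2010
BENZ Sedan   2010
BENZ Compact 2012
BMW  Compact 2008
BENZ Sedan   2011", header = TRUE)

# Aggregation function: number of distinct values in a group
count_unique <- function(x) length(unique(x))

# One row per Brand/Type combination
res <- aggregate(Year ~ Brand + Type, data = cars, FUN = count_unique)
names(res)[names(res) == "Year"] <- "num_unique_years"
print(res)

# Cross-check one group against the manual steps from the question
x <- subset(cars, Brand == "BMW" & Type == "Compact")
length(unique(x$Year))
```

The last expression should agree with the BMW/Compact row of `res`.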

Answer 3

How about using dplyr:

library(dplyr)
count(group_by(count(group_by(cars, Brand, Type, Year)), Brand, Type))
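A possibly more readable dplyr variant is sketched below. It is not from the original answer: it assumes `n_distinct()` for counting distinct values and the `.groups` argument of `summarise()`, which needs dplyr 1.0.0 or later. The data frame is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)

# Same toy data as in the question
cars <- data.frame(
  Brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
  Type  = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
  Year  = c(2009, 2010, 2010, 2012, 2008, 2011)
)

# Group by Brand and Type, then count the distinct years in each group
res <- cars %>%
  group_by(Brand, Type) %>%
  summarise(num_unique_years = n_distinct(Year), .groups = "drop")
print(res)
```

Here the result column gets an explicit name, whereas the nested `count()` calls leave the naming of the count column to dplyr.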

R - Using aggregate over several variables and counting unique values
