Suppose I have a data frame cars whose first few rows are:
Brand Type Year
BMW Compact 2009
BMW Sedan 2010
BENZ Sedan 2010
BENZ Compact 2012
BMW Compact 2008
BENZ Sedan 2011
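I want to use aggregate to first find every combination of the variables "Brand" and "Type", and then find the number of unique years for each combination. For example, the desired output would be: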
Brand Type num_unique_years
BMW Compact 2(which are 2009, 2008)
BMW Sedan 1(2010)
Benz Compact 1(2012)
Benz Sedan 2(2010,2011)
For a single combination, the steps are basically:
x <- subset(cars, Brand == "BMW" & Type == "Compact")  # string comparison is case-sensitive
length(unique(x$Year))
which gives me the output 2
However, I don't know how to combine these separate steps into a single function.
Thanks for your help.
Answer 0: (score: 1)
I can do this in two steps with data.table:
library(data.table)
dt <- data.table(brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
                 type = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
                 year = c(2009, 2010, 2010, 2012, 2008, 2011))
dt[ , num_unique_years := length(unique(year)), by = .(brand, type)]
unique(dt[, .(type, brand, num_unique_years)])
Final result:
type brand num_unique_years
1: Compact BMW 2
2: Sedan BMW 1
3: Sedan BENZ 2
4: Compact BENZ 1
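The two steps above can probably be collapsed into one by aggregating directly in j instead of adding a column and then de-duplicating. A minimal sketch, assuming a data.table version that provides uniqueN(); the same dt is rebuilt here only so that the snippet runs on its own, and res is just an illustrative name:

```r
library(data.table)

# Same sample data as in the answer above
dt <- data.table(brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
                 type = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
                 year = c(2009, 2010, 2010, 2012, 2008, 2011))

# One row per (brand, type) group; uniqueN(year) counts the distinct years in it
res <- dt[, .(num_unique_years = uniqueN(year)), by = .(brand, type)]
print(res)
```

Because this returns one row per group, no follow-up unique() call should be needed.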
Answer 1: (score: 1)
Just define an appropriate aggregation function and use aggregate. No packages are needed.
len_years <- function(years) {
u <- unique(sort(years))
paste0(length(u), "(", toString(u), ")")
}
Ag <- aggregate(Year ~., cars, len_years)
names(Ag)[3] <- "num_unique_years"
giving:
> Ag
Brand Type num_unique_years
1 BENZ Compact 1(2012)
2 BMW Compact 2(2008, 2009)
3 BENZ Sedan 2(2010, 2011)
4 BMW Sedan 1(2010)
Variations
1) If you don't need the years themselves, replace the function with:
len_years <- function(years) length(unique(years))
2) Alternatively, replace the aggregate statement and the statement after it with:
Ag <- aggregate(data.frame(num_unique_years = cars[[3]]), cars[-3], len_years)
Note: The input cars in reproducible form is:
Lines <- "Brand Type Year
BMW Compact 2009
BMW Sedan 2010
BENZ Sedan 2010
BENZ Compact 2012
BMW Compact 2008
BENZ Sedan 2011"
cars <- read.table(text = Lines, header = TRUE)
Answer 2: (score: 0)
How about using dplyr:
library(dplyr)
count(group_by(count(group_by(cars,Brand,Type, Year)),Brand,Type))
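The nested count() calls are hard to read and leave the result in an automatically named count column. A sketch of what may be a clearer equivalent, assuming dplyr 1.0.0 or later (for the .groups argument) and its n_distinct() helper; cars is rebuilt from the question's sample rows so the snippet is self-contained, and res is just an illustrative name:

```r
library(dplyr)

# Sample rows from the question
cars <- data.frame(
  Brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
  Type  = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
  Year  = c(2009, 2010, 2010, 2012, 2008, 2011)
)

# Group by Brand and Type, then count the distinct years within each group
res <- cars %>%
  group_by(Brand, Type) %>%
  summarise(num_unique_years = n_distinct(Year), .groups = "drop")
print(res)
```

This names the result column num_unique_years directly, matching the output requested in the question.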