Say I have the following table, DataTable:
Cat1 | Cat2 | Val1 | Val2
--------------------------------------------
A | A | 1 | 2
A | B | 3 | 4
B | A | 5 | 6
B | B | 7 | 8
A | A | 2 | 4
A | B | 6 | 8
B | A | 10 | 12
B | B | 14 | 16
I want to aggregate by Cat1 and Cat2, taking the Sum of Val1 and the Avg of Val2. How can I achieve this?
Cat1 | Cat2 | Sum Val1 | Avg Val2
--------------------------------------------
A | A | 3 | 3
A | B | 9 | 6
B | A | 15 | 9
B | B | 21 | 12
I have managed to aggregate a single variable with the aggregate function:
aggregate(
  Val1 ~ Cat1 + Cat2,
  data = DataTable,
  FUN = sum
)
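But despite playing around with cbind, I can't get the behavior I want. I've only been learning R for 24 hours, so I'm not familiar enough with these concepts to fully understand what I've been doing (always dangerous!), but I think this must be easy to achieve.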
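A minimal sketch of why cbind() alone falls short, assuming the question's table is rebuilt as a plain data frame with the values copied from above: cbind() on the left-hand side of the formula applies the same FUN to every column, so both columns get summed. One single-call workaround is a FUN that returns a named vector of both statistics, from which the wanted pieces are picked afterwards. The names both_summed, both_stats and result are only illustrative.

```r
# Rebuild the question's table (values copied from the question).
DataTable <- data.frame(
  Cat1 = c("A", "A", "B", "B", "A", "A", "B", "B"),
  Cat2 = c("A", "B", "A", "B", "A", "B", "A", "B"),
  Val1 = c(1, 3, 5, 7, 2, 6, 10, 14),
  Val2 = c(2, 4, 6, 8, 4, 8, 12, 16)
)

# cbind() on the left-hand side applies the SAME FUN to both columns:
# here Val1 and Val2 are both summed.
both_summed <- aggregate(cbind(Val1, Val2) ~ Cat1 + Cat2,
                         data = DataTable, FUN = sum)
print(both_summed)

# Workaround (illustrative): FUN returns a named vector, so each left-hand
# column comes back as a matrix column with "sum" and "mean" sub-columns.
both_stats <- aggregate(cbind(Val1, Val2) ~ Cat1 + Cat2, data = DataTable,
                        FUN = function(x) c(sum = sum(x), mean = mean(x)))

# Keep only the sum of Val1 and the mean of Val2.
result <- data.frame(both_stats[c("Cat1", "Cat2")],
                     Sum_Val1 = both_stats$Val1[, "sum"],
                     Avg_Val2 = both_stats$Val2[, "mean"])
print(result)
```

This computes statistics that are then thrown away (the mean of Val1, the sum of Val2), which is harmless for small data but is why the answers below compute each summary separately.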
Answer 0 (score: 11)
set.seed(45)  # sample() changed in R 3.6.0, so newer R versions may give different values than shown below
df <- data.frame(c1=rep(c("A","A","B","B"), 2),
c2 = rep(c("A","B"), 4),
v1 = sample(8),
v2 = sample(1:100, 8))
> df
# c1 c2 v1 v2
# 1 A A 6 19
# 2 A B 3 1
# 3 B A 2 37
# 4 B B 8 86
# 5 A A 5 30
# 6 A B 1 44
# 7 B A 7 41
# 8 B B 4 39
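# sum of v1 per (c1, c2) group
v1 <- aggregate(v1 ~ c1 + c2, data = df, FUN = sum)
# mean of v2 per (c1, c2) group
v2 <- aggregate(v2 ~ c1 + c2, data = df, FUN = mean)
# join the two summaries on the grouping columns
out <- merge(v1, v2, by = c("c1", "c2"))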
> out
# c1 c2 v1 v2
# 1 A A 11 24.5
# 2 A B 4 22.5
# 3 B A 9 39.0
# 4 B B 12 62.5
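The two aggregate() calls plus merge() above extend to any number of (column, function) pairs. A minimal sketch, using a hypothetical helper aggregate_each() that is not part of base R or of this answer, applied to the question's DataTable values rather than the random df so the result is deterministic:

```r
# Hypothetical helper (not from the answer): aggregate each column with its
# own function, then merge the per-column results on the grouping columns.
aggregate_each <- function(data, by, funs) {
  # funs: named list mapping a column name to the function applied to it
  parts <- lapply(names(funs), function(col) {
    aggregate(data[col], by = data[by], FUN = funs[[col]])
  })
  # merge() joins two data frames at a time; Reduce() folds it over the list
  Reduce(function(a, b) merge(a, b, by = by), parts)
}

# The question's table (values copied from the question).
DataTable <- data.frame(
  Cat1 = c("A", "A", "B", "B", "A", "A", "B", "B"),
  Cat2 = c("A", "B", "A", "B", "A", "B", "A", "B"),
  Val1 = c(1, 3, 5, 7, 2, 6, 10, 14),
  Val2 = c(2, 4, 6, 8, 4, 8, 12, 16)
)

out <- aggregate_each(DataTable, by = c("Cat1", "Cat2"),
                      funs = list(Val1 = sum, Val2 = mean))
print(out)
```

Since merge() sorts by the key columns by default, the groups come out in Cat1, Cat2 order, matching the layout of the desired output in the question.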
**Edit:**
I suggest using data.table, since it makes this very simple:
library(data.table)
dt <- data.table(df)
# j computes both summaries at once; by= groups on c1 and c2
dt.out <- dt[, list(s.v1 = sum(v1), m.v2 = mean(v2)),
             by = c("c1", "c2")]
> dt.out
# c1 c2 s.v1 m.v2
# 1: A A 11 24.5
# 2: A B 4 22.5
# 3: B A 9 39.0
# 4: B B 12 62.5
Answer 1 (score: 8)
Here is a base R solution.
First, your data:
x <- structure(list(Cat1 = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L), .Label = c("A", "B"), class = "factor"), Cat2 = structure(c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A", "B"), class = "factor"),
Val1 = c(1L, 3L, 5L, 7L, 2L, 6L, 10L, 14L), Val2 = c(2L,
4L, 6L, 8L, 4L, 8L, 12L, 16L)), .Names = c("Cat1", "Cat2",
"Val1", "Val2"), class = "data.frame", row.names = c(NA, -8L))
Then use ave() inside within(), and wrap the result in unique():
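unique(
  within(x, {
    # per-row group sum of Val1 and group mean of Val2
    sum_val1 <- ave(Val1, Cat1, Cat2, FUN = sum)
    mean_val2 <- ave(Val2, Cat1, Cat2, FUN = mean)
    # drop the raw columns so unique() leaves one row per group
    rm(Val1, Val2)
  })
)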
# Cat1 Cat2 mean_val2 sum_val1
# 1 A A 3 3
# 2 A B 6 9
# 3 B A 9 15
# 4 B B 12 21
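To see what ave() contributes here, a small sketch on the question's Val1, Cat1 and Cat2 values as plain vectors (the lowercase names val1, cat1, cat2 and per_row_sum are only illustrative). ave() returns one value per input row, not one per group, so every row in a group carries the same result; that repetition is why the answer removes Val1 and Val2 and then calls unique().

```r
# Question's values as plain vectors (illustrative names).
val1 <- c(1, 3, 5, 7, 2, 6, 10, 14)
cat1 <- c("A", "A", "B", "B", "A", "A", "B", "B")
cat2 <- c("A", "B", "A", "B", "A", "B", "A", "B")

# Sum within each (cat1, cat2) group, returned at the length of val1.
per_row_sum <- ave(val1, cat1, cat2, FUN = sum)
print(per_row_sum)
# [1]  3  9 15 21  3  9 15 21
# Rows 1 and 5 are both in group (A, A), so both get 1 + 2 = 3.
```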
Alternatively, if you're comfortable with SQL, use sqldf:
library(sqldf)
sqldf("select Cat1, Cat2,
sum(Val1) `Sum_Val1`,
avg(Val2) `Avg_Val2`
from x group by Cat1, Cat2")