I have a data.frame like this:
x <- data.frame(Category=factor(c("One", "One", "Four", "Two","Two",
"Three", "Two", "Four","Three")),
City=factor(c("D","A","B","B","A","D","A","C","C")),
Frequency=c(10,1,5,2,14,8,20,3,5))
Category City Frequency
1 One D 10
2 One A 1
3 Four B 5
4 Two B 2
5 Two A 14
6 Three D 8
7 Two A 20
8 Four C 3
9 Three C 5
I want to build a pivot table of sum(Frequency), and I am using the ddply function from the plyr package as follows:
library(plyr)
ddply(x,.(Category,City),summarize,Total=sum(Frequency))
Category City Total
1 Four B 5
2 Four C 3
3 One A 1
4 One D 10
5 Three C 5
6 Three D 8
7 Two A 34
8 Two B 2
But I need this result sorted by the total within each Category group. Like this:
Category City Frequency
1 Two A 34
2 Two B 2
3 Three D 8
4 Three C 5
5 One D 10
6 One A 1
7 Four B 5
8 Four C 3
I have looked at and tried sort, order, and arrange, but nothing seems to do what I need. How can I do this in R?
Answer 0: (score: 5)
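This is a nice question, and I can't think of a direct way of doing it other than creating a total-size index and then sorting by it. Here is a possible data.table approach that uses the setorder function, which sorts the data by reference: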
library(data.table)
Res <- setDT(x)[, .(Total = sum(Frequency)), by = .(Category, City)]
setorder(Res[, size := sum(Total), by = Category], -size, -Total, Category)[]
# Category City Total size
# 1: Two A 34 36
# 2: Two B 2 36
# 3: Three D 8 13
# 4: Three C 5 13
# 5: One D 10 11
# 6: One A 1 11
# 7: Four B 5 8
# 8: Four C 3 8
Alternatively, if you are deep in the Hadleyverse, the newer dplyr package gives a similar result (as suggested by @akrun):
library(dplyr)
x %>%
group_by(Category, City) %>%
summarise(Total = sum(Frequency)) %>%
mutate(size= sum(Total)) %>%
ungroup %>%
arrange(-size, -Total, Category)
Answer 1: (score: 4)
Here is a base R version, where DF is the result of your ddply call:
with(DF, DF[order(-ave(Total, Category, FUN=sum), Category, -Total), ])
This produces:
Category City Total
7 Two A 34
8 Two B 2
6 Three D 8
5 Three C 5
4 One D 10
3 One A 1
1 Four B 5
2 Four C 3
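The logic is essentially the same as David's: compute the sum of Total for each Category, use that number for every row within the Category (we do this with ave(..., FUN=sum)), and then add a couple of tie-breakers so that the rows come out in the expected order.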
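The steps described above can be sketched end to end in base R, without plyr. This is only an illustrative sketch: it swaps aggregate() in for the ddply call, stores Category and City as plain character vectors rather than factors, and adds a helper column called size (borrowed from the data.table answer) so the per-Category total is visible. None of these choices come from the original answer.

```r
# Rebuild the question's data (character columns instead of factors: an assumption)
x <- data.frame(
  Category  = c("One", "One", "Four", "Two", "Two", "Three", "Two", "Four", "Three"),
  City      = c("D", "A", "B", "B", "A", "D", "A", "C", "C"),
  Frequency = c(10, 1, 5, 2, 14, 8, 20, 3, 5)
)

# Step 1: total Frequency per (Category, City); a base R stand-in for the ddply call
DF <- aggregate(Frequency ~ Category + City, data = x, FUN = sum)
names(DF)[names(DF) == "Frequency"] <- "Total"

# Step 2: ave() repeats each Category's grand total on every row of that Category
DF$size <- ave(DF$Total, DF$Category, FUN = sum)

# Step 3: largest Category first, Category name as a tie-breaker between
# equally sized groups, then largest Total within each Category
DF <- DF[order(-DF$size, DF$Category, -DF$Total), ]
rownames(DF) <- NULL
print(DF)
```

Keeping size as a real column makes the sort key easy to inspect. It can be dropped afterwards with DF$size <- NULL if only Category, City, and Total are wanted.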