I have the following data frame.
Value Type Year
10 car 1995
23 car 1995
2 car 1997
10 car 2000
11 bus 1997
23 bus 1995
2 bus 1997
10 bus 2000
12 car 1997
13 bus 1995
14 jeep 2000
15 jeep 1995
23 jeep 1995
2 jeep 1997
10 jeep 2000
8 car 2000
9 bus 2000
1 jeep 1997
I want to sum the rows, grouping first by the Type column and then by Year. I want the following output.
Value Type Year
33 car 1995
14 car 1997
18 car 2000
36 bus 1995
13 bus 1997
19 bus 2000
38 jeep 1995
3 jeep 1997
24 jeep 2000
Can anyone tell me how to get this?
Answer 0: (score: 5)
We can use one of the group-wise aggregation functions. If we only want base R, aggregate is a useful, compact function.
aggregate(Value ~ Year + Type, df1, FUN=sum)
# Year Type Value
#1 1995 car 33
#2 1997 car 14
#3 2000 car 18
#4 1995 bus 36
#5 1997 bus 13
#6 2000 bus 19
#7 1995 jeep 38
#8 1997 jeep 3
#9 2000 jeep 24
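The aggregate result has its columns in the order Year, Type, Value, while the question asks for Value, Type, Year. A minimal sketch of reshaping it to the requested layout, assuming the same data as df1 (rebuilt here so the snippet runs on its own; the name res is only illustrative):

```r
# Same values as df1 in this answer, rebuilt so the snippet is self-contained
df1 <- data.frame(
  Value = c(10, 23, 2, 10, 11, 23, 2, 10, 12, 13, 14, 15, 23, 2, 10, 8, 9, 1),
  Type  = c("car", "car", "car", "car", "bus", "bus", "bus", "bus", "car",
            "bus", "jeep", "jeep", "jeep", "jeep", "jeep", "car", "bus", "jeep"),
  Year  = c(1995, 1995, 1997, 2000, 1997, 1995, 1997, 2000, 1997,
            1995, 2000, 1995, 1995, 1997, 2000, 2000, 2000, 1997)
)
# Keep the types in order of first appearance (car, bus, jeep)
df1$Type <- factor(df1$Type, levels = unique(df1$Type))

res <- aggregate(Value ~ Year + Type, df1, FUN = sum)
# Sort explicitly by Type, then Year, and reorder the columns
res <- res[order(res$Type, res$Year), c("Value", "Type", "Year")]
rownames(res) <- NULL
res
```

The explicit order() call does not rely on the row order that aggregate happens to return.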
Or we can try dplyr:
library(dplyr)
df1 %>%
group_by(Type, Year) %>%
summarise(Value=sum(Value))
# Type Year Value
#1 car 1995 33
#2 car 1997 14
#3 car 2000 18
#4 bus 1995 36
#5 bus 1997 13
#6 bus 2000 19
#7 jeep 1995 38
#8 jeep 1997 3
#9 jeep 2000 24
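After summarise the result is still grouped by Type, which can surprise later steps. A hedged variant, assuming dplyr 1.0.0 or later (where the .groups argument is available), drops the grouping and puts the columns in the order the question asks for. The data is rebuilt so the snippet runs on its own, and res is only an illustrative name:

```r
library(dplyr)

# Same values as df1 in this answer, rebuilt so the snippet is self-contained
df1 <- data.frame(
  Value = c(10, 23, 2, 10, 11, 23, 2, 10, 12, 13, 14, 15, 23, 2, 10, 8, 9, 1),
  Type  = c("car", "car", "car", "car", "bus", "bus", "bus", "bus", "car",
            "bus", "jeep", "jeep", "jeep", "jeep", "jeep", "car", "bus", "jeep"),
  Year  = c(1995, 1995, 1997, 2000, 1997, 1995, 1997, 2000, 1997,
            1995, 2000, 1995, 1995, 1997, 2000, 2000, 2000, 1997)
)
# Keep the types in order of first appearance (car, bus, jeep)
df1$Type <- factor(df1$Type, levels = unique(df1$Type))

res <- df1 %>%
  group_by(Type, Year) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(Value = sum(Value), .groups = "drop") %>%
  # Column order requested in the question
  select(Value, Type, Year)
res
```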
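Another compact and fast option is data.table. Note that setDT converts df1 to a data.table by reference, and the groups come back in order of first appearance rather than sorted: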
library(data.table)#v1.9.5+
setDT(df1)[, list(Value=sum(Value)), .(Type, Year)]
#Type Year Value
#1: car 1995 33
#2: car 1997 14
#3: car 2000 18
#4: bus 1997 13
#5: bus 1995 36
#6: bus 2000 19
#7: jeep 2000 24
#8: jeep 1995 38
#9: jeep 1997 3
Or with sqldf:
library(sqldf)
sqldf('select Type, Year,
sum(Value) as Value
from df1
group by Type, Year')
If we want to plot the result:
library(ggplot2)
df1 %>%
group_by(Type, Year) %>%
summarise(Value=sum(Value)) %>%
ggplot(., aes(x=Year, y=Value))+
geom_line() +
facet_wrap(~Type)
The data used in the examples above:
df1 <- structure(list(Value = c(10L, 23L, 2L, 10L, 11L, 23L, 2L, 10L,
12L, 13L, 14L, 15L, 23L, 2L, 10L, 8L, 9L, 1L), Type = c("car",
"car", "car", "car", "bus", "bus", "bus", "bus", "car", "bus",
"jeep", "jeep", "jeep", "jeep", "jeep", "car", "bus", "jeep"),
Year = c(1995L, 1995L, 1997L, 2000L, 1997L, 1995L, 1997L,
2000L, 1997L, 1995L, 2000L, 1995L, 1995L, 1997L, 2000L, 2000L,
2000L, 1997L)), .Names = c("Value", "Type", "Year"),
class = "data.frame", row.names = c(NA, -18L))
df1$Type <- factor(df1$Type, levels=unique(df1$Type))
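Answer 1: (score: 0)
I know this is not the ideal output (honestly, I don't like it, because it is inconsistent with tidy data principles), but here is a different solution you could consider.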
tapply(df$Value, list(df$Type, df$Year), sum)
1995 1997 2000
bus 36 13 19
car 33 14 18
jeep 38 3 24
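If the wide matrix from tapply is needed in long (tidy) form after all, one possible sketch in base R goes through as.table and as.data.frame. The names m and long are only illustrative, and the data frame is rebuilt here with Type as plain character, which is why the rows sort alphabetically as in the output above:

```r
# Same values as in the question; Type is left as character here
df <- data.frame(
  Value = c(10, 23, 2, 10, 11, 23, 2, 10, 12, 13, 14, 15, 23, 2, 10, 8, 9, 1),
  Type  = c("car", "car", "car", "car", "bus", "bus", "bus", "bus", "car",
            "bus", "jeep", "jeep", "jeep", "jeep", "jeep", "car", "bus", "jeep"),
  Year  = c(1995, 1995, 1997, 2000, 1997, 1995, 1997, 2000, 1997,
            1995, 2000, 1995, 1995, 1997, 2000, 2000, 2000, 1997)
)

# Naming the list elements gives the matrix named dimensions
m <- tapply(df$Value, list(Type = df$Type, Year = df$Year), sum)

# One row per Type/Year cell; the sums go into a column called Value
long <- as.data.frame(as.table(m), responseName = "Value")
# Year comes back as a factor, so convert it to a number again
long$Year <- as.numeric(as.character(long$Year))
long
```

Every Type and Year combination occurs in this data. With missing combinations, tapply would fill those cells with NA, and they would show up as NA rows in the long form.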