I have a large dataset like this, with the following variables.
Field Country AgeRange Score Test
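I want to plot the mean score of each group in the whole population, grouped by Field and AgeRange. In other words, I would like something like this.
Note that the variable AgeRange takes one of these 3 values rather than each participant's exact age.
I have no trouble grouping the data as needed. For example, by running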
aggr_data = aggregate(data, by=list(data$Field, data$AgeRange), FUN=mean)
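I get the data grouped the way I need, with one mean score for each Field-AgeRange pair. The problem is that I cannot find a straightforward way to turn that into a bar chart whose y axis shows the mean scores and whose x axis shows each pair.
I suppose I could pull out each subset I am interested in, like this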
young_cs = subset(data, Field=="CompSci" & AgeRange=="18-35")
m_young_cs = mean(young_cs[,"Score"])
mid_cs = subset(data, Field=="CompSci" & AgeRange=="36-53")
m_mid_cs = mean(mid_cs[,"Score"])
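and then plot all the resulting means, but that is obviously very time-consuming. Is there a simpler, faster way?
Here is a small random sample of the data.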
Field Country AgeRange Score Test
Psychology US 18-35 4.2 A
Psychology US 18-35 3.1 C
Psychology US 18-35 5.2 B
Psychology US 36-53 4.7 A
Psychology US 36-53 3.5 A
Psychology US 54+ 3.1 B
Psychology US 54+ 2.2 B
Psychology US 54+ 6.7 C
Psychology US 54+ 5.1 C
CompSci US 18-35 5.2 B
CompSci US 18-35 7.4 C
CompSci US 18-35 6.1 A
CompSci US 36-53 7.7 A
CompSci US 36-53 8.1 A
CompSci US 54+ 8.2 B
CompSci US 54+ 7.7 B
CompSci US 54+ 6.9 A
CompSci US 54+ 9.0 C
Mathematics US 18-35 6.2 B
Mathematics US 18-35 6.4 A
Mathematics US 18-35 7.1 A
Mathematics US 36-53 8.7 A
Mathematics US 36-53 9.4 A
Mathematics US 54+ 7.2 C
Mathematics US 54+ 6.1 B
Mathematics US 54+ 6.5 C
Mathematics US 54+ 7.0 C
Answer 0 (score: 1)
Try this
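library(ggplot2)

# dummy data: 3 fields x 3 age ranges; the shorter columns are recycled to 81 rows
field <- c("P", "C", "M")
agerange <- c(18, 36, 54)
score <- rnorm(27, 7)
test <- c("A", "B", "C")
df <- data.frame(field = rep(field, each = 9),
                 agerange = as.factor(rep(agerange, each = 3, times = 9)),
                 score = score, test = rep(test, 9))
p <- ggplot(df, aes(x = field, y = score, fill = agerange))
# plots the raw scores: rows in the same group are drawn on top of each other
p + geom_bar(stat = "identity", position = "dodge")
# or compute the mean of each group and plot that
# (ggplot2 >= 3.3.0 uses `fun`; older versions call this argument `fun.y`)
p + stat_summary(fun = "mean", geom = "bar", position = "dodge")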
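For the column names used in the question, a base R alternative that needs no extra packages might look like the sketch below. It assumes the data frame is called `data`, as in the question. The rows are a subset of the posted sample, and `means` is a name introduced here only for illustration.

```r
# A subset of the sample rows posted in the question
data <- read.table(header = TRUE, text = "
Field Country AgeRange Score Test
Psychology US 18-35 4.2 A
Psychology US 18-35 3.1 C
Psychology US 36-53 4.7 A
Psychology US 36-53 3.5 A
Psychology US 54+ 3.1 B
Psychology US 54+ 2.2 B
CompSci US 18-35 5.2 B
CompSci US 18-35 7.4 C
CompSci US 36-53 7.7 A
CompSci US 36-53 8.1 A
CompSci US 54+ 8.2 B
CompSci US 54+ 7.7 B
")

# Mean Score for every AgeRange (rows) x Field (columns) combination
means <- tapply(data$Score, list(data$AgeRange, data$Field), mean)

# beside = TRUE draws the age ranges side by side within each field;
# legend.text = TRUE labels the bars with the row names (the age ranges)
barplot(means, beside = TRUE, legend.text = TRUE,
        xlab = "Field", ylab = "Mean score")
```

Because `tapply` returns a matrix, `barplot` can use it directly: each column becomes one cluster of bars, so no manual subsetting is needed.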
Answer 1 (score: 1)
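library(ggplot2)
library(plyr)
# a faceting approach; `df` is the data frame defined by the structure() call below,
# so the column names are lowercase (field, agerange, score)
# mean score for each field/agerange pair
df2 <- ddply(df, .(field, agerange), summarise, mscore = mean(score))
ggplot(df2, aes(x = agerange, y = mscore, fill = agerange)) +
  geom_bar(stat = "identity") +
  facet_wrap(~field)
# good enough?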
df <- structure(list(field = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), class = "factor", .Label = c("C",
"M", "P")), agerange = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("18", "36",
"54"), class = "factor"), score = c(7.30127138725929, 7.37770686922096,
7.41317998674043, 6.64841878521039, 7.86711279540953, 7.17048025193224,
8.44148594576163, 8.13949581473566, 6.30312423530373, 6.78529906805563,
8.60960304217661, 7.08300936020387, 7.33518750196135, 7.29903060579703,
7.81598828814603, 6.51481883845345, 6.85779851460457, 8.5001156704776,
7.90225168492658, 6.57536590278191, 6.01020914251986, 7.28458327350041,
7.07419918080273, 8.93252585403122, 6.54527682832174, 6.35152240141314,
6.75924970388344, 7.30127138725929, 7.37770686922096, 7.41317998674043,
6.64841878521039, 7.86711279540953, 7.17048025193224, 8.44148594576163,
8.13949581473566, 6.30312423530373, 6.78529906805563, 8.60960304217661,
7.08300936020387, 7.33518750196135, 7.29903060579703, 7.81598828814603,
6.51481883845345, 6.85779851460457, 8.5001156704776, 7.90225168492658,
6.57536590278191, 6.01020914251986, 7.28458327350041, 7.07419918080273,
8.93252585403122, 6.54527682832174, 6.35152240141314, 6.75924970388344,
7.30127138725929, 7.37770686922096, 7.41317998674043, 6.64841878521039,
7.86711279540953, 7.17048025193224, 8.44148594576163, 8.13949581473566,
6.30312423530373, 6.78529906805563, 8.60960304217661, 7.08300936020387,
7.33518750196135, 7.29903060579703, 7.81598828814603, 6.51481883845345,
6.85779851460457, 8.5001156704776, 7.90225168492658, 6.57536590278191,
6.01020914251986, 7.28458327350041, 7.07419918080273, 8.93252585403122,
6.54527682832174, 6.35152240141314, 6.75924970388344), test = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L
), class = "factor", .Label = c("A", "B", "C"))), .Names = c("field",
"agerange", "score", "test"), row.names = c(NA, -81L), class = "data.frame")
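The `ddply` step above can also be done without plyr. Here is a minimal sketch using the formula interface of base R's `aggregate`, followed by the same faceted plot. `df_demo` is a hypothetical stand-in with the same column names as `df`, and the plotting part is skipped if ggplot2 is not installed.

```r
# Hypothetical stand-in data with the same column names as `df`
set.seed(1)
df_demo <- data.frame(
  field    = rep(c("P", "C", "M"), each = 9),
  agerange = factor(rep(c(18, 36, 54), each = 3, times = 3)),
  score    = rnorm(27, mean = 7)
)

# One row per field/agerange pair, holding the mean score of that group
means <- aggregate(score ~ field + agerange, data = df_demo, FUN = mean)

# Same faceted bar chart as in the answer, drawn from the aggregated rows
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(means, aes(x = agerange, y = score, fill = agerange)) +
    geom_col() +
    facet_wrap(~field)
  print(p)
}
```

The formula `score ~ field + agerange` aggregates only the `score` column, which avoids taking the mean of non-numeric columns such as `test`.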