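I want to use ggplot to draw the subset mean in each facet (on both the x and y axes). However, I get the mean of the whole data set rather than the mean of each subset. I don't know how to solve this.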
library(ggplot2)
hsb2 <- read.table("http://www.ats.ucla.edu/stat/data/hsb2.csv", sep = ",", header = TRUE)
head(hsb2)
hsb2$gender = as.factor(hsb2$female)
ggplot() +
  geom_point(aes(y = read, x = write, colour = gender), data = hsb2, size = 2.2, alpha = 0.9) +
  scale_colour_brewer(guide = guide_legend(), palette = 'Set1') +
  stat_smooth(aes(x = write, y = read), data = hsb2, colour = '#000000', size = 0.8, method = lm, formula = 'y ~ x') +
  geom_vline(aes(xintercept = mean(write)), data = hsb2, linetype = 3) +
  geom_hline(aes(yintercept = mean(read)), data = hsb2, linetype = 3) +
  facet_wrap(facets = ~gender)
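Answer 0 (score: 7)
One approach is to compute the means (of x and y) for each gender explicitly and store them as new columns in the original data frame. In the question's code, mean(write) and mean(read) inside aes() are computed over the whole data frame passed to the layer, not per facet, which is why every panel shows the overall mean. With per-gender columns in place, the lines are drawn where you want them once the facets split the data by gender.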
#compute the read and write means for each gender
read_means <- tapply(hsb2$read, hsb2$gender, mean)
write_means <- tapply(hsb2$write, hsb2$gender, mean)
#store it in the data frame
hsb2$read_mean <- ifelse(hsb2$gender==0, read_means[1], read_means[2])
hsb2$write_mean <- ifelse(hsb2$gender==0, write_means[1], write_means[2])
An alternative to the lines above is to use ddply.
It creates both new columns in a single call.
library(plyr)
# assign the result back, otherwise the new columns are discarded
hsb2 <- ddply(hsb2, "gender", transform,
              read_mean = mean(read),
              write_mean = mean(write))
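If you would rather not load plyr, the same per-group columns can be sketched in base R with ave(). The data frame `toy` below is a made-up stand-in for hsb2 (its values are illustrative, not taken from the real data set), so treat this as a sketch of the technique rather than part of the original answer.

```r
# Toy stand-in for hsb2 (hypothetical values, for illustration only)
toy <- data.frame(
  gender = factor(c(0, 0, 1, 1)),
  read   = c(40, 50, 60, 70),
  write  = c(30, 50, 55, 65)
)

# ave() returns a vector as long as its input, where each element
# is the mean of the group that row belongs to
toy$read_mean  <- ave(toy$read,  toy$gender, FUN = mean)
toy$write_mean <- ave(toy$write, toy$gender, FUN = mean)

print(toy)
```

Unlike ddply, ave() keeps the rows in their original order, which can matter if other code relies on row positions.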
Now pass the two new mean columns (write_mean and read_mean) to the geom_vline and geom_hline calls in ggplot.
ggplot() +
  geom_point(aes(y = read, x = write, colour = gender), data = hsb2, size = 2.2, alpha = 0.9) +
  scale_colour_brewer(guide = guide_legend(), palette = 'Set1') +
  stat_smooth(aes(x = write, y = read), data = hsb2, colour = '#000000',
              size = 0.8, method = lm, formula = 'y ~ x') +
  geom_vline(aes(xintercept = write_mean), data = hsb2, linetype = 3) +
  geom_hline(aes(yintercept = read_mean), data = hsb2, linetype = 3) +
  facet_wrap(facets = ~gender)
This produces a plot in which the dotted lines in each facet mark that gender's own means.
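A variation worth considering, sketched here under assumptions rather than taken from the answer, is to leave the original data frame untouched and hand geom_vline and geom_hline a small summary data frame with one row per gender. Because that summary contains the faceting variable, each line should land only in its matching panel. The names `toy` and `gender_means` and the values in `toy` are hypothetical.

```r
library(ggplot2)

# Toy stand-in for hsb2 (hypothetical values, for illustration only)
toy <- data.frame(
  gender = factor(rep(c(0, 1), each = 4)),
  read   = c(40, 45, 50, 55, 60, 62, 68, 70),
  write  = c(35, 42, 48, 55, 58, 61, 66, 75)
)

# One row per gender, holding the mean of read and the mean of write
gender_means <- aggregate(cbind(read, write) ~ gender, data = toy, FUN = mean)

p <- ggplot(toy, aes(x = write, y = read)) +
  geom_point(aes(colour = gender)) +
  # the reference lines take their positions from the summary table
  geom_vline(aes(xintercept = write), data = gender_means, linetype = 3) +
  geom_hline(aes(yintercept = read), data = gender_means, linetype = 3) +
  facet_wrap(~gender)

print(p)
```

This draws one line per panel instead of one overplotted line per data row, and it keeps the plotting data free of helper columns.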