Plot the mean of a continuous variable by a categorical variable, grouped by a factor

Posted: 2015-10-03 06:36:29

Tags: r plot ggplot2

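I assumed this would be fairly basic, but I can't find how to do it in any of the introductory texts I have, or by searching Google. I want to plot the mean of a continuous variable by a categorical variable, grouped by a factor. The continuous variable is 'cd' (blood CD4 protein), the categorical variable is year (years 1 to 10), and the factor is failure = 0 or 1. My dataset is 'F3'.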

I have used aggregate to get the mean for each year, but I can't work out how to split it into no and yes by failure (0, 1). I'd prefer to use ggplot.
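Since aggregate already gives the per-year means, a minimal sketch of the grouped version: adding failure to the right-hand side of the formula returns one mean per year/failure combination. To keep the block self-contained, it assumes F3 is rebuilt with data.frame() from the same values as the dput in the Data section (cd stored as double rather than integer).

```r
# Rebuild the question's F3 (same values as the dput in the Data section)
F3 <- data.frame(
  year    = factor(c(6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6), levels = 1:10),
  cd      = c(555, 511, 540, 596, 553, 142, 173, 271, 163, 108, 61),
  failure = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1), levels = c(0, 1))
)

# One mean of cd for every (year, failure) combination present in the data
means <- aggregate(cd ~ year + failure, data = F3, FUN = mean)
means
```

The resulting data frame can be passed straight to ggplot with year on x, cd on y, and colour = failure, group = failure.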

The plot I get from:

library(ggplot2)

# Original attempt
ggplot(F3, aes(factor(year), mean(cd), color = factor(failure))) +
  geom_line() +
  geom_point(size = 2)


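is a single horizontal line (or two overlapping lines), although the legend does show the failure groups. So it is not plotting the mean by year, only the overall mean. Please help.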
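The flat line happens because mean(cd) inside aes() is evaluated once over the whole cd column, so every point receives the same overall mean. A minimal sketch of one alternative (not taken from the answers below): map y = cd and let ggplot2's stat_summary() compute the mean within each x/group combination. It assumes ggplot2 3.3.0 or later, where the argument is fun; older versions call it fun.y. F3 is rebuilt from the dput values so the block runs on its own.

```r
library(ggplot2)

# Rebuild the question's F3 (same values as the dput in the Data section)
F3 <- data.frame(
  year    = factor(c(6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6), levels = 1:10),
  cd      = c(555, 511, 540, 596, 553, 142, 173, 271, 163, 108, 61),
  failure = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1), levels = c(0, 1))
)

# y = cd is the raw column; stat_summary() applies mean() per year within each
# failure group. group = failure tells geom_line which points to connect,
# which is needed because x is a factor.
p <- ggplot(F3, aes(x = year, y = cd, colour = failure, group = failure)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point", size = 2)

print(p)
```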

Data

F3 <- structure(list(year = structure(c(6L, 7L, 8L, 9L, 10L, 1L, 2L, 
3L, 4L, 5L, 6L), .Label = c("1", "2", "3", "4", "5", "6", "7", 
"8", "9", "10"), class = "factor"), cd = c(555L, 511L, 540L, 
596L, 553L, 142L, 173L, 271L, 163L, 108L, 61L), failure = structure(c(1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor")), .Names = c("year", 
"cd", "failure"), row.names = c("1", "2", "3", "4", "5", "6", 
"7", "8", "9", "10", "11"), class = "data.frame")

2 answers:

Answer 0 (score: 0)

Still not sure, but maybe this is what you are trying to do? Using a larger dataset:

library(ggplot2)
library(dplyr)

F4 <- F3 %>% group_by(year, failure) %>% summarize(cd = mean(cd))

ggplot(F4, aes(year, cd, color = failure, group = failure)) +
  geom_point() + geom_line()


Including the standard error of the mean:

F4 <- F3 %>% group_by(year, failure) %>% 
  summarize(mean.cd = mean(cd), se = sd(cd) / sqrt(n()))
F4$failure <- factor(F4$failure)

pos <- position_dodge(width = 0.2)

ggplot(F4, aes(year, mean.cd, color = failure, ymin = mean.cd - se, 
               ymax = mean.cd + se, group = failure)) +
  geom_point(position = pos) + geom_line(position = pos) + 
  geom_errorbar(position = pos, width = 0.2)

Note that some points are based on a single value, so no SEM or sd can be computed for them.
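To see why, a small sketch: sd() of a single observation is NA in R, so sd/sqrt(n) is NA as well and geom_errorbar has no limits to draw for that point. The helper sem() below is hypothetical (not part of any package), and F3 is rebuilt from the question's dput values; in that sample every year/failure combination has exactly one row.

```r
# Hypothetical helper: standard error of the mean, sd / sqrt(n)
sem <- function(x) sd(x) / sqrt(length(x))

sem(c(142, 173, 271))  # finite: three observations
sem(61)                # NA: sd() of a single value is NA

# Rebuild the question's F3 and count rows per (year, failure) combination
F3 <- data.frame(
  year    = factor(c(6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6), levels = 1:10),
  cd      = c(555, 511, 540, 596, 553, 142, 173, 271, 163, 108, 61),
  failure = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1), levels = c(0, 1))
)
counts <- with(F3, table(year, failure))
max(counts)  # at most one observation per combination in the sample data
```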


Answer 1 (score: 0)

mydf <- structure(list(SerialNum = c("983\n837\n424\n ", "123\n456\n789\n136", 
"987\n654\n321\n975\n ", "424\n983\n837", "456\n789\n123\n136"
), Year = c(2015, 2014, 2010, 2015, 2014), Name = c("Michael\nLewis\nPaul\n ", 
"Elaine\nJerry\nGeorge\nKramer", "John\nPaul\nGeorge\nRingo\nNA", 
"Paul\nMichael\nLewis", "Jerry\nGeorge\nElaine\nKramer")), .Names = c("SerialNum", 
"Year", "Name"), row.names = c(NA, -5L), class = "data.frame")