Question

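I thought this would be fairly basic, but I couldn't find how to do it in any of the introductory texts I have, or by searching Google. I want to plot the mean of a continuous variable by a categorical variable, grouped by a factor. The continuous variable is 'cd' (blood CD4 protein), the categorical variable is year (years 1 to 10), and the factor is failure = 0 or 1. My dataset is 'F3'.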

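I have used aggregate to get the mean for each year, but I couldn't find how to split those means by failure (0, 1), i.e. into no and yes. I'd prefer to use ggplot.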
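For reference, base R's aggregate() accepts more than one grouping variable on the right-hand side of its formula. A minimal sketch, assuming F3 as posted in the question (rebuilt here with data.frame() so the snippet runs on its own):

```r
# F3 rebuilt from the data posted in the question
F3 <- data.frame(
  year = factor(c(6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6), levels = 1:10),
  cd = c(555L, 511L, 540L, 596L, 553L, 142L, 173L, 271L, 163L, 108L, 61L),
  failure = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1), levels = c(0, 1))
)

# cd ~ year + failure: one mean of cd for every (year, failure) combination present
F4 <- aggregate(cd ~ year + failure, data = F3, FUN = mean)
F4
```

Combinations that never occur in the data (for example year 1 with failure 0) are simply absent from the result.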

The plot I get from:

ggplot(F3, aes(factor(year), mean(cd), color = factor(failure))) +
  geom_line() +
  geom_point(size = 2)

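is a single horizontal line (or two overlapping lines), even though the legend shows the failure groups. So it is not plotting the mean by year, only the overall mean. Please help.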
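A base-R sketch of why the plot above is flat (F3 is rebuilt from the data posted in the question): inside aes(), mean(cd) is evaluated over the whole cd column and returns one number, which ggplot2 recycles to every row. A per-group mean needs the grouping to be applied first, as tapply() does here.

```r
F3 <- data.frame(
  year = factor(c(6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6), levels = 1:10),
  cd = c(555L, 511L, 540L, 596L, 553L, 142L, 173L, 271L, 163L, 108L, 61L),
  failure = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1), levels = c(0, 1))
)

# What aes(y = mean(cd)) sees: a single overall mean, the same for every row
overall <- mean(F3$cd)
overall

# One mean per year (rows) and failure group (columns); NA where a combination is absent
by_group <- tapply(F3$cd, list(F3$year, F3$failure), mean)
by_group
```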

Data:

F3 <- structure(list(year = structure(c(6L, 7L, 8L, 9L, 10L, 1L, 2L, 
3L, 4L, 5L, 6L), .Label = c("1", "2", "3", "4", "5", "6", "7", 
"8", "9", "10"), class = "factor"), cd = c(555L, 511L, 540L, 
596L, 553L, 142L, 173L, 271L, 163L, 108L, 61L), failure = structure(c(1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor")), .Names = c("year", 
"cd", "failure"), row.names = c("1", "2", "3", "4", "5", "6", 
"7", "8", "9", "10", "11"), class = "data.frame")

Answer 1

I'm still not sure, but maybe this is what you are after? Using a larger dataset:

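library(ggplot2)
library(dplyr)

# One mean of cd per year within each failure group
F4 <- F3 %>% group_by(year, failure) %>% summarize(cd = mean(cd))

# group = failure makes geom_line() connect each failure group's points across years
ggplot(F4, aes(year, cd, color = failure, group = failure)) +
  geom_point() + geom_line()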
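As an alternative that is not part of the original answer, ggplot2's stat_summary() can compute the per-year means itself, so no pre-aggregated F4 is needed. This sketch assumes ggplot2 3.3.0 or later, where the argument is `fun` (older versions called it `fun.y`). The explicit group = failure matters because year is a factor: without it, ggplot2 groups by every discrete variable, each year/failure combination becomes a one-point group, and geom_line() has nothing to connect.

```r
library(ggplot2)

F3 <- data.frame(
  year = factor(c(6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6), levels = 1:10),
  cd = c(555L, 511L, 540L, 596L, 553L, 142L, 173L, 271L, 163L, 108L, 61L),
  failure = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1), levels = c(0, 1))
)

# stat_summary() applies mean() to cd at each year, separately for each failure group
p <- ggplot(F3, aes(year, cd, color = failure, group = failure)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point", size = 2)
p
```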

Including the standard error of the mean:

# Mean and standard error of the mean (sd / sqrt(n)) for each year and failure group
F4 <- F3 %>% group_by(year, failure) %>% 
  summarize(mean.cd = mean(cd), se = sd(cd) / sqrt(n()))
F4$failure <- factor(F4$failure)

# Shift the two groups slightly sideways so their points and error bars do not overlap
pos <- position_dodge(width = 0.2)

ggplot(F4, aes(year, mean.cd, color = failure, ymin = mean.cd - se, 
               ymax = mean.cd + se, group = failure)) +
  geom_point(position = pos) + geom_line(position = pos) + 
  geom_errorbar(position = pos, width = 0.2)

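Note that some points are based on a single value, so the SEM (or sd) cannot be computed for them: sd() returns NA, and no error bar is drawn for those points.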
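A small sketch of that single-value case, using ggplot2's mean_se() helper (which returns y = mean, ymin = mean - se, ymax = mean + se); the values passed in are made up for illustration:

```r
library(ggplot2)

# The standard deviation of a single observation is undefined, so R returns NA
sd(61)

# With one value the standard error is NA too, so ymin and ymax are NA
mean_se(61)

# With several (made-up) values: y = 2, se = sqrt(var / n) = sqrt(1 / 3)
mean_se(c(1, 2, 3))
```

The same helper can drive the error bars directly from the raw data via stat_summary(fun.data = mean_se, geom = "errorbar"); rows where ymin and ymax are NA are dropped with a warning rather than drawn.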

Answer 2

mydf <- structure(list(SerialNum = c("983\n837\n424\n ", "123\n456\n789\n136", 
"987\n654\n321\n975\n ", "424\n983\n837", "456\n789\n123\n136"
), Year = c(2015, 2014, 2010, 2015, 2014), Name = c("Michael\nLewis\nPaul\n ", 
"Elaine\nJerry\nGeorge\nKramer", "John\nPaul\nGeorge\nRingo\nNA", 
"Paul\nMichael\nLewis", "Jerry\nGeorge\nElaine\nKramer")), .Names = c("SerialNum", 
"Year", "Name"), row.names = c(NA, -5L), class = "data.frame")

Plot the mean of a continuous variable by a categorical variable, grouped by a factor
