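I assume this is fairly basic, but I can't find how to do it in any of the introductory texts I have, nor by searching Google. I want to plot the mean of a continuous variable against a categorical variable, grouped by a factor. The continuous variable is 'cd' (blood CD4 protein), the categorical variable is year (years 1 to 10), and the factor is failure = 0 or 1. My dataset is 'F3'.
I have used aggregate to get the means by year, but I can't find how to split them by failure (0, 1) as no and yes. I would prefer to use ggplot.
The plot I get from this: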
ggplot(F3, aes(factor(year), mean(cd), color = factor(failure))) +
geom_line() +
geom_point(size=2)
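is a horizontal line, or two overlapping lines, although the legend does show the failure groups. So it is not plotting the mean by year, only the overall mean. Please help.
Data: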
F3 <- structure(list(year = structure(c(6L, 7L, 8L, 9L, 10L, 1L, 2L,
3L, 4L, 5L, 6L), .Label = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10"), class = "factor"), cd = c(555L, 511L, 540L,
596L, 553L, 142L, 173L, 271L, 163L, 108L, 61L), failure = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor")), .Names = c("year",
"cd", "failure"), row.names = c("1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11"), class = "data.frame")
Answer 0: (score: 0)
Still not sure, but maybe this is what you are trying to do? Using a larger dataset:
library(ggplot2)
library(dplyr)
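# Mean of cd for each year/failure combination
F4 <- F3 %>% group_by(year, failure) %>% summarize(cd = mean(cd))
# group = failure draws one line per failure level across the year factor
ggplot(F4, aes(year, cd, color = failure, group = failure)) +
  geom_point() + geom_line()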
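The plot in the question comes out flat because `mean(cd)` inside `aes()` is evaluated once over the whole column, so every row gets the same y value. As an alternative to aggregating first, ggplot2 can compute the per-group means itself with `stat_summary()`. This is only a sketch: `F3` is rebuilt here from the sample data in the question, and the `fun` argument assumes ggplot2 3.3.0 or later (older versions used `fun.y`).

```r
library(ggplot2)

# Sample data from the question, rebuilt with data.frame() for brevity
F3 <- data.frame(
  year = factor(c(6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6), levels = 1:10),
  cd = c(555L, 511L, 540L, 596L, 553L, 142L, 173L, 271L, 163L, 108L, 61L),
  failure = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1))
)

# Map the raw cd values to y and let stat_summary() take the mean
# within each year/failure combination
p <- ggplot(F3, aes(year, cd, color = failure, group = failure)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point", size = 2)
print(p)
```

With this approach the raw data frame stays untouched, which can be convenient when the same data feeds several plots.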
Including the standard error of the mean:
# Mean and standard error of the mean (SEM) per year/failure combination
F4 <- F3 %>% group_by(year, failure) %>%
  summarize(mean.cd = mean(cd), se = sd(cd) / sqrt(n()))
F4$failure <- factor(F4$failure)
# Dodge the two groups slightly so their error bars do not overlap
pos <- position_dodge(width = 0.2)
ggplot(F4, aes(year, mean.cd, color = failure, ymin = mean.cd - se,
               ymax = mean.cd + se, group = failure)) +
  geom_point(position = pos) + geom_line(position = pos) +
  geom_errorbar(position = pos, width = 0.2)
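Note that some points have only one value, so you cannot compute the SEM or sd for them; sd() returns NA for a single observation.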
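To see which groups are affected, it can help to count the observations per group alongside the summary. The sketch below assumes the sample `F3` from the question (rebuilt here with `data.frame()`) and dplyr 1.0.0 or later for the `.groups` argument; the column name `n_obs` is only illustrative.

```r
library(dplyr)

# Sample data from the question, rebuilt with data.frame() for brevity
F3 <- data.frame(
  year = factor(c(6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6), levels = 1:10),
  cd = c(555L, 511L, 540L, 596L, 553L, 142L, 173L, 271L, 163L, 108L, 61L),
  failure = factor(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1))
)

# Count observations per group next to the mean and SEM
F4 <- F3 %>%
  group_by(year, failure) %>%
  summarize(n_obs = n(),
            mean.cd = mean(cd),
            se = sd(cd) / sqrt(n_obs),
            .groups = "drop")

# Groups with a single observation have no defined SEM
filter(F4, n_obs < 2)
```

In the sample data every year/failure combination has exactly one observation, so `se` is NA throughout and no error bars would be drawn; a larger dataset is needed for them to appear.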
Answer 1: (score: 0)
mydf <- structure(list(SerialNum = c("983\n837\n424\n ", "123\n456\n789\n136",
"987\n654\n321\n975\n ", "424\n983\n837", "456\n789\n123\n136"
), Year = c(2015, 2014, 2010, 2015, 2014), Name = c("Michael\nLewis\nPaul\n ",
"Elaine\nJerry\nGeorge\nKramer", "John\nPaul\nGeorge\nRingo\nNA",
"Paul\nMichael\nLewis", "Jerry\nGeorge\nElaine\nKramer")), .Names = c("SerialNum",
"Year", "Name"), row.names = c(NA, -5L), class = "data.frame")