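I am trying to average my data across replicates, subset by one treatment, and then draw a bar plot of the response against another factor. My plot ends up not working. Any help would be appreciated.
My data: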
data <- structure(list(Sample = c(1011L, 1012L, 1014L, 1024L, 1025L,
1026L), Collection = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1",
"2"), class = "factor"), Irrigation = structure(c(3L, 3L, 3L,
5L, 5L, 5L), .Label = c("Rate1", "Rate2", "Rate3", "Rate4", "Rate5"
), class = "factor"), Variety = structure(c(2L, 1L, 3L, 3L, 2L,
1L), .Label = c("Hodag", "Lamoka", "Snowden"), class = "factor"),
Suc = c(0.7333, 0.4717, 0.5883, 0.6783, 0.8283, 0.6833),
Gluc = c(0.03, 0.04, 0.043, 0.075, 0.057, 0.087), L = c(59.48,
57.59, 59.25, 66.45, 68.29, 65.65), a = c(4.36, 6.85, 3.43,
1.7, 0.78, 2.84), b = c(26.82, 27.6, 26.2, 26.14, 25.37,
27.19), NoDefect = c(100L, 100L, 100L, 92L, 100L, 100L),
Defect = c(0L, 0L, 0L, 8L, 0L, 0L)), row.names = c(NA, 6L
), class = "data.frame")
Averaging across reps:
library(dplyr)
library(ggplot2)
dataAvgSuc <- data %>%
dplyr::group_by(Collection, Irrigation, Variety) %>%
dplyr::summarise(meanSuc=mean(Suc))
Making Collection a factor:
dataAvgSuc$Collection <- as.factor(dataAvgSuc$Collection)
Subsetting by variety:
subLamoka <- subset(dataAvgSuc, Variety=="Lamoka")
subHodag <- subset(dataAvgSuc, Variety=="Hodag")
subSnowden <- subset(dataAvgSuc, Variety=="Snowden")
Attempted ggplot:
sucPlot <-ggplot(data=subLamoka, aes(x=dataAvgSuc$Collection,
y=meanSuc)) + geom_bar(stat="identity")
Error message:
Error: Aesthetics must be either length 1 or the same as the data (10):
x, y
However, when I look at x and y, they both have 30 entries.
Answer 0 (score: 0)
Trev,
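I had some trouble reproducing the problem, because the sample data you provided contains only 6 observations rather than 30. So I am not sure whether the solution below will work for you.
I created the data frame using the code you provided: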
data <- structure(list(Sample = c(1011L, 1012L, 1014L, 1024L, 1025L, 1026L),
Collection = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1",
"2"), class = "factor"),
Irrigation = structure(c(3L, 3L, 3L,5L, 5L, 5L), .Label = c("Rate1", "Rate2",
"Rate3", "Rate4", "Rate5"
), class = "factor"), Variety = structure(c(2L, 1L, 3L, 3L, 2L,
1L), .Label = c("Hodag", "Lamoka", "Snowden"), class = "factor"),
Suc = c(0.7333, 0.4717, 0.5883, 0.6783, 0.8283, 0.6833),
Gluc = c(0.03, 0.04, 0.043, 0.075, 0.057, 0.087),
L = c(59.48, 57.59, 59.25, 66.45, 68.29, 65.65),
a = c(4.36, 6.85, 3.43, 1.7, 0.78, 2.84),
b = c(26.82, 27.6, 26.2, 26.14, 25.37,27.19),
NoDefect = c(100L, 100L, 100L, 92L, 100L, 100L),
Defect = c(0L, 0L, 0L, 8L, 0L, 0L)),
row.names = c(NA, 6L), class = "data.frame")
data$Collection
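However, your Collection factor is defined with two levels, but only one of them appears in the sample. Perhaps that is why you are seeing averages greater than 1? I modified the code below so that both Collection levels are represented in the data.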
data2 <- structure(list(Sample = c(1011L, 1012L, 1014L, 1024L, 1025L, 1026L),
Collection = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1",
"2"), class = "factor"),
Irrigation = structure(c(3L, 3L, 3L,5L, 5L, 5L), .Label = c("Rate1", "Rate2",
"Rate3", "Rate4", "Rate5"
), class = "factor"), Variety = structure(c(2L, 1L, 3L, 3L, 2L,
1L), .Label = c("Hodag", "Lamoka", "Snowden"), class = "factor"),
Suc = c(0.7333, 0.4717, 0.5883, 0.6783, 0.8283, 0.6833),
Gluc = c(0.03, 0.04, 0.043, 0.075, 0.057, 0.087),
L = c(59.48, 57.59, 59.25, 66.45, 68.29, 65.65),
a = c(4.36, 6.85, 3.43, 1.7, 0.78, 2.84),
b = c(26.82, 27.6, 26.2, 26.14, 25.37,27.19),
NoDefect = c(100L, 100L, 100L, 92L, 100L, 100L),
Defect = c(0L, 0L, 0L, 8L, 0L, 0L)),
row.names = c(NA, 6L), class = "data.frame")
data2$Collection
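Since you are already using dplyr, you can pipe the summarised object straight into ggplot. I don't think you need to create new subset data frames; you can plot the varieties separately with the facet_wrap command instead. I also used geom_col rather than geom_bar. By default geom_bar counts the rows in each group (stat = "count"), whereas geom_col plots the values as given, which is the same as geom_bar(stat = "identity"). Since you are plotting averages, geom_col is probably the better fit. Also, because the examples below pipe the data into ggplot, the "data =" argument normally used in the ggplot call is not needed.
First, piping the data in: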
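As a side note on the error in the question: it most likely comes from `aes(x=dataAvgSuc$Collection, ...)`. That expression pulls x from the full summarised data frame (30 values), while `data=subLamoka` supplies only the subset (the 10 rows the message reports). The sketch below uses a small made-up stand-in for `dataAvgSuc` (the values are illustrative, not the asker's full data) to show that bare column names inside `aes()` avoid the mismatch.

```r
library(ggplot2)

# Hypothetical stand-in for the averaged data; values are for illustration only.
dataAvgSuc <- data.frame(
  Collection = factor(c("1", "1", "2", "2")),
  Variety    = factor(c("Lamoka", "Hodag", "Lamoka", "Hodag")),
  meanSuc    = c(0.73, 0.47, 0.83, 0.68)
)

subLamoka <- subset(dataAvgSuc, Variety == "Lamoka")

# Problematic: x is taken from the full data frame (4 values here),
# while y is taken from the subset (2 values), so the lengths disagree.
# ggplot(subLamoka, aes(x = dataAvgSuc$Collection, y = meanSuc)) + geom_col()

# Bare column names are looked up in the `data` argument,
# so x and y always come from the same rows.
sucPlot <- ggplot(subLamoka, aes(x = Collection, y = meanSuc)) +
  geom_col()
```

Calling `print(sucPlot)` draws the plot. The same change applied to the original `sucPlot` call should remove the error even if the subsets are kept.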
data %>%
dplyr::group_by(Collection,Irrigation, Variety) %>%
dplyr::summarise(meanSuc=mean(Suc)) %>%
ggplot(aes(x = Collection, y = meanSuc)) +
geom_col() +
facet_wrap(.~Variety)
Adding Irrigation:
data %>%
dplyr::group_by(Collection,Irrigation, Variety) %>%
dplyr::summarise(meanSuc=mean(Suc)) %>%
ggplot(aes(x = Collection, y = meanSuc, fill = Irrigation)) +
geom_col() +
facet_wrap(.~Variety)
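Using data2, as defined above, will put Collection levels 1 and 2 side by side on the graph. With this approach I can produce results, and all of the averages are below 1, ranging from about 0.4 to 0.8.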
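One possible reason bars can rise above 1 even though every mean is below 1: `geom_col()` stacks bars that share the same x position by default. With the original sample data (a single Collection level), the Rate3 and Rate5 bars for one variety sit on top of each other, for example 0.7333 + 0.8283 for Lamoka. The sketch below is a minimal illustration, assuming a reduced copy of data2 with only the columns needed for the plot. It checks the range of the means and uses `position = "dodge"` so the Irrigation levels are drawn next to each other instead of stacked. The names `avgSuc` and `dodgedPlot` are placeholders, not from the original post.

```r
library(dplyr)
library(ggplot2)

# Reduced copy of data2 from above: only the columns the plot needs.
data2 <- data.frame(
  Collection = factor(c("1", "1", "1", "2", "2", "2")),
  Irrigation = factor(c("Rate3", "Rate3", "Rate3", "Rate5", "Rate5", "Rate5")),
  Variety    = factor(c("Lamoka", "Hodag", "Snowden", "Snowden", "Lamoka", "Hodag")),
  Suc        = c(0.7333, 0.4717, 0.5883, 0.6783, 0.8283, 0.6833)
)

# Average across reps; .groups = "drop" returns an ungrouped result.
avgSuc <- data2 %>%
  group_by(Collection, Irrigation, Variety) %>%
  summarise(meanSuc = mean(Suc), .groups = "drop")

# Every mean should be below 1.
range(avgSuc$meanSuc)

# position = "dodge" places bars with the same x side by side rather than stacking them.
dodgedPlot <- ggplot(avgSuc, aes(x = Collection, y = meanSuc, fill = Irrigation)) +
  geom_col(position = "dodge") +
  facet_wrap(~ Variety)
```

With the full 30-row data set, the same pipeline should work unchanged as long as the column names match.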