Question

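I am trying to average my data across replicates, subset by one treatment, and then draw a bar plot of the response against another factor. My plot ends up not working. Any help would be greatly appreciated.

My data: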

data <- structure(list(Sample = c(1011L, 1012L, 1014L, 1024L, 1025L, 
1026L), Collection = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2"), class = "factor"), Irrigation = structure(c(3L, 3L, 3L, 
5L, 5L, 5L), .Label = c("Rate1", "Rate2", "Rate3", "Rate4", "Rate5"
), class = "factor"), Variety = structure(c(2L, 1L, 3L, 3L, 2L, 
1L), .Label = c("Hodag", "Lamoka", "Snowden"), class = "factor"), 
Suc = c(0.7333, 0.4717, 0.5883, 0.6783, 0.8283, 0.6833), 
Gluc = c(0.03, 0.04, 0.043, 0.075, 0.057, 0.087), L = c(59.48, 
57.59, 59.25, 66.45, 68.29, 65.65), a = c(4.36, 6.85, 3.43, 
1.7, 0.78, 2.84), b = c(26.82, 27.6, 26.2, 26.14, 25.37, 
27.19), NoDefect = c(100L, 100L, 100L, 92L, 100L, 100L), 
Defect = c(0L, 0L, 0L, 8L, 0L, 0L)), row.names = c(NA, 6L
), class = "data.frame")

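Averaging across reps:

library(dplyr)    # provides %>%, group_by(), summarise()
library(ggplot2)  # needed for the plot further down

dataAvgSuc <- data %>%
  dplyr::group_by(Collection, Irrigation, Variety) %>%
  dplyr::summarise(meanSuc = mean(Suc))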

Making Collection a factor:

dataAvgSuc$Collection <- as.factor(dataAvgSuc$Collection)

Subsetting by variety:

subLamoka <- subset(dataAvgSuc, Variety=="Lamoka")
subHodag <- subset(dataAvgSuc, Variety=="Hodag")
subSnowden <- subset(dataAvgSuc, Variety=="Snowden")

My ggplot attempt:

sucPlot <-ggplot(data=subLamoka, aes(x=dataAvgSuc$Collection, 
y=meanSuc)) + geom_bar(stat="identity")

The error message:

Error: Aesthetics must be either length 1 or the same as the data (10): 
x, y

However, when I look at x and y, they both have 30 entries.

Answer 1

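Trev,

I had some trouble reproducing the problem, because the sample data you provided contains only 6 observations rather than 30. So I am not sure whether the solution below will work for you.

I created the data frame using the code you provided: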

data <- structure(list(
  Sample = c(1011L, 1012L, 1014L, 1024L, 1025L, 1026L),
  Collection = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
                         .Label = c("1", "2"), class = "factor"),
  Irrigation = structure(c(3L, 3L, 3L, 5L, 5L, 5L),
                         .Label = c("Rate1", "Rate2", "Rate3", "Rate4", "Rate5"),
                         class = "factor"),
  Variety = structure(c(2L, 1L, 3L, 3L, 2L, 1L),
                      .Label = c("Hodag", "Lamoka", "Snowden"), class = "factor"),
  Suc = c(0.7333, 0.4717, 0.5883, 0.6783, 0.8283, 0.6833),
  Gluc = c(0.03, 0.04, 0.043, 0.075, 0.057, 0.087),
  L = c(59.48, 57.59, 59.25, 66.45, 68.29, 65.65),
  a = c(4.36, 6.85, 3.43, 1.7, 0.78, 2.84),
  b = c(26.82, 27.6, 26.2, 26.14, 25.37, 27.19),
  NoDefect = c(100L, 100L, 100L, 92L, 100L, 100L),
  Defect = c(0L, 0L, 0L, 8L, 0L, 0L)),
  row.names = c(NA, 6L), class = "data.frame")


data$Collection

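However, your Collection factor is defined with two levels, but only one of them appears in the sample. Perhaps that is why the means came out greater than 1? I modified the code as follows so that both Collection levels are represented in the data.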

data2 <- structure(list(
  Sample = c(1011L, 1012L, 1014L, 1024L, 1025L, 1026L),
  Collection = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
                         .Label = c("1", "2"), class = "factor"),
  Irrigation = structure(c(3L, 3L, 3L, 5L, 5L, 5L),
                         .Label = c("Rate1", "Rate2", "Rate3", "Rate4", "Rate5"),
                         class = "factor"),
  Variety = structure(c(2L, 1L, 3L, 3L, 2L, 1L),
                      .Label = c("Hodag", "Lamoka", "Snowden"), class = "factor"),
  Suc = c(0.7333, 0.4717, 0.5883, 0.6783, 0.8283, 0.6833),
  Gluc = c(0.03, 0.04, 0.043, 0.075, 0.057, 0.087),
  L = c(59.48, 57.59, 59.25, 66.45, 68.29, 65.65),
  a = c(4.36, 6.85, 3.43, 1.7, 0.78, 2.84),
  b = c(26.82, 27.6, 26.2, 26.14, 25.37, 27.19),
  NoDefect = c(100L, 100L, 100L, 92L, 100L, 100L),
  Defect = c(0L, 0L, 0L, 8L, 0L, 0L)),
  row.names = c(NA, 6L), class = "data.frame")


data2$Collection
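
A quick way to see the difference between the two Collection columns is to compare the declared levels with the counts per level. The sketch below rebuilds just those two factors under the hypothetical names collection1 and collection2 (they mirror data$Collection and data2$Collection) and uses only base R:

```r
# Hypothetical stand-ins for data$Collection and data2$Collection.
collection1 <- factor(c("1", "1", "1", "1", "1", "1"), levels = c("1", "2"))
collection2 <- factor(c("1", "1", "1", "2", "2", "2"), levels = c("1", "2"))

# Both factors declare the same two levels ...
levels(collection1)
levels(collection2)

# ... but table() shows how many rows actually use each level.
counts1 <- table(collection1)  # level "2" has a count of 0
counts2 <- table(collection2)  # both levels are used
print(counts1)
print(counts2)

# droplevels() removes levels that never occur in the data.
used1 <- droplevels(collection1)
nlevels(used1)
```

Running table() on a factor before plotting is a cheap check that every level you expect on the x axis is really present.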

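Since you are using dplyr, you can simply pipe the summarised object into ggplot. I don't think you need to create new subset data frames; you can use the facet_wrap command to draw each variety in its own panel. I also used geom_col rather than geom_bar, which by default plots counts. Since you are plotting means, geom_col is probably the better choice. Also, because the examples below pipe the data into the next line, the "data =" argument normally used in the ggplot call is not needed.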

First, using data:

library(dplyr)
library(ggplot2)

data %>%
  dplyr::group_by(Collection, Irrigation, Variety) %>%
  dplyr::summarise(meanSuc = mean(Suc)) %>%
  ggplot(aes(x = Collection, y = meanSuc)) +
  geom_col() +
  facet_wrap(. ~ Variety)

Adding Irrigation as the fill:

data %>%
  dplyr::group_by(Collection, Irrigation, Variety) %>%
  dplyr::summarise(meanSuc = mean(Suc)) %>%
  ggplot(aes(x = Collection, y = meanSuc, fill = Irrigation)) +
  geom_col() +
  facet_wrap(. ~ Variety)
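
One thing to keep in mind with these plots: geom_col() stacks bars that share the same x position by default, so when several irrigation rates fall into one Collection the bar height is the sum of their means, which may be where values above 1 come from. The sketch below is only an illustration. It rebuilds the relevant columns of the sample data under the hypothetical name df, computes the stacked heights in base R, and, if ggplot2 is installed, builds a plot with position = "dodge" so the rates sit side by side instead.

```r
# Hypothetical reduced copy of the sample data (only the columns used here).
df <- data.frame(
  Collection = factor(rep("1", 6), levels = c("1", "2")),
  Irrigation = factor(rep(c("Rate3", "Rate5"), each = 3)),
  Variety    = c("Lamoka", "Hodag", "Snowden", "Snowden", "Lamoka", "Hodag"),
  Suc        = c(0.7333, 0.4717, 0.5883, 0.6783, 0.8283, 0.6833)
)

# Height of each stacked bar: the sum over irrigation rates within
# one Collection x Variety combination.
stacked <- aggregate(Suc ~ Collection + Variety, data = df, FUN = sum)
print(stacked)

# Side-by-side bars instead of stacked ones (only if ggplot2 is available).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = Collection, y = Suc, fill = Irrigation)) +
    geom_col(position = "dodge") +
    facet_wrap(~ Variety)
  # print(p) draws the plot
}
```

With dodged bars, each bar shows a single mean, so the y axis stays within the range of the individual values.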

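And using data2 as defined above, Collection levels 1 and 2 are drawn side by side on the plot. With this approach I can produce the plots, and all the means are below 1, between about 0.4 and 0.8.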
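
As for the error itself, it most likely comes from mixing two data frames in one call: the plot is given data = subLamoka, but x is taken from dataAvgSuc$Collection, which is the full, unsubsetted column. The sketch below uses a made-up 30-row stand-in for dataAvgSuc (2 collections x 5 irrigation rates x 3 varieties, with invented meanSuc values) to show the length mismatch, and then refers to the column by its bare name so that ggplot looks it up inside the subset.

```r
# Hypothetical stand-in for the asker's 30-row summary table.
dataAvgSuc <- expand.grid(
  Collection = factor(c("1", "2")),
  Irrigation = paste0("Rate", 1:5),
  Variety    = c("Hodag", "Lamoka", "Snowden")
)
# Invented values, only there so that the column exists.
dataAvgSuc$meanSuc <- seq(0.4, 0.8, length.out = nrow(dataAvgSuc))

subLamoka <- subset(dataAvgSuc, Variety == "Lamoka")

n_data <- nrow(subLamoka)              # rows in the data passed to ggplot()
n_x    <- length(dataAvgSuc$Collection) # length of the vector used for x
print(c(data_rows = n_data, x_length = n_x))

# Using the bare column name keeps x and y the same length as the data.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  sucPlot <- ggplot(subLamoka, aes(x = Collection, y = meanSuc)) +
    geom_col()
  # print(sucPlot) draws the plot
}
```

In general, it is safer to avoid the `$` operator inside aes() and let ggplot resolve column names against the data frame it was given.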
