ggplot: weighted.mean and stat_summary in a faceted bar chart

Time: 2018-04-29 15:23:13

Tags: r ggplot2 bar-chart facet weighted-average

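I've spent far too much time trying to find a way to use weighted.mean (or wtd.mean) inside stat_summary and get it to work properly. I've looked through several pages that try to solve the same problem, but none has a clear solution. The main problem is that, inside stat_summary, weighted.mean cannot find its weights argument, which apparently cannot be passed down through the ggplot and/or stat_summary aesthetics (believe me, I've tried; see the examples). I've tried all sorts of approaches. I even produced a bar chart of weighted means with a ddply-based function (as shown on another page), but besides being a bit clumsy, it does not allow faceting, because it changes the source data frame.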

Here is a data frame constructed for this problem.

elements <- c("water","water","water","water","water","water","air","air","air","air","air","air","earth","earth","earth","earth","earth","earth","fire","fire","fire","fire","fire","fire","aether","aether","aether","aether","aether","aether")
shapes <- c("icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","octahedron","octahedron","octahedron","octahedron","octahedron","octahedron","cube","cube","cube","cube","cube","cube","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron")
greek_letter <- c("alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta")
existence <- c("real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","not real","not real","not real","not real","not real","not real")
value <- c(0,0,0,5,7,0,0,1,0,20,3,0,0,2,2,1,8,0,0,8,10,4,2,0,0,0,0,1,1,0)
importance <- c(20,20,20,20,20,20,10,10,10,10,10,10,3,3,3,3,3,3,9,9,9,9,9,9,50,50,50,50,50,50)
platonic <- data.frame(elements,shapes,greek_letter,existence,value,importance)

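(Note: I added the "shapes" column even though I don't use it, just to remind myself that I don't want to lose any data along the way; it still needs to be available at the end.)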

The original setup is a ggplot with just "mean", including faceting, like this:

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)

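Below is the corresponding code with "weighted.mean". The "w" aesthetic is ignored, so weighted.mean assumes all weights are equal (as the function is defined to do when no weights are given), which results in a simple mean.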

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value, w=platonic$importance), fun.y = "weighted.mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

As you can see, it issues a warning: Warning: Ignoring unknown aesthetics: w

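I tried several ways to make it "see" the weight variable, without success. Eventually I realized that the most promising approach would be to redefine the weighted.mean function so that its default "w" is a function of "x". weighted.mean still wouldn't see any "w" aesthetic, but it would compute a default. To do this, I tried nesting the native function (weighted.mean) inside a wrapper function, which lets me change the arguments.

Step by step.

First I tried it with "mean" (it works).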

mean.modif <- function(x) {
  mean(x)
}

ggplot(data = platonic)+
      stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

Then with weighted.mean

weighted.mean.modif <- function(x,w) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

But it still doesn't read "w" (because no "w" is specified), so it gives the ordinary mean.

Then I tried specifying the "w" argument as the weight column of the data frame

weighted.mean.modif1 <- function(x,w=platonic$importance) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif1", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

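But it doesn't work. A warning message says: Computation failed in stat_summary(): 'x' and 'w' must have the same length

After getting stuck, I tried generating a series of random numbers with the same length as "x". Surprisingly, it worked.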

weighted.mean.modif2 <- function(x,w=runif(x, min = 0, max = 100)) {
  weighted.mean(x,w)
}
ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif2", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

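Clearly there is a way to trick it, but it's useless if I can only use random weights.

I tried printing "x" inside the function and then applying it. It produced something, but even the "mean" was no longer correct.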

mean.modif3 <- function(x) {
  mean(x)
  print(x)
}

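So the tricky part I can't figure out is how to properly tie the default "w" to "x", so that when weighted.mean is called inside stat_summary, where "w" is not read, the correct weights are used anyway.

As I mentioned, there is also a ddply workaround for getting the weighted mean. It works by creating a new source data frame that contains only the grouping variable and the weighted mean, so it does not allow faceting!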

library(plyr)  # provides ddply() and .()

weighted.fictious <- function(xxxx, yyyy) {
  ddply(xxxx, .(yyyy), function(m) data.frame(fictious_weightedmean=weighted.mean(m$value, m$importance, na.rm = FALSE)))
}

ggplot(data = weighted.fictious(xxxx = platonic, yyyy = platonic$greek_letter), aes(x=yyyy, y=fictious_weightedmean))+
  geom_bar(stat = "identity")

Thanks!

1 answer:

Answer 0: (score: 2)

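ggplot's built-in summary functions are not always useful, and most of the time you are better off computing the summary in a separate step and then plotting it.

Your basic example plot is actually incorrect. It shows "aether" as having means of 5 and 7 for delta and epsilon respectively, which is clearly not the case in the original data (both values are 1). Those are instead the values for the first element in the data frame ("water"). The error occurs because ggplot builds its facets in alphabetical order while you pass in raw vectors (platonic$value rather than just value), which causes things to be plotted in the wrong positions. When working with ggplot, you should always pass bare, unquoted column names so that ggplot can work out how to use the associated data.

The correct version of the basic plot is: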
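A quick way to confirm the mismatch described above is to look the values up directly. The sketch below rebuilds the relevant columns of the question's data frame in compact form with rep() (my shorthand, not the asker's code) and uses a hypothetical helper, lookup(), to pull the delta and epsilon values for one element.

```r
# Compact rebuild of the relevant columns from the question's data frame.
platonic <- data.frame(
  elements     = rep(c("water", "air", "earth", "fire", "aether"), each = 6),
  greek_letter = rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5),
  value        = c(0,0,0,5,7,0, 0,1,0,20,3,0, 0,2,2,1,8,0, 0,8,10,4,2,0, 0,0,0,1,1,0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: values of one element for the given Greek letters.
lookup <- function(element, greek) {
  rows <- platonic$elements == element & platonic$greek_letter %in% greek
  setNames(platonic$value[rows], platonic$greek_letter[rows])
}

aether_vals <- lookup("aether", c("delta", "epsilon"))  # what the aether panel should show
water_vals  <- lookup("water",  c("delta", "epsilon"))  # what the faulty plot showed there
print(aether_vals)
print(water_vals)
```

If the two printed vectors differ, the panel labelled "aether" in the original plot cannot be showing aether's own data.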

g <- ggplot(data = platonic)+
  stat_summary(mapping = aes(x=greek_letter, y=value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)
print(g)


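As for using weighted.mean, as noted above, the only sensible thing to do here is to compute it separately and plot the result:

library(dplyr)  # provides %>%, group_by() and summarize()

# weighted.mean() takes its weights through the `w` argument
platonic.weighted <- platonic %>% 
  group_by(elements, existence, greek_letter) %>% 
  summarize(value = weighted.mean(value, w = importance))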
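One detail worth checking when calling weighted.mean(): its weights argument is named `w`. A differently named argument such as `weights` is not matched to `w`; it falls into `...` and is ignored without any warning, so the result is the plain mean. A minimal sketch with made-up numbers (x and wts are illustrative only, not taken from the question):

```r
x   <- c(1, 3)
wts <- c(3, 1)

# Weights passed through the `w` argument: (1*3 + 3*1) / (3 + 1)
weighted <- weighted.mean(x, w = wts)

# `weights` is not a formal argument of weighted.mean(); it is absorbed by `...`
# and ignored, so this is just the plain mean of x.
unweighted <- weighted.mean(x, weights = wts)

print(c(weighted = weighted, unweighted = unweighted))
```

Comparing the two numbers on a small vector like this is an easy sanity check before trusting a grouped summary.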

Since the resulting data frame still has all the column names used in the first plot, you can simply swap in the new data set:

g.weighted <- g %+% platonic.weighted

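With this example the two plots are identical, but your mileage may vary.

Your question is a little unclear about your intended final result, but from the examples given I assume you want a weighted mean for each Greek letter. We can do that easily with summarize or, if you really want to, use mutate to add a column of weighted means without losing the original data: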
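A likely reason the two plots coincide: in this data set every combination of elements, existence and greek_letter occurs exactly once, and the weighted mean of a single value is that value regardless of its weight. The sketch below checks this on a compact rebuild of the grouping columns (the rep() construction is my shorthand for the question's vectors).

```r
# Grouping columns of the question's data frame, rebuilt compactly.
keys <- data.frame(
  elements     = rep(c("water", "air", "earth", "fire", "aether"), each = 6),
  existence    = rep(c("real", "not real"), times = c(24, 6)),
  greek_letter = rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5),
  stringsAsFactors = FALSE
)

n_rows   <- nrow(keys)          # number of observations
n_groups <- nrow(unique(keys))  # number of distinct facet/bar combinations
print(c(rows = n_rows, groups = n_groups))

# With one observation per group, the weight cannot change the result.
single <- weighted.mean(7, w = 20)
print(single)
```

With coarser grouping (for example by greek_letter alone), groups contain several rows and the weights start to matter.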

# weighted.mean() takes its weights through the `w` argument
platonic.weighted <- platonic %>% 
  group_by(greek_letter) %>% 
  mutate(weighted.letter = weighted.mean(value, w = importance))
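As a cross-check that does not depend on dplyr, the per-letter weighted means can also be computed in base R. This is only a sketch: the data frame is rebuilt compactly with rep() (my shorthand for the question's vectors), and beta_manual spells out the arithmetic for one letter by hand.

```r
# Relevant columns of the question's data frame, rebuilt compactly.
platonic <- data.frame(
  greek_letter = rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5),
  value        = c(0,0,0,5,7,0, 0,1,0,20,3,0, 0,2,2,1,8,0, 0,8,10,4,2,0, 0,0,0,1,1,0),
  importance   = rep(c(20, 10, 3, 9, 50), each = 6),
  stringsAsFactors = FALSE
)

# Split by Greek letter, then take the importance-weighted mean of value.
by_letter <- split(platonic, platonic$greek_letter)
wm <- sapply(by_letter, function(d) weighted.mean(d$value, w = d$importance))
print(round(wm, 3))

# Hand calculation for "beta": sum(value * importance) / sum(importance)
beta_manual <- (0*20 + 1*10 + 2*3 + 8*9 + 0*50) / (20 + 10 + 3 + 9 + 50)
print(beta_manual)
```

The named vector wm should match the weighted.letter column produced by the mutate call, with one value per Greek letter repeated across that letter's rows.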