geom_violin会自动忽略变化较小的值

时间:2017-08-29 11:00:01

标签: r ggplot2

我遇到了geom_violin的问题。为了使我的问题可以重现,我创建了一个玩具数据集。

说这是我原来的数据:

require(jsonlite)
data <- fromJSON("[{\"Season\":\"Spring\",\"Maximum.Profit\":2520,\"Hidden\":\"No\"},{\"Season\":\"Spring\",\"Maximum.Profit\":1710,\"Hidden\":\"No\"},{\"Season\":\"Spring\",\"Maximum.Profit\":2500,\"Hidden\":\"No\"},{\"Season\":\"Spring\",\"Maximum.Profit\":2850,\"Hidden\":\"Yes\"},{\"Season\":\"Spring\",\"Maximum.Profit\":3500,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":5740,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":5100,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":1710,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":3500,\"Hidden\":\"Yes\"},{\"Season\":\"Summer\",\"Maximum.Profit\":8000,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":4920,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":720,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":13740,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":2600,\"Hidden\":\"Yes\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":3810,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":-1260,\"Hidden\":\"No\"}]")

以下是我的可视化代码:

require(ggplot2)
p <- ggplot(data, aes(x=Season, y=Maximum.Profit))
p <- p + geom_violin(aes(color=Hidden)) + geom_boxplot(aes(fill=Hidden))

enter image description here

与箱图不同,geom_violin忽略了所有季节中的隐藏 - “是”。我意识到每个案例中只有一个值(Season_Hidden):“Autumn_Yes”,“Spring_Yes”,“Summer_Yes”。所以我为每个增加了一个值。我尽量不创建相同的值,所以我也让它们有点不同。您可以查看data2

底部的3行
data2 <- fromJSON("[{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":2520},{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":1710},{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":2500},{\"Season\":\"Spring\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2850},{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":3500},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":5740},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":5100},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":1710},{\"Season\":\"Summer\",\"Hidden\":\"Yes\",\"Maximum.Profit\":3500},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":8000},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":4920},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":720},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":13740},{\"Season\":\"Autumn\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2600},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":3810},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":-1260},{\"Season\":\"Autumn\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2607.2},{\"Season\":\"Spring\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2857.2},{\"Season\":\"Summer\",\"Hidden\":\"Yes\",\"Maximum.Profit\":3507.2}]")

但是这个data2创建了与data相同的数字。所以我强迫它多吃一点:

p <- ggplot(rbind(data2, data2), aes(x=Season, y=Maximum.Profit))
p <- p + geom_violin(aes(color=Hidden), scale="width", position=position_dodge(width=1))
p <- p + geom_boxplot(aes(fill=Hidden), position=position_dodge(width=1), width=0.2)

geom_boxplotgeom_boxplot的其他设置并不重要。我只是把它放在那里让它更漂亮)

enter image description here

现在这是我想要的图片,但我不想以偷偷摸摸的方式进行,例如在上一个示例中使用rbind(data2, data2)而不是data

有谁知道这个问题的更好,更稳定的解决方案?如何使geom_violin不忽略低方差值,或者至少将一边留空,以便在与其他几何体结合时不会弄乱(在这种情况下为boxplot

0 个答案:

没有答案