Why does ggplot ignore my factor levels when I subset the data?

Time: 2018-11-15 23:03:52

Tags: r ggplot2 bar-chart

I'm using some code I got from an answer to a previous question, but I've run into an interesting problem and would appreciate some expert insight into what is going on. I am trying to plot monthly deviations from an annual mean using a bar chart. Specifically, I color the bars differently depending on whether the monthly mean is above or below the annual mean. I am using the txhousing dataset included in the ggplot2 package.

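I thought I could use a factor to indicate whether this is the case. The months are ordered correctly when I plot only a subset of the data (the "lower" values), but as soon as I add another plot, ggplot2 rearranges all of the months into alphabetical order. Does anyone know why this happens, and what a workaround would be?

Thank you very much for your input! Criticism of my code is welcome :)

Reproducible example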

1. Using just one plot:

library(tidyverse)

# subset txhousing to the year 2014, and calculate monthly means and dates
housing_df <- filter(txhousing, year == 2014) %>%
  group_by(year, month) %>%
  summarise(monthly_mean = mean(sales, na.rm = TRUE),
            date = first(date)) %>%
  # month becomes an ordered factor so that the axis follows calendar order
  mutate(month = factor(month.abb[month], levels = month.abb, ordered = TRUE),
         salesdiff = monthly_mean - mean(monthly_mean), # monthly deviation
         higherlower = case_when(salesdiff >= 0 ~ "higher",
                                 salesdiff < 0 ~ "lower"))

# a single layer that draws only the "higher" subset
ggplot(data = housing_df, aes(x = month, y = salesdiff, fill = higherlower)) +
  geom_col(data = filter(housing_df, higherlower == "higher"), aes(y = salesdiff, fill = higherlower)) +
  scale_fill_manual(values = c("higher" = "blue", "lower" = "red")) +
  theme_bw() +
  theme(legend.position = "none") # remove legend

[image: resulting bar chart, with the months in the correct order]

2. Using two plots with all of the data:

[image: resulting bar chart, with the months in alphabetical order]
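The two-layer code itself is not shown above. A minimal sketch of what it presumably looks like, assuming the only change is a second geom_col() layer for the "lower" subset (this is a reconstruction under that assumption, not the original code; the names higher, lower and p_two_layers are made up for illustration):

```r
library(dplyr)
library(ggplot2)

# Rebuild the monthly deviations for 2014 from ggplot2's txhousing data
housing_df <- txhousing %>%
  filter(year == 2014) %>%
  group_by(month) %>%
  summarise(monthly_mean = mean(sales, na.rm = TRUE)) %>%
  mutate(month = factor(month.abb[month], levels = month.abb),
         salesdiff = monthly_mean - mean(monthly_mean),
         higherlower = if_else(salesdiff >= 0, "higher", "lower"))

# One subset per layer (assumed structure of the second plot)
higher <- filter(housing_df, higherlower == "higher")
lower  <- filter(housing_df, higherlower == "lower")

p_two_layers <- ggplot(data = housing_df,
                       aes(x = month, y = salesdiff, fill = higherlower)) +
  geom_col(data = higher) +
  geom_col(data = lower) +
  scale_fill_manual(values = c("higher" = "blue", "lower" = "red")) +
  theme_bw() +
  theme(legend.position = "none")

p_two_layers
```

Both subsets still carry all twelve factor levels after filter(), so whether the x axis comes out in calendar or alphabetical order depends on how the installed ggplot2 and scales versions combine the two layers.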

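2 answers:

Answer 0 (score: 2)

There are several ways to do this, but I have had good success with the one below. You are already applying the most common fix, which is converting the month to a factor, and that is why the first plot works. Why it does not work in the second case is a mystery to me, but try adding + scale_x_discrete(limits = housing_df$month) to override the x-axis order and see whether that helps.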
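A minimal sketch of that override on a two-layer plot. It assumes the second layer is a geom_col() for the "lower" subset, and it passes levels(housing_df$month) instead of the column itself (my substitution, so that limits receives a plain character vector in calendar order):

```r
library(dplyr)
library(ggplot2)

# Monthly deviations from the 2014 mean, with month as a factor
housing_df <- txhousing %>%
  filter(year == 2014) %>%
  group_by(month) %>%
  summarise(monthly_mean = mean(sales, na.rm = TRUE)) %>%
  mutate(month = factor(month.abb[month], levels = month.abb),
         salesdiff = monthly_mean - mean(monthly_mean),
         higherlower = if_else(salesdiff >= 0, "higher", "lower"))

p_limits <- ggplot(housing_df, aes(x = month, y = salesdiff, fill = higherlower)) +
  geom_col(data = filter(housing_df, higherlower == "higher")) +
  geom_col(data = filter(housing_df, higherlower == "lower")) +
  # Fix the axis order explicitly instead of relying on the layers' data
  scale_x_discrete(limits = levels(housing_df$month)) +
  scale_fill_manual(values = c("higher" = "blue", "lower" = "red")) +
  theme_bw() +
  theme(legend.position = "none")

p_limits
```

Because limits is set explicitly, the axis order no longer depends on which months happen to appear in each layer.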

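I agree with the other comments that the best approach in this particular case is not to use the extra layer at all, but the solution above works even when there are multiple layers.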
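For comparison, a sketch of the single-layer version, assuming the goal is simply one bar per month colored by the sign of the deviation (the name p_single is made up for illustration):

```r
library(dplyr)
library(ggplot2)

# Monthly deviations from the 2014 mean, with month as a factor
housing_df <- txhousing %>%
  filter(year == 2014) %>%
  group_by(month) %>%
  summarise(monthly_mean = mean(sales, na.rm = TRUE)) %>%
  mutate(month = factor(month.abb[month], levels = month.abb),
         salesdiff = monthly_mean - mean(monthly_mean),
         higherlower = if_else(salesdiff >= 0, "higher", "lower"))

# One layer on the full data: fill does the coloring, so no subsetting is needed
p_single <- ggplot(housing_df, aes(x = month, y = salesdiff, fill = higherlower)) +
  geom_col() +
  scale_fill_manual(values = c("higher" = "blue", "lower" = "red")) +
  theme_bw() +
  theme(legend.position = "none")

p_single
```

With a single data source there is only one set of factor levels for the x scale to consider, which sidesteps the ordering question entirely.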

Answer 1 (score: 1)

In addition, + scale_x_discrete(drop = FALSE) will also override the potentially different factor levels that ggplot picks up from different data sources.
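A sketch of why this can help, again assuming a second geom_col() layer for the "lower" subset. With the default drop = TRUE a discrete scale uses only the levels that appear in the data, so each subset contributes a different set of months; drop = FALSE keeps every level of the factor:

```r
library(dplyr)
library(ggplot2)

# Monthly deviations from the 2014 mean, with month as a factor
housing_df <- txhousing %>%
  filter(year == 2014) %>%
  group_by(month) %>%
  summarise(monthly_mean = mean(sales, na.rm = TRUE)) %>%
  mutate(month = factor(month.abb[month], levels = month.abb),
         salesdiff = monthly_mean - mean(monthly_mean),
         higherlower = if_else(salesdiff >= 0, "higher", "lower"))

higher <- filter(housing_df, higherlower == "higher")
lower  <- filter(housing_df, higherlower == "lower")

# The subset keeps all 12 levels, but only some of them are actually used
levels(higher$month)
levels(droplevels(higher$month))

p_keep_levels <- ggplot(housing_df, aes(x = month, y = salesdiff, fill = higherlower)) +
  geom_col(data = higher) +
  geom_col(data = lower) +
  # Keep unused levels so every layer sees the same full set of months
  scale_x_discrete(drop = FALSE) +
  scale_fill_manual(values = c("higher" = "blue", "lower" = "red")) +
  theme_bw() +
  theme(legend.position = "none")

p_keep_levels
```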

This topic is also addressed here: https://github.com/tidyverse/ggplot2/issues/577