Area plot: can't stack in the right order - legend out of sync with data

Date: 2014-10-02 14:06:04

Tags: r ggplot2 rcharts

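I'm very new to R and not an experienced programmer. I'm having trouble in ggplot using geom_area to create a stacked chart of wind direction. I want the areas stacked from bottom to top in the order N, NE, E, SE, S, SW, W, NW.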

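I've managed to get the legend labels in the right order, but now the colours no longer correspond to the data on the chart. Below are the things I've tried and the charts they produced.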

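The data.frame comes from a different program; a small subset covering 3 days is shown below. The last column (WD Binned Number) is the workaround I found. It works but is very clunky. What worries me more is that the labels no longer match the data in ggplot, and I'd like to understand where I went wrong.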

My data.frame, named knime.in, looks like this:

         Day of year WD Binned Count(Time) WD Binned Number
    Row0          119         E         324                3
    Row1          119         N          32                1
    Row2          119        NE         240                2
    Row3          119        NW         149                8
    Row4          119         S          65                5
    Row5          119        SE          94                4
    Row6          119        SW         209                6
    Row7          119         W         279                7
    Row8          120         E         435                3
    Row9          120         N          68                1
    Row10         120        NE         112                2
    Row11         120        NW          46                8
    Row12         120         S          15                5
    Row13         120        SE         130                4
    Row14         120        SW          52                6
    Row15         120         W         588                7
    Row16         121         E         114                3
    Row17         121         N          34                1
    Row18         121        NE           6                2
    Row19         121        NW         282                8
    Row20         121         S          55                5
    Row21         121        SE         101                4
    Row22         121        SW         194                6
    Row23         121         W         594                7

First attempt, using factor:

require (ggplot2)

knime.in$"WD Binned" <- factor(knime.in$"WD Binned", levels = c("N","NE","E","SE","S","SW","W","NW"))

ggplot(knime.in, aes(x = knime.in$"Day of year", y = (knime.in$"Count(Time)"-1), fill = knime.in$"WD Binned")) +
  geom_area(stat="identity") +
  scale_fill_brewer(palette="BrBG")

Second attempt, using levels:

require (ggplot2)

levels(knime.in$"WD Binned") <- c("N","NE","E","SE","S","SW","W","NW")

ggplot(knime.in, aes(x = knime.in$"Day of year", y = (knime.in$"Count(Time)"-1), fill = knime.in$"WD Binned")) +
  geom_area(stat="identity") +
  scale_fill_brewer(palette="BrBG")
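You can reproduce the difference between the first two attempts without plotting anything. The minimal base-R sketch below uses a small hypothetical vector `wd`, not the real data. It shows that `factor(x, levels = ...)` keeps every value and only reorders the levels. Assigning to `levels(x)` does something different: it renames the existing levels position by position, so the values themselves change.

```r
# Hypothetical wind directions, stored as a factor with the default
# alphabetical levels: "E" "N" "NE" "NW" "S" "SE" "SW" "W"
wd <- factor(c("E", "N", "NE", "NW", "S", "SE", "SW", "W"))
wanted <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")

# Attempt 1 style: reorder the levels; every value is left untouched
reordered <- factor(wd, levels = wanted)

# Attempt 2 style: relabel the levels; the first level "E" is renamed "N",
# the second level "N" is renamed "NE", and so on
relabelled <- wd
levels(relabelled) <- wanted

print(data.frame(original = wd,
                 reordered = reordered,
                 relabelled = relabelled))
```

This matches what the second chart shows: the rows that hold the E counts are now labelled N.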

Third attempt, without any reordering:

require (ggplot2)

ggplot(knime.in, aes(x = knime.in$"Day of year", y = (knime.in$"Count(Time)"-1), fill = knime.in$"WD Binned")) +
  geom_area(stat="identity") +
  scale_fill_brewer(palette="BrBG")

The kludge that finally worked: sorting on a numeric column that I had to create elsewhere, because I couldn't sort in a user-defined order.

require (ggplot2)

dt <- knime.in[order(knime.in$"WD Binned Number"),] #order the data so that it will be stacked correctly

dt$"WD Binned" <- factor(dt$"WD Binned", levels = c("N","NE","E","SE","S","SW","W","NW"))
ggplot(dt, aes(x = dt$"Day of year", y = (dt$"Count(Time)"-1)/1440, fill = dt$"WD Binned")) +
  geom_area(stat="identity") +
  scale_fill_brewer(palette="BrBG")
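The extra numeric column may not be strictly necessary. Base R's `match()` returns the position of each value within a vector, and that position can serve as the sort key for a user-defined order. Here is a sketch on a hypothetical three-row subset, `knime_sub`, which uses the same column names as knime.in (`check.names = FALSE` keeps the spaces in the names):

```r
wanted <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")

# Hypothetical subset with the same column names as knime.in
knime_sub <- data.frame(`Day of year` = c(120, 120, 120),
                        `WD Binned` = c("E", "W", "N"),
                        `Count(Time)` = c(435, 588, 68),
                        check.names = FALSE)

# match() gives each direction's position in `wanted` (N = 1, NE = 2, ...),
# so ordering on it sorts the rows N, NE, E, ... without a helper column
sorted <- knime_sub[order(match(knime_sub$`WD Binned`, wanted)), ]
print(sorted)
```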

Take day 120 as an example. According to the data, we should get:

N  = 68
NE = 112
E  = 435
SE = 130
S  = 15
SW = 52
W  = 588
NW = 46
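In a stacked chart, each band's upper edge sits at the running total of that band and everything below it. With N at the bottom, `cumsum()` gives the expected boundaries for day 120. These use the raw counts; the plotting code subtracts 1 from each count, which lowers every boundary slightly.

```r
# Day 120 counts from the list above, in the desired bottom-to-top order
day120 <- c(N = 68, NE = 112, E = 435, SE = 130, S = 15, SW = 52, W = 588, NW = 46)

# Upper edge of each band when stacked from N upwards:
# N = 68, NE = 180, E = 615, SE = 745, S = 760, SW = 812, W = 1400, NW = 1446
tops <- cumsum(day120)
print(tops)
```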

If we look at the charts:

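Attempt 1 = the legend labels are in the right order and the colours match the labels, but the areas are stacked in "alphabetical" order. The only problem here is that the stacking is not in the order I want.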

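Attempt 2 = the legend labels are in the right order and the colours are stacked in the right order. However, the areas still hold the REAL data in "alphabetical" order, so the colours no longer match the data. For example, N is dark brown in the legend, but the dark brown area on the chart is actually the data for E.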

Attempt 3 = data, colours and labels are all in sync, but not in the order I want.

Final working version = exactly what I wanted all along: stacked from N at the bottom, with the legend colours and legend labels matching the correct data on the chart.

Many thanks,

Peter

1 answer:

Answer 0 (score: 2)

As @Henrik said, you should give your variables proper names. You can then solve the problem as follows:

# reading the data (with appropriately named variables)
knime.in <- structure(list(Day.of.year = c(119L, 119L, 119L, 119L, 119L, 119L, 119L, 119L, 120L, 120L, 120L, 120L, 120L, 120L, 120L, 120L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L),
                           WD.Binned = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), .Label = c("E", "N", "NE", "NW", "S", "SE", "SW", "W"), class = "factor"),
                           Count = c(324L, 32L, 240L, 149L, 65L, 94L, 209L, 279L, 435L, 68L, 112L, 46L, 15L, 130L, 52L, 588L, 114L, 34L, 6L, 282L, 55L, 101L, 194L, 594L)), .Names = c("Day.of.year", "WD.Binned", "Count"),
                      class = "data.frame", row.names = c(NA, -24L))

# rearranging the factor levels
knime.in$WD.Binned <- factor(knime.in$WD.Binned, levels = c("N","NE","E","SE","S","SW","W","NW"))

# loading required packages
library(ggplot2)
library(dplyr)

# rearranging the data with dplyr (sort the rows by the new level order)
knime.in <- knime.in %>% arrange(WD.Binned)

# ...or, equivalently, rearranging the data in base R
knime.in <- knime.in[order(knime.in$WD.Binned),]

# creating the area plot    
ggplot(knime.in, aes(x = Day.of.year, y = (Count-1), fill = WD.Binned)) +
  geom_area(stat="identity") + 
  scale_x_continuous("\nDay of the year", expand=c(0,0), breaks=c(119,120,121)) +
  scale_y_continuous("Count", expand=c(0,0), breaks=c(250,500,750,1000,1250)) +
  scale_fill_brewer(palette="BrBG") +
  theme_classic()

This gives the plot with the directions stacked in the desired order.


In response to a comment:

When you read the data with knime.in <- structure(...code...) and plot it straight away, the areas are stacked in alphabetical order.

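Now look at the levels of WD.Binned with levels(knime.in$WD.Binned). As you can see, they are in the same order as the legend. Also look at your data frame (with View(knime.in)): the rows are in the same order as the legend too. This shouldn't surprise you. The levels here are alphabetical, which is also what factor() produces by default, and within each day the rows happen to be stored in alphabetical order as well.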
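When `factor()` is called without a `levels` argument, it uses the sorted unique values as levels. For these direction codes, that means alphabetical order, not order of appearance. Here is a quick check on a hypothetical vector:

```r
# Directions in an arbitrary order of appearance
wd <- c("W", "N", "SE", "E", "NW", "S", "NE", "SW")

# Default levels are sort(unique(wd)), i.e. alphabetical
default_levels <- levels(factor(wd))
print(default_levels)
```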

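When you change the order of the levels with knime.in$WD.Binned <- factor(knime.in$WD.Binned, levels=c("N","NE","E","SE","S","SW","W","NW")), only the order of the levels changes, not the order of the data. When you then create the plot, the data are still drawn in the order in which they are stored in the data frame.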
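You can verify on a small hypothetical data frame `df`, which uses the answer's column names, that `factor(..., levels = ...)` leaves the row order alone. Only the explicit `order()` call moves the rows. It sorts a factor by its level order, not alphabetically:

```r
df <- data.frame(Day.of.year = c(120, 120, 120, 120),
                 WD.Binned = c("E", "N", "NE", "W"),
                 Count = c(435, 68, 112, 588))

wanted <- c("N", "NE", "E", "SE", "S", "SW", "W", "NW")

# Changing the level order: the rows stay exactly where they were
df$WD.Binned <- factor(df$WD.Binned, levels = wanted)
rows_before <- as.character(df$WD.Binned)        # "E" "N" "NE" "W"

# Sorting the rows by the factor follows its level order (N, NE, E, ..., NW)
df_sorted <- df[order(df$WD.Binned), ]
rows_after <- as.character(df_sorted$WD.Binned)  # "N" "NE" "E" "W"
print(df_sorted)
```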

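So you also have to reorder the data. You can do this with knime.in <- knime.in[order(knime.in$WD.Binned),] (or the dplyr equivalent). You then get a plot with the levels drawn in the correct order, as in the first plot of this answer.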