Problem reordering a factor variable by a numeric variable

Asked: 2014-05-02 20:41:46

Tags: r ggplot2

library(ggplot2)  # unlike require(), stops with an error if the package is missing

Data: shark incidents grouped by shark species. This is a real data set that has already been summarized.

D <- structure(list(
  FL_FATAL = structure(
    c(2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L,
      2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L,
      1L, 2L, 2L),
    .Label = c("FATAL", "NO FATAL"), class = "factor"),
  spec = structure(
    c(26L, 24L, 6L, 26L, 25L, 16L, 2L, 11L, 27L, 5L, 24L, 29L, 12L,
      21L, 13L, 15L, 28L, 1L, 17L, 19L, 8L, 3L, 6L, 13L, 22L, 18L,
      27L, 14L, 23L, 20L, 7L, 4L, 8L, 9L, 10L),
    .Label = c("blacknose", "blacktip", "blue", "bonnethead", "bronze",
               "bull", "caribbean", "draughtsboard", "dusky", "galapagos",
               "ganges", "hammerhead", "involve", "leon", "mako", "nurse",
               "porbeagle", "recovered", "reef", "sand", "sandtiger",
               "sevengill", "spinner", "tiger", "unconfired", "white",
               "whitespotted", "whitetip", "wobbegong"),
    class = "factor"),
  N = c(368L, 169L, 120L, 107L, 78L, 77L, 68L, 59L, 56L, 53L, 46L, 42L,
        35L, 35L, 33L, 30L, 29L, 29L, 26L, 25L, 25L, 25L, 24L, 24L, 21L,
        21L, 20L, 20L, 17L, 16L, 16L, 15L, 11L, 11L, 11L)
), .Names = c("FL_FATAL", "spec", "N"), row.names = c(NA, -35L), class = "data.frame")

head(D)
#   FL_FATAL       spec   N
# 1 NO FATAL      white 368
# 2 NO FATAL      tiger 169
# 3 NO FATAL       bull 120
# 4    FATAL      white 107
# 5 NO FATAL unconfired  78
# 6 NO FATAL      nurse  77

Reordering the factor by a numeric variable by creating a new variable:

# Re-order spec creating Especies variable ordered by D$N
D$Especies <- factor(D$spec, levels = unique(D[order(D$N), "spec"]))

# These two plots work as expected
ggplot(D, aes(x=N, y=Especies)) + 
  geom_point(aes(size = N, color = FL_FATAL))

ggplot(D, aes(x=N, y=Especies)) + 
  geom_point(aes(size = N, color = FL_FATAL)) +
  facet_grid(. ~ FL_FATAL)

Reordering with reorder():

# Using reorder() isn't working, or am I missing something?
ggplot(D, aes(x=N, y=reorder(D$spec, D$N))) + 
  geom_point(aes(size = N, color = FL_FATAL))

# Adding facets makes it worse
ggplot(D, aes(x=N, y=reorder(D$spec, D$N))) + 
  geom_point(aes(size = N, color = FL_FATAL)) +
  facet_grid(. ~ FL_FATAL)

What is the correct way to produce these plots using reorder()?

2 answers:

Answer 0 (score: 2)

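It is a happy coincidence that your first approach gives the plot you want. Most of your species have a single N value (NO FATAL only), but a few have both a FATAL and a NO FATAL row. Whenever more than one row corresponds to a factor level, reorder() applies a function to those numbers to get one value per level, and orders the levels by that value. The default function is mean, but you probably want sum, so that species are ordered by their total number of incidents.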

D$spec_order <- reorder(D$spec, D$N, sum)
ggplot(D, aes(x=N, y=spec_order)) + 
  geom_point(aes(size = N, color = FL_FATAL))

ggplot(D, aes(x=N, y=spec_order)) + 
  geom_point(aes(size = N, color = FL_FATAL)) +
  facet_grid(. ~ FL_FATAL)
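
To make the difference between the default mean and sum concrete, here is a minimal sketch in base R. The `toy` data frame is invented for illustration and is not part of the question's data; only "white" has two rows, mirroring the species that have both a FATAL and a NO FATAL row.

```r
# Hypothetical toy data: "white" has two rows, the other species one each.
toy <- data.frame(
  spec = factor(c("white", "white", "tiger", "bull")),
  N    = c(10, 2, 7, 9)
)

# Default summary function is mean: white = 6, tiger = 7, bull = 9.
by_mean <- reorder(toy$spec, toy$N)

# With sum: tiger = 7, bull = 9, white = 12.
by_sum <- reorder(toy$spec, toy$N, sum)

levels(by_mean)  # "white" "tiger" "bull"
levels(by_sum)   # "tiger" "bull"  "white"
```

Levels are sorted in increasing order of the summary value, so "white" moves from first to last once its two rows are added together instead of averaged.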

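Answer 1 (score: 2)

The problem is that by using D$ inside the reorder() call, you reorder spec independently of the data frame being plotted, so the values no longer match up with the corresponding x values. You need to call it on the bare variable names instead: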

ggplot(D, aes(x=N, y=reorder(spec, N, sum))) + 
  geom_point(aes(size = N, color = FL_FATAL)) +
  facet_grid(. ~ FL_FATAL)
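
The mismatch can be seen without ggplot2. The sketch below uses with() on a subset as a rough stand-in for an expression being evaluated against one panel's data. That analogy is an assumption for illustration, not a description of ggplot2 internals, and the `toy` data frame is invented.

```r
# Hypothetical toy data, not the question's data set.
toy <- data.frame(
  spec     = factor(c("white", "white", "tiger", "bull")),
  N        = c(10, 2, 7, 9),
  FL_FATAL = c("NO FATAL", "FATAL", "NO FATAL", "NO FATAL")
)

# Stand-in for the rows that belong to one facet.
panel <- toy[toy$FL_FATAL == "FATAL", ]

# Bare names are looked up in the subset: one value per row of the subset.
inside <- with(panel, reorder(spec, N, sum))

# toy$ bypasses the subset and always returns the full-length vector.
outside <- with(panel, reorder(toy$spec, toy$N, sum))

nrow(panel)      # 1
length(inside)   # 1
length(outside)  # 4
```

Because the `toy$` version ignores whichever rows it is evaluated against, its values cannot be relied on to line up with the N values of those rows.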