How to order a plot by one variable

Date: 2017-12-31 00:15:32

Tags: r ggplot2

I am creating a dot plot:

fat <- (airlines$fat1 + airlines$fat2)
ggplot(airlines, aes(y = airline, x = fat)) + geom_point(stat = "identity")

But the result is very messy, so I want to sort it by the fat variable in ascending order.

I tried:

airlines1 <- data.frame(airline = rownames(airlines), fat, row.names = NULL)
airlines2 <- factor(airlines1$airline, 
                    levels = airlines[order(airlines$fat),"fat"])
ggplot(airlines2, aes(y = airline, x = fat)) + 
  geom_point(stat = "identity")

But I get two errors:

"Error: Column `fat` not found" 

and

"Error: ggplot2 doesn't know how to deal with data of class factor"

How should I order it?

This is the data I am analysing:

dput(airlines)
structure(list(airline = c("Aer Lingus", "Aeroflot*", "Aerolineas Argentinas", 
"Aeromexico*", "Air Canada", "Air France", "Air India*", "Air New Zealand*", 
"Alaska Airlines*", "Alitalia", "All Nippon Airways", "American*", 
"Austrian Airlines", "Avianca", "British Airways*", "Cathay Pacific*", 
"China Airlines", "Condor", "COPA", "Delta / Northwest*", "Egyptair", 
"El Al", "Ethiopian Airlines", "Finnair", "Garuda Indonesia", 
"Gulf Air", "Hawaiian Airlines", "Iberia", "Japan Airlines", 
"Kenya Airways", "KLM*", "Korean Air", "LAN Airlines", "Lufthansa*", 
"Malaysia Airlines", "Pakistan International", "Philippine Airlines", 
"Qantas*", "Royal Air Maroc", "SAS*", "Saudi Arabian", "Singapore Airlines", 
"South African", "Southwest Airlines", "Sri Lankan / AirLanka", 
"SWISS*", "TACA", "TAM", "TAP - Air Portugal", "Thai Airways", 
"Turkish Airlines", "United / Continental*", "US Airways / America West*", 
"Vietnam Airlines", "Virgin Atlantic", "Xiamen Airlines"), avseatkm = c(320906734, 
1197672318, 385803648, 596871813, 1865253802, 3004002661, 869253552, 
710174817, 965346773, 698012498, 1841234177, 5228357340, 358239823, 
396922563, 3179760952, 2582459303, 813216487, 417982610, 550491507, 
6525658894, 557699891, 335448023, 488560643, 506464950, 613356665, 
301379762, 493877795, 1173203126, 1574217531, 277414794, 1874561773, 
1734522605, 1001965891, 3426529504, 1039171244, 348563137, 413007158, 
1917428984, 295705339, 682971852, 859673901, 2376857805, 651502442, 
3276525770, 325582976, 792601299, 259373346, 1509195646, 619130754, 
1702802250, 1946098294, 7139291291, 2455687887, 625084918, 1005248585, 
430462962), inc1 = c(2, 76, 6, 3, 2, 14, 2, 3, 5, 7, 3, 21, 1, 
5, 4, 0, 12, 2, 3, 24, 8, 1, 25, 1, 10, 1, 0, 4, 3, 2, 7, 12, 
3, 6, 3, 8, 7, 1, 5, 5, 7, 2, 2, 1, 2, 2, 3, 8, 0, 8, 8, 19, 
16, 7, 1, 9), fatacc1 = c(0, 14, 0, 1, 0, 4, 1, 0, 0, 2, 1, 5, 
0, 3, 0, 0, 6, 1, 1, 12, 3, 1, 5, 0, 3, 0, 0, 1, 1, 0, 1, 5, 
2, 1, 1, 3, 4, 0, 3, 0, 2, 2, 1, 0, 1, 1, 1, 3, 0, 4, 3, 8, 7, 
3, 0, 1), fat1 = c(0, 128, 0, 64, 0, 79, 329, 0, 0, 50, 1, 101, 
0, 323, 0, 0, 535, 16, 47, 407, 282, 4, 167, 0, 260, 0, 0, 148, 
520, 0, 3, 425, 21, 2, 34, 234, 74, 0, 51, 0, 313, 6, 159, 0, 
14, 229, 3, 98, 0, 308, 64, 319, 224, 171, 0, 82), inc2 = c(0, 
6, 1, 5, 2, 6, 4, 5, 5, 4, 7, 17, 1, 0, 6, 2, 2, 0, 0, 24, 4, 
1, 5, 0, 4, 3, 1, 5, 0, 2, 1, 1, 0, 3, 3, 10, 2, 5, 3, 6, 11, 
2, 1, 8, 4, 3, 1, 7, 0, 2, 8, 14, 11, 1, 0, 2), fatacc2 = c(0, 
1, 0, 0, 0, 2, 1, 1, 1, 0, 0, 3, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 
2, 0, 2, 1, 0, 0, 0, 2, 0, 0, 0, 0, 2, 2, 1, 0, 0, 1, 0, 1, 0, 
0, 0, 0, 1, 2, 0, 1, 2, 2, 2, 0, 0, 0), fat2 = c(0, 88, 0, 0, 
0, 337, 158, 7, 88, 0, 0, 416, 0, 0, 0, 0, 225, 0, 0, 51, 14, 
0, 92, 0, 22, 143, 0, 0, 0, 283, 0, 0, 0, 0, 537, 46, 1, 0, 0, 
110, 0, 83, 0, 0, 0, 0, 3, 188, 0, 1, 84, 109, 23, 0, 0, 0), 
    model = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", 
    "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", 
    "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", 
    "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", 
    "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", 
    "51", "52", "53", "54", "55", "56")), .Names = c("airline", 
"avseatkm", "inc1", "fatacc1", "fat1", "inc2", "fatacc2", "fat2", 
"model"), row.names = c(NA, -56L), class = c("tbl_df", "tbl", 
"data.frame"))


2 Answers:

Answer 0 (score: 2)

With dplyr and ggplot2:

library(ggplot2)
library(dplyr)

airlines %>% 
  select(airline, fat1, fat2) %>% 
  mutate(fat = fat1 + fat2) %>% 
  ggplot(aes(fat, reorder(airline, fat))) +
  geom_point(stat = "identity") +
  labs(y = "airline", x = "fatalities")


If you want the order reversed, you can change fat to -fat:

airlines %>% 
  select(airline, fat1, fat2) %>% 
  mutate(fat = fat1 + fat2) %>% 
  ggplot(aes(fat, reorder(airline, -fat))) +
  geom_point(stat = "identity") +
  labs(y = "airline", x = "fatalities")
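To see what the sign flip does to the factor itself, here is a minimal base R sketch using four rows of the question's data (the totals are fat1 + fat2 copied from the dput output above; the variable names are only illustrative). On a discrete y axis ggplot2 draws the first level at the bottom, so ascending levels put the largest total at the top of the plot, while negated scores put it at the bottom.

```r
# fat1 + fat2 for four rows of the question's airlines data
airline <- c("Aeroflot*", "Aeromexico*", "Air France", "Air India*")
fat <- c(216, 64, 416, 487)

# Ascending scores: the smallest total becomes the first level
asc <- levels(reorder(airline, fat))

# Negated scores: the largest total becomes the first level
desc <- levels(reorder(airline, -fat))

asc   # c("Aeromexico*", "Aeroflot*", "Air France", "Air India*")
desc  # c("Air India*", "Air France", "Aeroflot*", "Aeromexico*")
```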


Answer 1 (score: 0)

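There is no need to load dplyr or otherwise manipulate the underlying airlines data.frame. This can be done concisely by computing the values on the fly inside the call to aes():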

library(ggplot2)
ggplot(airlines, aes(y = reorder(airline, fat1 + fat2), x = fat1 + fat2)) + 
  geom_point() + xlab("Fatalities") + ylab(NULL)

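The call to reorder() turns airline into a factor whose levels are sorted by increasing fat1 + fat2 (strictly, by the mean of fat1 + fat2 for each airline, which is the same thing here because each airline appears only once).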
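A minimal base R sketch of what reorder() computes, using four rows taken from the question's dput output (the short vectors are assumptions for illustration). It also shows what the factor(..., levels = ...) attempt in the question was aiming for: the levels have to be the airline names sorted by their totals, not the totals themselves. The last lines illustrate the default FUN = mean aggregation, which only matters when a label repeats.

```r
# Four rows from the question's airlines data
airline <- c("Aeroflot*", "Aeromexico*", "Air France", "Air India*")
fat1 <- c(128, 64, 79, 329)
fat2 <- c(88, 0, 337, 158)
fat <- fat1 + fat2  # 216, 64, 416, 487

# reorder(): levels sorted by the per-airline summary of fat (ascending)
by_reorder <- reorder(airline, fat)

# The same ordering written out by hand: airline names ordered by fat
by_hand <- factor(airline, levels = airline[order(fat)])

levels(by_reorder)  # c("Aeromexico*", "Aeroflot*", "Air France", "Air India*")
identical(levels(by_reorder), levels(by_hand))  # TRUE

# With repeated labels, reorder() summarises each label with FUN = mean:
# "x" scores mean(c(1, 10)) = 5.5 and "y" scores 5, so "y" comes first
dup <- reorder(c("x", "y", "x"), c(1, 5, 10))
levels(dup)  # c("y", "x")
```

The hand-written version assumes each airline appears exactly once; reorder() handles repeated labels by aggregating them first.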


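For working with factors, I find Hadley Wickham's forcats package very useful. fct_reorder() has a .desc argument that can be used to reverse the order of the factor levels explicitly: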

ggplot(airlines, aes(y = forcats::fct_reorder(airline, fat1 + fat2, .desc = TRUE), 
                     x = fat1 + fat2)) + 
  geom_point() + xlab("Fatalities") + ylab(NULL)


Personally, I find this more transparent than multiplying the X argument of base R's reorder() by -1, e.g. reorder(airline, -(fat1 + fat2)). Your mileage may vary, though.
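If you would rather not negate the scores and do not want to load forcats, one base R option is to build the ascending order first and then reverse the levels with rev(); forcats::fct_rev() performs the same reversal on an existing factor. Note that fct_reorder() summarises with median by default whereas reorder() uses mean; with one row per airline, as in this data, the two agree. The sketch below uses four airlines from the question's data for illustration.

```r
airline <- c("Aeroflot*", "Aeromexico*", "Air France", "Air India*")
fat <- c(216, 64, 416, 487)  # fat1 + fat2 for these rows

# Ascending order, as reorder() produces by default
up <- reorder(airline, fat)

# Reverse only the level order; the scores themselves are left untouched
down <- factor(airline, levels = rev(levels(up)))

levels(down)  # c("Air India*", "Air France", "Aeroflot*", "Aeromexico*")
```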