Combining two plots on a shared axis

Time: 2018-04-10 18:56:17

Tags: r ggplot2

I am exploring a dataset and plotting its variables. I am working with this dataset: https://archive.ics.uci.edu/ml/datasets/automobile. I want to plot city-mpg and highway-mpg against num-of-cylinders. My R code:

library(ggplot2)
data <- read.csv('http://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data', header=F, sep = "," ,dec = ".",
                 colClasses = c('factor','numeric','factor','factor','factor','factor','factor','factor','factor',
                                'numeric','numeric','numeric','numeric','numeric','factor','factor','numeric',
                                'factor','numeric','numeric','numeric','numeric','numeric','numeric',
                                'numeric','numeric'), na.strings = "?")

colnames(data) <- c("symboling", "normalized-losses","make","fuel-type","aspiration",
                 "num-of-doors","body-style","drive-wheels","engine-location","wheel-base","length",
                 "width","height","curb-weight","engine-type","num-of-cylinders","engine-size","fuel-system",
                 "bore","stroke","compression-ratio","horsepower","peak-rpm","city-mpg","highway-mpg","price")
summary(data)

data$`num-of-cylinders` <- as.character(data$`num-of-cylinders`)
data$`num-of-cylinders`[which(data$`num-of-cylinders` == "two")] <- "2"
data$`num-of-cylinders`[which(data$`num-of-cylinders` == "three")] <- "3"
data$`num-of-cylinders`[which(data$`num-of-cylinders` == "four")] <- "4"
data$`num-of-cylinders`[which(data$`num-of-cylinders` == "five")] <- "5"
data$`num-of-cylinders`[which(data$`num-of-cylinders` == "six")] <- "6"
data$`num-of-cylinders`[which(data$`num-of-cylinders` == "eight")] <- "8"
data$`num-of-cylinders`[which(data$`num-of-cylinders` == "twelve")] <- "12"
data$`num-of-cylinders` <- as.numeric(data$`num-of-cylinders`)
data$`num-of-cylinders` <- as.factor(data$`num-of-cylinders`)

ggplot(data = data, aes(x = `num-of-cylinders`, y = `city-mpg`)) +
      geom_boxplot() +
      xlab('Number of Cylinders') +
      ylab('MPG') +
      ggtitle('MPG Comparison by Number of Cylinders') 

ggplot(data = data, aes(x = `num-of-cylinders`, y = `highway-mpg`)) +
      geom_boxplot() +
      xlab('Number of Cylinders') +
      ylab('MPG') +
      ggtitle('MPG Comparison by Number of Cylinders') 

I can draw the two box plots separately, but is there a way to show both variables (city-mpg and highway-mpg) on the same y axis?

2 answers:

Answer 0 (score: 2)

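First, a suggestion: use underscores (_) instead of hyphens (-) in the column names. You can then write data$city_mpg directly, whereas data$city-mpg is parsed as the subtraction data$city - mpg unless the name is wrapped in backticks.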
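One way to do the renaming is sketched below with base R's gsub(). The toy data frame cars_demo and its values are made up for illustration; the same names()<- line should work on the full data frame from the question.

```r
# Toy stand-in for the automobile data (hypothetical values).
# check.names = FALSE keeps the hyphenated names as they are in the question.
cars_demo <- data.frame(
  `num-of-cylinders` = c("four", "six"),
  `city-mpg` = c(24, 19),
  `highway-mpg` = c(30, 25),
  check.names = FALSE
)

# Replace every hyphen in the column names with an underscore.
names(cars_demo) <- gsub("-", "_", names(cars_demo), fixed = TRUE)

# Columns can now be accessed without backticks.
cars_demo$city_mpg
```

After this step, aes(x = num_of_cylinders, y = city_mpg) also works without backticks.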

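Second, ggplot generally expects data in long format rather than wide format. Think about what you are trying to do: compare mpg against the number of cylinders, grouped by condition (city vs. highway). Reshape the data to long format so that city vs. highway becomes a single variable, which you can then map to color or split into facets.

Here are two options you can add to your code: one using color, one using facets.

data_long <- tidyr::gather(data, key = measure, value = mpg, `city-mpg`, `highway-mpg`)

ggplot(data_long, aes(x = `num-of-cylinders`, y = mpg, color = measure)) +
    geom_boxplot() +
    xlab('Number of Cylinders') +
    ylab('MPG') +
    ggtitle('MPG Comparison by Number of Cylinders and Road Type')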

ggplot(data_long, aes(x = `num-of-cylinders`, y = mpg)) +
    geom_boxplot() +
    xlab('Number of Cylinders') +
    ylab('MPG') +
    ggtitle('MPG Comparison by Number of Cylinders and Road Type') +
    facet_wrap(~measure)
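In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A minimal sketch of the same reshape with the newer function follows. The toy data frame cars_demo and its values are made up for illustration, and the row order of the result differs from gather(), which does not matter for plotting.

```r
library(tidyr)

# Toy stand-in for the automobile data (hypothetical values).
cars_demo <- data.frame(
  `num-of-cylinders` = factor(c(4, 4, 6)),
  `city-mpg` = c(24, 27, 19),
  `highway-mpg` = c(30, 34, 25),
  check.names = FALSE
)

# Stack the two mpg columns: the old column names go into `measure`,
# the values go into `mpg`, giving one row per car and measure.
demo_long <- pivot_longer(
  cars_demo,
  cols = c(`city-mpg`, `highway-mpg`),
  names_to = "measure",
  values_to = "mpg"
)

demo_long
```

The resulting demo_long has the same columns as data_long above, so either ggplot call can be reused unchanged.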


Created on 2018-04-10 by the reprex package (v0.2.0).

Answer 1 (score: 1)

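As with most analytical workflows, ggplot works best with long data. Simply stack the city_mpg subset on top of the highway_mpg subset, adding an indicator column for city vs. highway. The code below assumes the columns have been renamed to use underscores instead of hyphens.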

# RBIND TWO DATAFRAME SUBSETS (RENAMING W/ setNames AND ADDING NEW COLUMN W/ transform)
long_data <- rbind(transform(setNames(data[c("num_of_cylinders", "city_mpg")],
                                      c("num_of_cylinders", "mpg")), mile_type = "city"),
                   transform(setNames(data[c("num_of_cylinders", "highway_mpg")],
                                      c("num_of_cylinders", "mpg")), mile_type = "highway"))

# PLOT LONG DATA
ggplot(data = long_data, aes(x = num_of_cylinders, y = mpg, colour=mile_type)) +
  geom_boxplot() +
  xlab('Number of Cylinders') +
  ylab('MPG') +
  ggtitle('MPG Comparison by Number of Cylinders') 
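To see what the stacking step produces, here is the same rbind() pattern run on a toy data frame. cars_demo and its values are made up for illustration; the intermediate names city and highway are introduced only to make the two halves visible.

```r
# Toy stand-in (hypothetical values), already using underscore column names.
cars_demo <- data.frame(
  num_of_cylinders = factor(c(4, 6)),
  city_mpg = c(24, 19),
  highway_mpg = c(30, 25)
)

# City half: keep the cylinder column, rename city_mpg to mpg, tag the rows.
city <- transform(
  setNames(cars_demo[c("num_of_cylinders", "city_mpg")],
           c("num_of_cylinders", "mpg")),
  mile_type = "city"
)

# Highway half: same shape, different tag.
highway <- transform(
  setNames(cars_demo[c("num_of_cylinders", "highway_mpg")],
           c("num_of_cylinders", "mpg")),
  mile_type = "highway"
)

# Stack the halves: twice as many rows as cars_demo, one mpg column.
demo_long <- rbind(city, highway)
print(demo_long)
```

Each car now appears twice, once per road type, which is what lets colour = mile_type draw the city and highway box plots side by side on one y axis.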
