Change the order of bins in geom_histogram?

Time: 2014-10-29 23:50:34

Tags: r ggplot2 bins

I am using ggplot2 and trying to change the order of the bins. I am working with the data from NY's Stop and Frisk program, available here: http://www.nyclu.org/content/stop-and-frisk-data

The time is given as an integer (e.g. 5 = 12:05 AM, 355 = 3:55 AM, 2100 = 9 PM).
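To make the encoding concrete, here is a minimal sketch of how such integers decode into clock times. The vector below simply reuses the three example values above; the variable names are illustrative and not part of the dataset.

```r
# Example values from the description above (HHMM stored as an integer)
timestop <- c(5, 355, 2100)

# Integer division gives the hour, the remainder gives the minute
hour   <- timestop %/% 100
minute <- timestop %% 100

# Zero-padded 24-hour clock strings
clock <- sprintf("%02d:%02d", hour, minute)
print(clock)

# Decimal hours, in case a truly continuous time axis is wanted
decimal_hour <- hour + minute / 60
```

Since minutes only run from 0 to 59, values such as 360-399 never occur in this encoding, so the raw integers are not evenly spaced in time.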

I created a histogram of the stop times with the following:

myplot <- ggplot(Stop.and.Frisk.2011) + geom_histogram(aes(x=timestop),binwidth=300)

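This gives me a pretty decent plot of the times, with bins running midnight-3 AM, 3 AM-6 AM, 6 AM-9 AM, and so on.

However, I would like to move the first two bins (midnight-3 AM and 3 AM-6 AM) to the end, to mimic a more normal working day.

Is there a simple way to change the order of the bins? I have tried using the breaks function, but could not find a way to make it wrap around.

Basically, I want the bins in the following order: 600-900, 900-1200, 1200-1500, 1500-1800, 1800-2100, 2100-2400, 0-300, 300-600.

Thanks in advance!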

2 Answers:

Answer 0: (score: 0)

One approach is to bin the data before calling ggplot. Here is an example that uses the cut function to create 3-hour intervals:

# Load ggplot2 for plotting
library(ggplot2)

# Read in the data
df <- read.csv('SQF 2012.csv', header = TRUE)

# Create intervals every 3 hours based
# on the `timestop` variable
df$intervals <- cut(df$timestop,
                    breaks = c(0, 300, 600,
                               900, 1200, 1500,
                               1800, 2100, 2400),
                    right = FALSE,  # [0,300), [300,600), ... so midnight (0) is not dropped
                    dig.lab = 4)    # labels like '[900,1200)' instead of '[900,1.2e+03)'

# Re-order the sequence prior to plotting:
# bins from 6 AM onward come first, the two early-morning bins go last
bin_order <- c('[600,900)', '[900,1200)', '[1200,1500)',
               '[1500,1800)', '[1800,2100)', '[2100,2400)',
               '[0,300)', '[300,600)')

# Position of each interval label in the desired order (1 to 8)
df$sequence <- match(as.character(df$intervals), bin_order)

# Create the plot
ggplot(df, aes(x = sequence)) +
  geom_histogram(binwidth = 0.5) +
  scale_x_continuous(breaks = c(1, 2, 3, 4, 5, 6, 7, 8),
                     labels = c('6AM-9AM', '9AM-12PM', '12PM-3PM', '3PM-6PM',
                                '6PM-9PM', '9PM-12AM', '12AM-3AM', '3AM-6AM')) +
  xlab('Time') +
  ylab('Number\n') + 
  theme(axis.text = element_text(size = rel(1.1))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme(axis.title = element_text(size = rel(1.1), face = 'bold'))

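The same re-ordering can also be expressed through factor levels, which avoids the numeric helper column. The following is only a sketch under assumptions: the toy data frame stands in for the real `timestop` column, and the names `labs_clock` and `bin` are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data standing in for the real `timestop` column
df <- data.frame(timestop = c(5, 355, 630, 915, 1230, 1545, 1830, 2100, 2359))

# Labels in natural clock order, one per 3-hour bin
labs_clock <- c('12AM-3AM', '3AM-6AM', '6AM-9AM', '9AM-12PM',
                '12PM-3PM', '3PM-6PM', '6PM-9PM', '9PM-12AM')

# Left-closed bins [0,300), [300,600), ..., labelled directly
df$bin <- cut(df$timestop, breaks = seq(0, 2400, by = 300),
              right = FALSE, labels = labs_clock)

# Rotate the level order so that the axis starts at 6 AM
df$bin <- factor(df$bin, levels = c(labs_clock[3:8], labs_clock[1:2]))

# A bar chart of counts per bin; drop = FALSE keeps empty bins on the axis
p <- ggplot(df, aes(x = bin)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE) +
  xlab('Time') +
  ylab('Number')
print(p)
```

Because the bins are discrete here, `geom_bar()` draws one bar per factor level and the order of the bars follows the order of the levels.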

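Answer 1: (score: 0)

Here is one way. I added 2400 to all timestop values between 0 and 599. That moves the time range you want to the end (i.e. the right-hand side) of the plot. When drawing the graph, I relabelled the x-axis accordingly.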

library(data.table)
library(dplyr)
library(ggplot2)

# Read the file
foo <- fread("SQF 2012.csv", header = TRUE, na.strings="NA", colClasses="character")

# Change timestop values
ana <- setDF(foo) %>%
       select(datestop,timestop) %>%
       mutate(timestop = as.numeric(timestop), 
              timestop = ifelse(timestop >= 0 & timestop < 600, 2400 + timestop, timestop))

# Draw the graph
ggplot(data = ana, aes(x = timestop)) +
    geom_histogram() +
    scale_x_continuous(limits = c(600, 3000),
                       breaks = c(600, 900, 1200, 1500,
                                  1800, 2100, 2400, 2700, 3000),
                       labels = c("6:00", "9:00", "12:00", "15:00",
                                  "18:00", "21:00", "24:00", "3:00", "6:00")) +
    xlab("Time")

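To see what the shift does, here is a small sketch on made-up values that also uses three-hour bins to match the question. The toy data frame and the `shifted` column are hypothetical, and the `boundary` argument is assumed to be available in the installed ggplot2 version.

```r
library(ggplot2)

# Hypothetical toy values; the real data would come from the `timestop` column
toy <- data.frame(timestop = c(0, 5, 355, 599, 600, 1200, 2100, 2359))

# Times before 6 AM are pushed past 2400 so they sort after 11:59 PM
toy$shifted <- ifelse(toy$timestop < 600, toy$timestop + 2400, toy$timestop)
print(toy$shifted)
# 2400 2405 2755 2999  600 1200 2100 2359

# Three-hour bins whose edges sit exactly on 600, 900, ..., 3000
p <- ggplot(toy, aes(x = shifted)) +
  geom_histogram(binwidth = 300, boundary = 600) +
  scale_x_continuous(breaks = seq(600, 3000, by = 300),
                     labels = c("6:00", "9:00", "12:00", "15:00", "18:00",
                                "21:00", "24:00", "3:00", "6:00")) +
  xlab("Time")
print(p)
```

Setting `boundary = 600` aligns the bin edges with the axis breaks, so each bar covers exactly one of the three-hour ranges listed in the question.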