How to have one bin per month in geom_histogram?

Time: 2017-04-10 17:52:31

Tags: r ggplot2

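I would like a count histogram of some of my data. The data are unevenly spaced in time (i.e., some days may be missing). I can create a histogram with:
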
ym_plot <- ggplot(data = df %>% mutate(timestamp = as.POSIXct(timestamp)), aes(timestamp)) + 
            geom_histogram(aes(fill = ..count..))
print(ym_plot)

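However, there are 8 bins between each year, so the bins do not map to months. Is there a simple way to set each bin to one month? If the data started at the beginning of a year, I would use 12*number_of_months.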

Edit:

Here is a sample of the data:

[1] "2013-07-15 22:12:43 EST"
[1] "2013-05-04 21:30:06 EST"
[1] "2017-01-02 02:28:02 EST"
[1] "2013-02-28 08:06:09 EST"
[1] "2011-11-10 13:57:16 EST"
[1] "2015-11-12 21:05:37 EST"
[1] "2011-10-31 13:02:21 EST"
[1] "2015-01-18 12:22:45 EST"
[1] "2013-02-04 11:57:41 EST"
[1] "2011-10-16 21:54:27 EST"
[1] "2013-06-19 23:11:45 EST"
[1] "2015-08-16 19:26:29 EST"
[1] "2016-11-09 21:48:20 EST"
[1] "2011-06-13 13:30:19 EST"
[1] "2012-05-08 02:50:42 EST"
[1] "2014-10-15 23:27:28 EST"
[1] "2012-03-11 00:56:05 EST"
[1] "2014-07-16 17:32:34 EST"
[1] "2011-08-08 19:01:39 EST"
[1] "2014-08-31 13:41:49 EST"
[1] "2017-03-09 23:23:45 EST"
[1] "2013-02-16 13:27:49 EST"
[1] "2012-08-22 23:58:33 EST"
[1] "2012-04-20 11:06:32 EST"
[1] "2016-01-22 20:50:30 EST"

2 Answers:

Answer 0 (score: 1)

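It isn't clear to me whether you want to group the data into 12 bins no matter how many years your series spans, or whether you want to aggregate the series to monthly frequency, with one bin per calendar month. I'll assume the latter. So: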

# make some toy data representing an irregular time series, i.e., you have observations
# for some days but not others
set.seed(1)
dates <- sample(seq(from = as.Date("2015-01-01"), to = as.Date("2016-12-31"), by = "day"), 300)
values <- rnorm(300, 10, 2)
df <- data.frame(date = dates, value = values)

# load the packages we'll use. we need 'zoo' for its yearmon function.    
library(dplyr)
library(ggplot2)
library(zoo)


# now...
df %>%
  # use 'as.yearmon' to create a variable identifying the unique year-month
  # combination in which each observation falls
  mutate(yearmon = as.yearmon(date)) %>%
  # use that variable to group the data
  group_by(yearmon) %>%
  # count the number of observations in each of those year-month bins. if you
  # want to summarise the data some other way, use 'summarise' here instead.
  tally() %>%
  # plot the resulting series with yearmon on the x-axis and using 'geom_col'
  # instead of 'geom_histogram' to preserve the temporal ordering and avoid
  # having to specify stat = "identity"
  ggplot(aes(x = yearmon, y = n)) + geom_col()

Result:

[image: the resulting bar chart]

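If you just want 12 bins regardless of how many years your data span, you can create the grouping variable with the month function from the lubridate package instead of as.yearmon.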
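As a rough sketch of that 12-bin variant: the toy data are built the same way as above (without the value column), and the column name month_of_year and the object names monthly and p are chosen here purely for illustration. It assumes dplyr, ggplot2 and lubridate are installed.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# toy irregular daily series, built like the example above
set.seed(1)
dates <- sample(seq(from = as.Date("2015-01-01"), to = as.Date("2016-12-31"), by = "day"), 300)
df <- data.frame(date = dates)

monthly <- df %>%
  # month(..., label = TRUE) returns an ordered factor (Jan to Dec in an
  # English locale), so observations from different years share one bin
  mutate(month_of_year = month(date, label = TRUE)) %>%
  # count() is shorthand for group_by() followed by tally()
  count(month_of_year)

# one bar per calendar month, pooled over all years
p <- ggplot(monthly, aes(x = month_of_year, y = n)) + geom_col()
print(p)
```

Because the grouping variable is an ordered factor, the bars should appear in calendar order rather than alphabetically.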

Answer 1 (score: 0)

Part of the idea comes from this question.

require(ggplot2)
require(scales)

df <- data.frame(timestamp = c("2013-07-15 22:12:43 EST",
"2013-05-04 21:30:06 EST",
"2017-01-02 02:28:02 EST",
"2013-02-28 08:06:09 EST",
"2011-11-10 13:57:16 EST",
"2015-11-12 21:05:37 EST",
"2011-10-31 13:02:21 EST",
"2015-01-18 12:22:45 EST",
"2013-02-04 11:57:41 EST",
"2011-10-16 21:54:27 EST",
"2013-06-19 23:11:45 EST",
"2015-08-16 19:26:29 EST",
"2016-11-09 21:48:20 EST",
"2011-06-13 13:30:19 EST",
"2012-05-08 02:50:42 EST",
"2014-10-15 23:27:28 EST",
"2012-03-11 00:56:05 EST",
"2014-07-16 17:32:34 EST",
"2011-08-08 19:01:39 EST",
"2014-08-31 13:41:49 EST",
"2017-03-09 23:23:45 EST",
"2013-02-16 13:27:49 EST",
"2012-08-22 23:58:33 EST",
"2012-04-20 11:06:32 EST",
"2016-01-22 20:50:30 EST"))

#Convert data to date
df$timestamp <- as.Date(df$timestamp)

#Count by year and month
new <- data.frame(table(format(df$timestamp, "%Y-%m")))

#Append a day
new$Var1 <- paste0(new$Var1, "-1")

#Turn back into date
new$Var1 <- as.Date(new$Var1, format = "%Y-%m-%d")

#Plot using scale_x_date with 1 month breaks
g <- ggplot(data = new , aes(x = Var1, y = Freq)) + 
  geom_bar(stat="identity") + 
  scale_x_date(labels = date_format("%Y-%m"), breaks = date_breaks("1 month")) + 
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
print(g)
ggsave("g.png")

[image: final plot]
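For comparison, here is a minimal sketch that keeps geom_histogram itself and passes explicit month boundaries through its breaks argument, rather than pre-counting the data. It is not taken from either answer. It uses only six of the question's timestamps, drops the "EST" suffix and treats the times as UTC (an assumption), and the names first_month, last_month, n_months and month_edges are illustrative.

```r
library(ggplot2)

# six timestamps from the question's sample; time zone assumed to be UTC
timestamps <- as.POSIXct(c("2013-07-15 22:12:43", "2013-05-04 21:30:06",
                           "2013-02-28 08:06:09", "2013-02-04 11:57:41",
                           "2013-06-19 23:11:45", "2013-02-16 13:27:49"),
                         tz = "UTC")
df <- data.frame(date = as.Date(timestamps, tz = "UTC"))

# first day of the earliest and of the latest month present in the data
first_month <- as.Date(format(min(df$date), "%Y-%m-01"))
last_month  <- as.Date(format(max(df$date), "%Y-%m-01"))

# one edge per month start, plus one extra edge to close the last bin
n_months    <- length(seq(first_month, last_month, by = "month"))
month_edges <- seq(first_month, by = "month", length.out = n_months + 1)

p <- ggplot(df, aes(x = date)) +
  # a date scale stores Dates as days since 1970-01-01, so the breaks
  # are passed as plain numbers in the same unit
  geom_histogram(breaks = as.numeric(month_edges)) +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
print(p)
```

With this approach, months that contain no observations still get an (empty) bin, so gaps in the series stay visible on the axis.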