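I'd like a histogram of counts for my data. The data are not evenly spaced in time (i.e., some days may be missing). I can create the histogram with
library(dplyr)
library(ggplot2)
ym_plot <- ggplot(data = df %>% mutate(timestamp = as.POSIXct(timestamp)), aes(timestamp)) +
  geom_histogram(aes(fill = after_stat(count)))
print(ym_plot)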
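However, there are 8 bins per year, so the bins don't line up with months. Is there a simple way to make each bin one month? If the data started at the beginning of a year, I could just set the number of bins to the number of months spanned (12 per year).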
Edit:
Here is a sample:
[1] "2013-07-15 22:12:43 EST"
[1] "2013-05-04 21:30:06 EST"
[1] "2017-01-02 02:28:02 EST"
[1] "2013-02-28 08:06:09 EST"
[1] "2011-11-10 13:57:16 EST"
[1] "2015-11-12 21:05:37 EST"
[1] "2011-10-31 13:02:21 EST"
[1] "2015-01-18 12:22:45 EST"
[1] "2013-02-04 11:57:41 EST"
[1] "2011-10-16 21:54:27 EST"
[1] "2013-06-19 23:11:45 EST"
[1] "2015-08-16 19:26:29 EST"
[1] "2016-11-09 21:48:20 EST"
[1] "2011-06-13 13:30:19 EST"
[1] "2012-05-08 02:50:42 EST"
[1] "2014-10-15 23:27:28 EST"
[1] "2012-03-11 00:56:05 EST"
[1] "2014-07-16 17:32:34 EST"
[1] "2011-08-08 19:01:39 EST"
[1] "2014-08-31 13:41:49 EST"
[1] "2017-03-09 23:23:45 EST"
[1] "2013-02-16 13:27:49 EST"
[1] "2012-08-22 23:58:33 EST"
[1] "2012-04-20 11:06:32 EST"
[1] "2016-01-22 20:50:30 EST"
Answer 0 (score: 1)
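It isn't clear to me whether you want to group your data into 12 bins regardless of how many years your series spans, or to aggregate the series to monthly frequency, with one bin per calendar month of each year (i.e., per year-month). I'll assume the latter. So: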
# make some toy data representing an irregular time series, i.e., you have observations
# for some days but not others
set.seed(1)
dates <- sample(seq(from = as.Date("2015-01-01"), to = as.Date("2016-12-31"), by = "day"), 300)
values <- rnorm(300, 10, 2)
df <- data.frame(date = dates, value = values)
# load the packages we'll use. we need 'zoo' for its yearmon function.
library(dplyr)
library(ggplot2)
library(zoo)
# now...
df %>%
  # use 'as.yearmon' to create a variable identifying the unique year-month
  # combination in which each observation falls
  mutate(yearmon = as.yearmon(date)) %>%
  # use that variable to group the data
  group_by(yearmon) %>%
  # count the number of observations in each of those year-month bins. if you
  # want to summarise the data some other way, use 'summarise' here instead.
  tally() %>%
  # plot the resulting series with yearmon on the x-axis. the counts are already
  # computed, so use 'geom_col' (one bar of height n per year-month) rather than
  # 'geom_histogram', which would re-bin the data; 'geom_col' is the same as
  # geom_bar(stat = "identity") without having to specify the stat
  ggplot(aes(x = yearmon, y = n)) + geom_col()
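If you only want 12 bins, regardless of how many years your data span, you can create the grouping variable with the month function from the lubridate package instead of as.yearmon.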
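A minimal sketch of that 12-bin variant, assuming you want calendar months pooled across all years. To stay dependency-free it uses base R's format(x, "%m") in place of lubridate::month() (both give the month number 1 to 12); the dates are the first ten values from the question's sample, with the times dropped.

```r
# Sketch: pool observations into 12 calendar-month bins, ignoring the year.
# Assumption: base R's format(x, "%m") stands in for lubridate::month(x);
# both give the month number 1-12.
dates <- as.Date(c("2013-07-15", "2013-05-04", "2017-01-02", "2013-02-28",
                   "2011-11-10", "2015-11-12", "2011-10-31", "2015-01-18",
                   "2013-02-04", "2011-10-16"))

month_num <- as.integer(format(dates, "%m"))

# Use all 12 month abbreviations as levels so that months with no
# observations still appear, with a count of zero.
month_bin <- factor(month.abb[month_num], levels = month.abb)

counts <- as.data.frame(table(month = month_bin))
print(counts)
```

The result has columns month and Freq, so it can be plotted like the code above, e.g. ggplot(counts, aes(x = month, y = Freq)) + geom_col(). Because month is a factor whose levels are in calendar order, the bars run from January to December.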
Answer 1 (score: 0)
Some of the ideas come from this question.
# library() stops with an error if a package is missing; require() only warns
library(ggplot2)
library(scales)
df <- data.frame(timestamp = c(
  "2013-07-15 22:12:43 EST",
  "2013-05-04 21:30:06 EST",
  "2017-01-02 02:28:02 EST",
  "2013-02-28 08:06:09 EST",
  "2011-11-10 13:57:16 EST",
  "2015-11-12 21:05:37 EST",
  "2011-10-31 13:02:21 EST",
  "2015-01-18 12:22:45 EST",
  "2013-02-04 11:57:41 EST",
  "2011-10-16 21:54:27 EST",
  "2013-06-19 23:11:45 EST",
  "2015-08-16 19:26:29 EST",
  "2016-11-09 21:48:20 EST",
  "2011-06-13 13:30:19 EST",
  "2012-05-08 02:50:42 EST",
  "2014-10-15 23:27:28 EST",
  "2012-03-11 00:56:05 EST",
  "2014-07-16 17:32:34 EST",
  "2011-08-08 19:01:39 EST",
  "2014-08-31 13:41:49 EST",
  "2017-03-09 23:23:45 EST",
  "2013-02-16 13:27:49 EST",
  "2012-08-22 23:58:33 EST",
  "2012-04-20 11:06:32 EST",
  "2016-01-22 20:50:30 EST"
))
#Convert data to date
df$timestamp <- as.Date(df$timestamp)
#Count by year and month
new <- data.frame(table(format(df$timestamp, "%Y-%m")))
#Append a day
new$Var1 <- paste0(new$Var1, "-1")
#Turn back into date
new$Var1 <- as.Date(new$Var1, format = "%Y-%m-%d")
#Plot using scale_x_date with 1 month breaks
g <- ggplot(data = new, aes(x = Var1, y = Freq)) +
  geom_col() +
  scale_x_date(labels = date_format("%Y-%m"), breaks = date_breaks("1 month")) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
print(g)
ggsave("g.png")
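A possibly simpler way to build the same monthly counts, sketched as an alternative rather than as this answer's method: base R's cut() on a Date vector with breaks = "month" assigns each date to the first day of its month. Unlike table(format(...)), it also creates a level for every month between the first and last observation, so months with no data show up as explicit zero counts. The dates below are a small subset of the question's sample.

```r
# Sketch: monthly bins via cut() on Dates instead of formatting to "%Y-%m"
# and pasting "-1" back on. Assumes the timestamps are already Dates.
dates <- as.Date(c("2011-06-13", "2011-08-08", "2011-10-16",
                   "2011-10-31", "2011-11-10"))

# Each level is labelled with the first day of its month ("2011-06-01", ...),
# and every month in the range gets a level, even if it has no observations.
month_bin <- cut(dates, breaks = "month")

counts <- as.data.frame(table(month_bin))

# Convert the labels back to Dates so the column works with scale_x_date().
counts$month_bin <- as.Date(as.character(counts$month_bin))
print(counts)
```

The counts can then go into the same kind of plot, e.g. ggplot(counts, aes(x = month_bin, y = Freq)) + geom_col() + scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m").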