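I'm trying to create a histogram from time-series data in R, similar to this question. Each bin should show the total duration of the values that fall in it. My data are a zoo object with thousands of rows and non-integer sampling times. The timestamps are irregular, and the value is assumed to stay constant between consecutive timestamps (sample-and-hold).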
Example data:
library(zoo)
library(ggplot2)
library(scales)  # provides date_format(), used in scale_x_datetime() below
timestamp = as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5", "2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0", "2018-02-21 15:00:09.3", "2018-02-21 15:00:10.0", "2018-02-21 15:00:12.0"), tz = "GMT")
data = c(0,3,5,1,3,0,2)
z = zoo(data, order.by = timestamp)
x.df <- data.frame(Date = index(z), Value = as.numeric(coredata(z)))
ggplot(x.df, aes(x = Date, y = Value)) + geom_step() + scale_x_datetime(labels = date_format("%H:%M:%OS"))
See the time-series plot here. Creating a histogram with hist(z, freq = T) ignores the timestamps: Plot from hist method.
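A minimal base-R comparison on the example data makes the mismatch concrete: hist() counts samples, so every timestamp contributes one unit no matter how long its value lasted, whereas the desired histogram needs the seconds spent at each value. This assumes sample-and-hold as described above; the variable names (value, hold_secs, held_value, ...) are illustrative and not part of the question.

```r
# Illustrative comparison on the question's example data (base R only).
timestamp <- as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5",
                          "2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0",
                          "2018-02-21 15:00:09.3", "2018-02-21 15:00:10.0",
                          "2018-02-21 15:00:12.0"), tz = "GMT")
value <- c(0, 3, 5, 1, 3, 0, 2)

# hist() counts samples: each timestamp contributes 1, however long it lasted
sample_counts <- table(value)

# Sample-and-hold: value[i] lasts from timestamp[i] to timestamp[i + 1].
# Force seconds, since diff() on POSIXct picks its difftime units automatically.
hold_secs <- as.numeric(diff(timestamp), units = "secs")
held_value <- value[-length(value)]  # the last sample has no known end time
secs_per_value <- tapply(hold_secs, held_value, sum)

print(sample_counts)
print(round(secs_per_value, 2))
```

Value 0 and value 3 both appear twice, but value 0 is held for about 4.5 s in total and value 3 for about 3.4 s, which a sample count cannot show.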
The output I want is a histogram with the duration in seconds on the y-axis, like this: Histogram with non-integer duration on y-axis.
Edit:
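I should point out that the data values are not integers, and I want to be able to control the bin width. I can use diff(timestamp) to create a (non-integer) column holding the duration of each point, and plot a bar chart as @MKR suggested: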
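# Each value is held until the next timestamp, so the last sample is dropped.
# units = "secs" keeps the durations in seconds whatever diff() would auto-pick.
x.df = data.frame(DurationSecs = as.numeric(diff(timestamp), units = "secs"), Value = data[-length(data)])
ggplot(x.df, aes(x = Value, y = DurationSecs)) + geom_bar(stat = "identity")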
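This gives a histogram with the correct bar heights for the example. But it breaks down when the values are floating-point numbers, since geom_bar(stat = "identity") does no binning: every distinct value gets its own bar.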
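One way to get binning with a controllable bin width is to assign each held value to a bin with cut() and sum the hold times per bin. The following is a base-R sketch under the question's sample-and-hold assumption; the non-integer values are hypothetical (made up for illustration), and binwidth, lower, upper and secs_per_bin are illustrative names, not an established API.

```r
# Sketch: duration-weighted histogram of non-integer values with a chosen
# bin width, assuming sample-and-hold. Base R only.
timestamp <- as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5",
                          "2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0",
                          "2018-02-21 15:00:09.3", "2018-02-21 15:00:10.0",
                          "2018-02-21 15:00:12.0"), tz = "GMT")
# Hypothetical non-integer values, for illustration only
value <- c(0.4, 3.1, 4.8, 1.2, 2.9, 0.1, 2.2)

hold_secs <- as.numeric(diff(timestamp), units = "secs")
held_value <- value[-length(value)]  # the last sample has no known end time

binwidth <- 1  # bin width, in the units of `value`
lower <- floor(min(held_value) / binwidth) * binwidth
# Guarantee at least two breaks even if all values are identical
upper <- max(ceiling(max(held_value) / binwidth) * binwidth, lower + binwidth)
breaks <- seq(lower, upper, by = binwidth)

# Left-closed bins [a, b); include.lowest also closes the last bin at `upper`
bin <- cut(held_value, breaks = breaks, right = FALSE, include.lowest = TRUE)

# Sum the hold times per bin; split() keeps empty bins, whose sum is 0
secs_per_bin <- vapply(split(hold_secs, bin), sum, numeric(1))

barplot(secs_per_bin, space = 0, xlab = "Value", ylab = "Duration (s)")
```

If you prefer to stay in ggplot2, geom_histogram() accepts a weight aesthetic that stat_bin sums instead of counting rows, so with the x.df built above something like ggplot(x.df, aes(x = Value, weight = DurationSecs)) + geom_histogram(binwidth = 1, boundary = 0, closed = "left") should give the same per-bin totals; boundary and closed are only there to line the bins up with the [a, b) intervals of the sketch.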
Answer:
Since you want the duration (in seconds) on the y-axis, you should add a duration column to x.df. A histogram with stat = "sum" will then give the OP what they need. The steps are:
library(zoo)
library(dplyr)
library(ggplot2)  # needed for ggplot() and geom_histogram() below
timestamp = as.POSIXct(c("2018-02-21 15:00:00.0", "2018-02-21 15:00:02.5",
"2018-02-21 15:00:05.2", "2018-02-21 15:00:07.0", "2018-02-21 15:00:09.3",
"2018-02-21 15:00:10.0", "2018-02-21 15:00:12.0"), tz = "GMT")
data = c(0,3,5,1,3,0,2)
z = zoo(data, order.by = timestamp)
x.df <- data.frame(Date = index(z), Value = as.numeric(coredata(z)))
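# DurationSecs is each sample's hold time in seconds: the gap until the next
# timestamp (0 for the last sample, which has no successor).
x.df <- x.df %>% arrange(Date) %>%
  mutate(DurationSecs = ifelse(is.na(lead(Date)), 0,
                               as.numeric(difftime(lead(Date), Date, units = "secs"))))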
# Draw the plot now
ggplot(x.df, aes(x = Value, y = DurationSecs)) + geom_histogram(stat="sum")
# Resulting x.df (printing truncates the fractional seconds in Date):
# Date Value DurationSecs
#1 2018-02-21 15:00:00 0 2.5
#2 2018-02-21 15:00:02 3 2.7
#3 2018-02-21 15:00:05 5 1.8
#4 2018-02-21 15:00:07 1 2.3
#5 2018-02-21 15:00:09 3 0.7
#6 2018-02-21 15:00:10 0 2.0
#7 2018-02-21 15:00:12 2 0.0
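To check what the plot above should show: Value 0 and Value 3 each have two rows, and geom_histogram()'s default position = "stack" piles those bars up, so each bar's height should be the total DurationSecs for that value, with Value 2 getting 0 from the padded last row. A base-R sketch that reproduces those heights, using the DurationSecs values copied from the printed output above:

```r
# Sketch: expected stacked bar heights, from the x.df printed above.
x.df <- data.frame(
  Value = c(0, 3, 5, 1, 3, 0, 2),
  DurationSecs = c(2.5, 2.7, 1.8, 2.3, 0.7, 2.0, 0.0)
)

# Stacking adds up the rows that share a Value
bar_heights <- aggregate(DurationSecs ~ Value, data = x.df, FUN = sum)
print(bar_heights)
```

This groups by exact value only; it does not bin non-integer values the way the question's edit asks.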