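I have two .csv files containing the two separate time series given at the bottom. I can import these into R as data frames:
data1 <- read.csv("data1.csv")
data2 <- read.csv("data2.csv")
Each data frame has date, time and price columns. I would like to align the prices of data1 and data2 to a common frequency of 10 seconds in a single table.
I have the start and end date and time for both series, but their frequencies (and hence the number of observations per period, say per day) differ, and the daily start and end times differ as well.
I tried using ts(), but I don't think this function can handle date and time together.
What is the most efficient way to align these time series to a common frequency?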
data1.csv:
date,time,price
01/06/2014,05:59:42,1954.75
01/06/2014,06:00:05,1954.875
01/06/2014,06:00:06,1954.75
01/06/2014,06:00:08,1954.875
01/06/2014,06:02:05,1954.625
01/06/2014,06:02:22,1954.875
01/06/2014,06:03:12,1954.75
01/06/2014,06:03:14,1954.625
01/06/2014,06:03:20,1954.75
01/06/2014,06:03:22,1954.875
01/06/2014,06:03:23,1954.75
01/06/2014,06:03:26,1954.875
01/06/2014,06:07:07,1955.125
01/06/2014,06:07:21,1954.875
01/06/2014,06:08:54,1954.625
01/06/2014,06:16:55,1954.375
01/06/2014,06:17:00,1954.625
01/06/2014,06:21:46,1954.875
01/06/2014,06:28:11,1955.125
01/06/2014,06:30:23,1955.375
01/06/2014,06:30:49,1955.125
01/06/2014,06:33:33,1955.375
01/06/2014,06:34:30,1955.125
01/06/2014,06:37:39,1955.375
01/06/2014,06:37:43,1955.125
01/06/2014,06:47:42,1954.875
01/06/2014,06:50:23,1955.125
01/06/2014,06:57:10,1954.875
01/06/2014,06:57:12,1955.125
01/06/2014,07:00:08,1954.875
01/06/2014,07:00:21,1955.125
01/06/2014,07:00:55,1955.375
01/06/2014,07:01:19,1955.125
01/06/2014,07:01:51,1955.375
02/06/2014,05:59:50,1966.625
02/06/2014,06:00:00,1966.375
02/06/2014,06:00:07,1966.5
02/06/2014,06:00:08,1966.625
02/06/2014,06:00:10,1966.375
02/06/2014,06:00:33,1966.125
02/06/2014,06:00:34,1966.375
02/06/2014,06:00:41,1966.125
02/06/2014,06:00:48,1966.375
02/06/2014,06:02:48,1966.625
02/06/2014,06:03:24,1966.875
02/06/2014,06:04:23,1967.125
02/06/2014,06:04:39,1966.875
02/06/2014,06:05:28,1966.625
02/06/2014,06:06:25,1966.375
02/06/2014,06:07:44,1966.625
data2.csv:
date,time,price
01/06/2014,02:05:25,0
01/06/2014,06:00:07,3231.5
01/06/2014,06:00:17,3232.5
01/06/2014,06:00:19,3231.5
01/06/2014,06:00:33,3232.5
01/06/2014,06:00:40,3231.5
01/06/2014,06:00:41,3232.5
01/06/2014,06:00:42,3231.5
01/06/2014,06:00:44,3232.5
01/06/2014,06:04:06,3233.5
01/06/2014,06:04:22,3232.5
01/06/2014,06:04:42,3233.5
01/06/2014,06:08:48,3232.5
01/06/2014,06:10:12,3231.5
01/06/2014,06:10:35,3232.5
01/06/2014,06:21:45,3233.5
01/06/2014,06:21:55,3234.5
01/06/2014,06:29:00,3235.5
01/06/2014,06:33:34,3236.5
01/06/2014,06:34:30,3235.5
01/06/2014,06:41:33,3234.5
01/06/2014,06:47:42,3233.5
01/06/2014,06:48:33,3234.5
01/06/2014,06:50:23,3235.5
01/06/2014,06:52:04,3236.5
01/06/2014,06:57:11,3235.5
01/06/2014,07:00:00,3236.5
01/06/2014,07:00:06,3235.5
01/06/2014,07:00:08,3233.5
01/06/2014,07:00:09,3234.5
01/06/2014,07:00:10,3233.5
01/06/2014,07:00:11,3234.5
01/06/2014,07:00:21,3235.5
02/06/2014,06:00:10,3252.5
02/06/2014,06:00:20,3252
02/06/2014,06:00:21,3251.5
02/06/2014,06:00:33,3250.5
02/06/2014,06:00:34,3251
02/06/2014,06:00:35,3250.5
02/06/2014,06:00:41,3249.5
02/06/2014,06:01:31,3250.5
02/06/2014,06:01:32,3249.5
02/06/2014,06:01:38,3250.5
02/06/2014,06:02:47,3251.5
02/06/2014,06:05:32,3250.5
02/06/2014,06:06:25,3249.5
02/06/2014,06:07:44,3250.5
02/06/2014,06:08:11,3249.5
02/06/2014,06:12:32,3250.5
02/06/2014,06:16:56,3251.5
02/06/2014,06:17:08,3250.5
02/06/2014,06:18:32,3251.5
02/06/2014,06:31:59,3250.5
02/06/2014,06:32:11,3251.5
02/06/2014,06:44:47,3250.5
02/06/2014,06:45:09,3251.5
02/06/2014,06:52:33,3252.5
02/06/2014,06:52:36,3253.5
02/06/2014,06:55:30,3254.5
02/06/2014,06:55:39,3253.5
02/06/2014,06:57:27,3254.5
02/06/2014,07:00:01,3253.5
02/06/2014,07:00:02,3254.5
02/06/2014,07:00:17,3253.5
02/06/2014,07:00:23,3252.5
This is what the data frame 'data1' looks like:
date time Price
1 2014-06-01 06:03:59.614000 62.1250
2 2014-06-01 06:03:59.692000 62.2500
3 2014-06-01 06:15:42.004000 62.2375
4 2014-06-01 06:15:42.083000 61.9250
5 2014-06-01 06:17:01.654000 61.9125
6 2014-06-01 06:17:01.732000 61.9000
7 2014-06-01 06:23:41.908000 61.8200
8 2014-06-01 06:23:41.986000 61.8570
9 2014-06-01 06:23:55.211000 61.9065
10 2014-06-01 06:23:55.291000 61.8725
11 2014-06-01 06:24:11.679000 61.8715
Answer 0 (score: 3)
Example data set
set.seed(1)  # make the random example reproducible
date_time <- seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"), as.POSIXlt("2014-01-07 07:00:00"), by = "1 secs")
date_time_1 <- sample(date_time, 100)
date_time_2 <- sample(date_time, 100)
# date and time are taken from the same sampled vector so each row is a real timestamp
data1 <- data.frame(date = format(date_time_1, "%Y-%m-%d"),
                    time = format(date_time_1, "%H:%M:%S"),
                    price = rnorm(100)
)
# combine date and time into a single POSIXct column
data1$datetime <- as.POSIXct(strptime(paste(data1$date, data1$time), "%Y-%m-%d %H:%M:%S"))
data2 <- data.frame(date = format(date_time_2, "%Y-%m-%d"),
                    time = format(date_time_2, "%H:%M:%S"),
                    price = rnorm(100)
)
# combine date and time into a single POSIXct column
data2$datetime <- as.POSIXct(strptime(paste(data2$date, data2$time), "%Y-%m-%d %H:%M:%S"))
The next section answers your question
## Round the times down to 10 second increments (06:00:17 becomes 06:00:10)
data1$datetime <- data1$datetime - as.numeric(format(data1$datetime, "%S")) %% 10
data2$datetime <- data2$datetime - as.numeric(format(data2$datetime, "%S")) %% 10
## Aggregate the data in case there are multiple observations in one 10 second block
data1_freq <- aggregate(data1$price, list(date = as.POSIXct(data1$datetime)), mean)
data2_freq <- aggregate(data2$price, list(date = as.POSIXct(data2$datetime)), mean)
### Now merge the two data sets - not dropping any observations
### (both price columns are called x, so the result has x.x for data2 and x.y for data1)
data <- merge(data2_freq, data1_freq, by = "date", all = TRUE)
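The same floor, aggregate and merge steps can be sketched for the dd/mm/yyyy layout of the files in the question. This is only an illustration in base R: the helper name to_grid, its step and tz parameters, and the choice of UTC are assumptions made for this example and are not part of the answer above. It floors on the numeric timestamp, which also works for steps that do not divide a minute evenly.

```r
# Hypothetical helper (name and arguments are illustrative assumptions):
# parse date + time, floor to a `step`-second grid, average the prices per bin.
to_grid <- function(df, step = 10, tz = "UTC") {
  stamp <- as.POSIXct(paste(df$date, df$time),
                      format = "%d/%m/%Y %H:%M:%S", tz = tz)
  bin <- as.POSIXct(floor(as.numeric(stamp) / step) * step,
                    origin = "1970-01-01", tz = tz)
  aggregate(list(price = df$price), list(datetime = bin), mean)
}

# A few rows copied from data1.csv and data2.csv in the question
d1 <- read.csv(text = "date,time,price
01/06/2014,06:00:05,1954.875
01/06/2014,06:00:06,1954.75
01/06/2014,06:00:08,1954.875
01/06/2014,06:02:05,1954.625")
d2 <- read.csv(text = "date,time,price
01/06/2014,06:00:07,3231.5
01/06/2014,06:00:17,3232.5
01/06/2014,06:00:19,3231.5")

# Outer join on the 10-second bin; bins seen in only one series get NA in the other
aligned <- merge(to_grid(d1), to_grid(d2), by = "datetime",
                 all = TRUE, suffixes = c(".1", ".2"))
print(aligned)
```

The result has one row per 10-second bin that occurs in either series, with columns price.1 and price.2.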
Optionally, you can merge the result onto a complete time series
## create a continuous date based on the desired freq (here 10 seconds)
cont_date_time <- data.frame(date =
seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"),
as.POSIXlt("2014-01-07 07:00:00"),
by = "10 secs")
)
# And merge the data with the complete time series
data_cont <- merge(data, cont_date_time, by = "date", all=TRUE)
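After merging onto a complete 10-second grid, most rows are NA because ticks are sparse. If a price is meant to stay valid until the next tick (an assumption about the data, not something stated in the question), one common choice is to carry the last observation forward. A minimal base R sketch follows; the helper locf and the toy grid and obs data frames are made up for this illustration.

```r
# Hypothetical helper: carry the last non-NA value forward; leading NAs stay NA
locf <- function(x) {
  seen <- !is.na(x)
  idx <- cumsum(seen)        # how many non-NA values have been seen so far
  c(NA, x[seen])[idx + 1]    # position 1 (NA) is used while nothing has been seen
}

# Toy 10-second grid with two observed prices
grid <- data.frame(date = seq(as.POSIXct("2014-06-01 06:00:00", tz = "UTC"),
                              by = "10 secs", length.out = 6))
obs <- data.frame(date = grid$date[c(2, 5)], price = c(1954.875, 1954.625))

filled <- merge(grid, obs, by = "date", all.x = TRUE)
filled$price <- locf(filled$price)
print(filled)
```

For zoo objects, na.locf from the zoo package serves the same purpose.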
Restrict the continuous date sequence to working days and working hours
## create a continuous date based on the desired freq (here 10 seconds)
cont_date_time <- data.frame(date =
  seq.POSIXt(as.POSIXlt("2014-01-06 06:00:00"),
             as.POSIXlt("2014-01-07 07:00:00"),
             by = "10 secs")
)
# Use the lubridate package to subset the date sequence
library(lubridate)
## Keep Monday to Friday only (wday() returns 1 for Sunday, so 2 to 6 is Mon-Fri);
## drop = FALSE keeps the one-column result a data frame
cont_date_time <- cont_date_time[with(cont_date_time, wday(date) >= 2 & wday(date) <= 6), , drop = FALSE]
## Keep working hours only, assumed here to be 09:00 to 16:59; adjust to your trading session
cont_date_time <- cont_date_time[with(cont_date_time, hour(date) >= 9 & hour(date) <= 16), , drop = FALSE]
# And merge the data with the complete time series
data_cont <- merge(data, cont_date_time, by = "date", all = TRUE)
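One detail worth checking when filtering a one-column data frame such as cont_date_time: row subsetting with [rows, ] drops it to a plain vector unless drop = FALSE is given, and a later merge by "date" then has no column to join on. The sketch below shows the difference in base R; the toy grid and the 09:00 cut-off are assumptions chosen for illustration.

```r
# Toy one-column grid straddling 09:00
grid <- data.frame(date = seq(as.POSIXct("2014-06-02 08:59:40", tz = "UTC"),
                              by = "10 secs", length.out = 6))
hr <- as.integer(format(grid$date, "%H"))    # hour of day, in the grid's time zone

as_vector <- grid[hr >= 9, ]                 # silently becomes a POSIXct vector
as_frame  <- grid[hr >= 9, , drop = FALSE]   # stays a data frame with a date column

print(class(as_vector))
print(class(as_frame))
```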
Answer 1 (score: 2)
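This is easiest if you use a time series representation. Here we read the data into zoo objects. index.column = 1:2 tells read.zoo that the first two columns contain the index, FUN = f specifies a conversion function that converts them to class "chron" and truncates to 10 minutes, and aggregate = mean specifies the function used to aggregate observations that fall into the same interval. Ten minutes keeps the output below short; passing "00:00:10" to trunc instead should give the 10-second grid asked for in the question. We can then merge the zoo objects:
library(zoo)
library(chron)
# build a chron index from the date and time columns, truncated to 10-minute bins;
# the dates are assumed to be day/month/year (chron's default is month/day/year)
f <- function(d, t) trunc(chron(as.character(d), as.character(t), format = c(dates = "d/m/y", times = "h:m:s")), "00:10:00")
z1 <- read.zoo("data1.csv", header = TRUE, sep = ",", index.column = 1:2, FUN = f, aggregate = mean)
z2 <- read.zoo("data2.csv", header = TRUE, sep = ",", index.column = 1:2, FUN = f, aggregate = mean)
merge(z1, z2)
giving: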
z1 z2
(01/06/14 02:00:00) NA 0.000
(01/06/14 05:50:00) 1954.750 NA
(01/06/14 06:00:00) 1954.804 3232.333
(01/06/14 06:10:00) 1954.500 3232.000
(01/06/14 06:20:00) 1955.000 3234.500
(01/06/14 06:30:00) 1955.250 3236.000
(01/06/14 06:40:00) 1954.875 3234.167
(01/06/14 06:50:00) 1955.042 3235.833
(01/06/14 07:00:00) 1955.175 3234.786
(02/06/14 05:50:00) 1966.625 NA
(02/06/14 06:00:00) 1966.533 3250.633
(02/06/14 06:10:00) NA 3251.000
(02/06/14 06:30:00) NA 3251.000
(02/06/14 06:40:00) NA 3251.000
(02/06/14 06:50:00) NA 3253.700
(02/06/14 07:00:00) NA 3253.500
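To see what the per-bin mean does, the first two z2 entries for 01/06 in the table above (06:00:00 and 06:10:00) can be reproduced by hand in base R. The sketch uses the data2.csv rows stamped between 06:00:07 and 06:10:35; the variable names and the UTC time zone are choices made for this illustration.

```r
# data2.csv rows for 01/06/2014 from 06:00:07 to 06:10:35
stamp <- as.POSIXct(paste("2014-06-01", c(
  "06:00:07", "06:00:17", "06:00:19", "06:00:33", "06:00:40", "06:00:41",
  "06:00:42", "06:00:44", "06:04:06", "06:04:22", "06:04:42", "06:08:48",
  "06:10:12", "06:10:35")), tz = "UTC")
price <- c(3231.5, 3232.5, 3231.5, 3232.5, 3231.5, 3232.5,
           3231.5, 3232.5, 3233.5, 3232.5, 3233.5, 3232.5,
           3231.5, 3232.5)

# Floor each timestamp to a 600-second (10-minute) bin, then average per bin
bin <- as.POSIXct(floor(as.numeric(stamp) / 600) * 600,
                  origin = "1970-01-01", tz = "UTC")
bin_means <- tapply(price, format(bin, "%H:%M"), mean)
print(round(bin_means, 3))
```

The twelve prices in the 06:00 bin average to 3232.333 and the two in the 06:10 bin to 3232.000, matching the z2 column.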