Efficient way to row-bind time series in data.table with correctly ordered timestamps

Time: 2018-05-31 00:57:03

Tags: r data.table

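Is there a more efficient way to row-bind (or efficiently merge) two or more large time series in data.table? The time series have some different columns, so I am using fill = TRUE.

I want all rows from each time series to appear in the final data.table. I can do that as shown below, but the timestamps are not sorted in dt3. I have to create dt4 to get ordered timestamps.

Is there a more efficient way to do a kind of rbind / time series merge in data.table?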

library(data.table)

tm <- seq(as.POSIXct("2018-05-12 00:00"), as.POSIXct("2018-05-14"), by = "hours")
dt <- data.table(time = tm, x = seq(1, length(tm), by = 1))

set.seed(1)

dt2 <- data.table(time = tm[sample(length(tm), size = 8)] + rnorm(n = 8, 0, 60),
                 y = rnorm(8))

# Can a one liner here get me the output in `dt4` with some kind of row bind? 
#  Is there a way to do a row bind here instead that avoids the creation of a new object dt4 that takes the sorted rows?

dt3 <- rbind(dt, dt2, fill = TRUE)

dt4 <- dt3[order(time)]

tail(dt4, 20)
#                    time  x           y
#  1: 2018-05-13 08:00:00 33          NA
#  2: 2018-05-13 09:00:00 34          NA
#  3: 2018-05-13 10:00:00 35          NA
#  4: 2018-05-13 11:00:00 36          NA
#  5: 2018-05-13 12:00:00 37          NA
#  6: 2018-05-13 13:00:00 38          NA
#  7: 2018-05-13 14:00:00 39          NA
#  8: 2018-05-13 14:59:41 NA  0.94383621
#  9: 2018-05-13 15:00:00 40          NA
# 10: 2018-05-13 16:00:00 41          NA
# 11: 2018-05-13 16:01:30 NA  0.82122120
# 12: 2018-05-13 17:00:00 42          NA
# 13: 2018-05-13 17:00:44 NA -0.04493361
# 14: 2018-05-13 18:00:00 43          NA
# 15: 2018-05-13 19:00:00 44          NA
# 16: 2018-05-13 20:00:00 45          NA
# 17: 2018-05-13 21:00:00 46          NA
# 18: 2018-05-13 22:00:00 47          NA
# 19: 2018-05-13 23:00:00 48          NA
# 20: 2018-05-14 00:00:00 49          NA

1 answer:

Answer 0 (score: 3)

If you set the time columns as keys

setkey(dt, time)
setkey(dt2, time)

then you can use merge.data.table

merge(dt, dt2, all = TRUE)

Note that if a time series is already known to be sorted (dt is, but dt2 is not), it can be faster to just set the 'sorted' attribute of that data.table rather than calling setkey.
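The keyed-merge approach from this answer can be sketched end to end as follows, reusing the question's data. This is a minimal illustration, not a benchmark. Two parts are assumptions that go beyond the answer's text: the use of setattr() to mark dt as sorted (only safe because dt is sorted by construction; data.table does not verify it), and the setorder() one-liner shown for comparison. Also note that a full outer merge combines rows whose timestamps match exactly, whereas rbind keeps them as separate rows; here no timestamps coincide, so both give the same result.

```r
library(data.table)

# Same construction as in the question.
tm  <- seq(as.POSIXct("2018-05-12 00:00"), as.POSIXct("2018-05-14"), by = "hours")
dt  <- data.table(time = tm, x = seq(1, length(tm), by = 1))
set.seed(1)
dt2 <- data.table(time = tm[sample(length(tm), size = 8)] + rnorm(n = 8, 0, 60),
                  y = rnorm(8))

# dt is already sorted by time, so mark it as keyed without re-sorting.
# Assumption: the data really is sorted; nothing checks this for us.
setattr(dt, "sorted", "time")

# dt2 is not sorted, so it needs a real setkey (sorts by reference).
setkey(dt2, time)

# Full outer merge on the shared key: every row of both series, ordered by time.
merged <- merge(dt, dt2, all = TRUE)

# Alternative for comparison (not from the answer): bind, then sort by
# reference, which avoids creating a second sorted copy.
bound <- setorder(rbind(dt, dt2, fill = TRUE), time)

tail(merged, 20)
```

If the bind-then-sort form is preferred, setorder() reorders the bound table in place, so no separate dt4 object is needed.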