I have 2 tables.
dums:
start end 10min
2013-04-01 00:00:54 UTC 2013-04-01 01:00:10 UTC 0.05
2013-04-01 00:40:26 UTC 2013-04-01 01:00:00 UTC 0.1
2013-04-01 02:13:20 UTC 2013-04-01 04:53:42 UTC 0.15
2013-04-02 02:22:00 UTC 2013-04-01 04:33:12 UTC 0.2
2013-04-01 02:26:23 UTC 2013-04-01 04:05:12 UTC 0.25
2013-04-01 02:42:47 UTC 2013-04-01 04:34:33 UTC 0.3
2013-04-01 02:53:12 UTC 2013-04-03 05:27:05 UTC 0.35
2013-04-02 02:54:08 UTC 2013-04-02 05:31:15 UTC 0.4
2013-04-03 02:57:16 UTC 2013-04-03 05:29:32 UTC 0.45
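map: start and end are consecutive 10-minute blocks spanning 2013-04-01 00:00:00 to 2013-04-04.
I want to add the 3rd column of dt1 to map whenever its start-to-end interval overlaps a 10-minute block, appending a further column for each additional match.
Ideally the output should be: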
start end 10min
4/1/2013 0:00:00 4/1/2013 0:10:00 0.05 0
4/1/2013 0:10 4/1/2013 0:20 0.05 0
4/1/2013 0:20 4/1/2013 0:30 0.05 0
4/1/2013 0:30 4/1/2013 0:40 0.05 0
4/1/2013 0:40 4/1/2013 0:50 0.05 0.01
4/1/2013 0:50 4/1/2013 1:00 0.05 0.01
I tried:
setkey(dums,start,end)
setkey(map,start,end)
foverlaps(map,dums,type="within",nomatch=0L)
I keep getting this error:
Error in foverlaps(map, dums, type = "within", nomatch = 0L) : All entries in column start should be <= corresponding entries in column end in data.table 'y'
Any pointers or alternative approaches?
Thanks
Answer 0 (score: 1)
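The error message
All entries in column start should be <= corresponding entries in column end in data.table 'y'
is probably caused by a typo in the dataset.
dums[start > end, which = TRUE]
returns 4, and row 4 of dums is:
                 start                 end min10
1: 2013-04-02 02:22:00 2013-04-01 04:33:12   0.2
After changing start to 2013-04-01 02:22:00, the OP's code runs fine.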
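A minimal sketch of the diagnose-and-fix step, run on a two-row excerpt of dums rather than the full table. The utc() helper, the excerpt and the shortened map grid are assumptions for illustration only. It finds the rows whose start lies after end, overwrites the bad start, keys the table, and then runs the OP's original type = "within" call, which no longer errors.

```r
library(data.table)

# Hypothetical helper: parse timestamps explicitly as UTC
utc <- function(x) as.POSIXct(x, tz = "UTC")

# Excerpt of dums; the second row carries the 2013-04-02 typo in `start`
dums <- data.table(
  start = utc(c("2013-04-01 02:13:20", "2013-04-02 02:22:00")),
  end   = utc(c("2013-04-01 04:53:42", "2013-04-01 04:33:12")),
  min10 = c(0.15, 0.2)
)

# Row numbers of intervals that end before they start
bad <- dums[start > end, which = TRUE]

# Overwrite the offending start time in place
dums[bad, start := utc("2013-04-01 02:22:00")]

# Key after the fix (modifying a key column with := would drop an existing key)
setkey(dums, start, end)

# Shortened grid of 10-minute blocks covering the excerpt
ts  <- seq(utc("2013-04-01 02:00:00"), utc("2013-04-01 05:00:00"), by = "10 min")
map <- data.table(start = head(ts, -1L), end = tail(ts, -1L),
                  key = c("start", "end"))

# The OP's call now runs: each row is a block lying entirely inside a dums interval
res <- foverlaps(map, dums, type = "within", nomatch = 0L)
```

With this excerpt, 15 blocks fall inside the first interval and 12 inside the corrected second one.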
However, to get the expected output, the result of foverlaps() has to be reshaped from long to wide format.
This can be done in two ways:
dcast(foverlaps(map, dums, nomatch = 0L), i.start + i.end ~ min10,
value.var = "min10")
                 i.start               i.end 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45
  1: 2013-04-01 00:00:00 2013-04-01 00:10:00 0.05  NA   NA  NA   NA  NA   NA  NA   NA
  2: 2013-04-01 00:10:00 2013-04-01 00:20:00 0.05  NA   NA  NA   NA  NA   NA  NA   NA
  3: 2013-04-01 00:20:00 2013-04-01 00:30:00 0.05  NA   NA  NA   NA  NA   NA  NA   NA
  4: 2013-04-01 00:30:00 2013-04-01 00:40:00 0.05  NA   NA  NA   NA  NA   NA  NA   NA
  5: 2013-04-01 00:40:00 2013-04-01 00:50:00 0.05 0.1   NA  NA   NA  NA   NA  NA   NA
 ---
311: 2013-04-03 04:40:00 2013-04-03 04:50:00   NA  NA   NA  NA   NA  NA 0.35  NA 0.45
312: 2013-04-03 04:50:00 2013-04-03 05:00:00   NA  NA   NA  NA   NA  NA 0.35  NA 0.45
313: 2013-04-03 05:00:00 2013-04-03 05:10:00   NA  NA   NA  NA   NA  NA 0.35  NA 0.45
314: 2013-04-03 05:10:00 2013-04-03 05:20:00   NA  NA   NA  NA   NA  NA 0.35  NA 0.45
315: 2013-04-03 05:20:00 2013-04-03 05:30:00   NA  NA   NA  NA   NA  NA 0.35  NA 0.45
Or, closer to the OP's expected result:
dcast(foverlaps(map, dums, nomatch = 0L), i.start + i.end ~ rowid(i.start),
value.var = "min10")
                 i.start               i.end    1    2  3  4  5
  1: 2013-04-01 00:00:00 2013-04-01 00:10:00 0.05   NA NA NA NA
  2: 2013-04-01 00:10:00 2013-04-01 00:20:00 0.05   NA NA NA NA
  3: 2013-04-01 00:20:00 2013-04-01 00:30:00 0.05   NA NA NA NA
  4: 2013-04-01 00:30:00 2013-04-01 00:40:00 0.05   NA NA NA NA
  5: 2013-04-01 00:40:00 2013-04-01 00:50:00 0.05 0.10 NA NA NA
 ---
311: 2013-04-03 04:40:00 2013-04-03 04:50:00 0.35 0.45 NA NA NA
312: 2013-04-03 04:50:00 2013-04-03 05:00:00 0.35 0.45 NA NA NA
313: 2013-04-03 05:00:00 2013-04-03 05:10:00 0.35 0.45 NA NA NA
314: 2013-04-03 05:10:00 2013-04-03 05:20:00 0.35 0.45 NA NA NA
315: 2013-04-03 05:20:00 2013-04-03 05:30:00 0.35 0.45 NA NA NA
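To see what the rowid() formula does, here is a toy long table standing in for the foverlaps() result; the block numbers and values are made up for illustration. rowid(block) numbers the matching intervals 1, 2, ... within each block, and dcast() turns those ranks into columns, so the width is the maximum number of overlaps per block rather than the number of distinct min10 values.

```r
library(data.table)

# Toy long table: block 1 overlaps two intervals, block 2 overlaps one
long <- data.table(block = c(1L, 1L, 2L), min10 = c(0.05, 0.1, 0.35))

# One column per overlap rank within each block; missing ranks become NA
by_rank <- dcast(long, block ~ rowid(block), value.var = "min10")
by_rank
```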
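Note that the argument type = "within" has been omitted, so foverlaps() falls back to its default type = "any". That is why the first block (00:00 to 00:10) already picks up 0.05, even though that interval only starts at 00:00:54.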
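A small sketch contrasting the two match types on the first dums interval alone. The utc() helper and the 90-minute grid are assumptions for illustration. With "any", a block matches if it touches the interval at all; with "within", the block must lie entirely inside it, so the 00:00 block and the 01:00 block drop out.

```r
library(data.table)

# Hypothetical helper: parse timestamps explicitly as UTC
utc <- function(x) as.POSIXct(x, tz = "UTC")

# First interval from the question
dums <- data.table(start = utc("2013-04-01 00:00:54"),
                   end   = utc("2013-04-01 01:00:10"),
                   min10 = 0.05, key = c("start", "end"))

# Nine 10-minute blocks from 00:00 to 01:30
ts  <- seq(utc("2013-04-01 00:00:00"), utc("2013-04-01 01:30:00"), by = "10 min")
map <- data.table(start = head(ts, -1L), end = tail(ts, -1L),
                  key = c("start", "end"))

any_hits    <- foverlaps(map, dums, nomatch = 0L)                   # default type = "any"
within_hits <- foverlaps(map, dums, type = "within", nomatch = 0L)  # block fully inside
c(any = nrow(any_hits), within = nrow(within_hits))
```

Here "any" matches the seven blocks starting 00:00 through 01:00, while "within" keeps only the five starting 00:10 through 00:50.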
# corrected data: row 4 now starts on 2013-04-01
library(data.table)
dums <- fread(
" 2013-04-01 00:00:54 UTC 2013-04-01 01:00:10 UTC 0.05
2013-04-01 00:40:26 UTC 2013-04-01 01:00:00 UTC 0.1
2013-04-01 02:13:20 UTC 2013-04-01 04:53:42 UTC 0.15
2013-04-01 02:22:00 UTC 2013-04-01 04:33:12 UTC 0.2
2013-04-01 02:26:23 UTC 2013-04-01 04:05:12 UTC 0.25
2013-04-01 02:42:47 UTC 2013-04-01 04:34:33 UTC 0.3
2013-04-01 02:53:12 UTC 2013-04-03 05:27:05 UTC 0.35
2013-04-02 02:54:08 UTC 2013-04-02 05:31:15 UTC 0.4
2013-04-03 02:57:16 UTC 2013-04-03 05:29:32 UTC 0.45"
)
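# V3 and V6 hold the literal "UTC", which as.POSIXct() silently ignores,
# so the time zone is set explicitly for both dums and map
dums <- dums[, .(start = as.POSIXct(paste(V1, V2), tz = "UTC"),
end = as.POSIXct(paste(V4, V5), tz = "UTC"),
min10 = V7)]
setkey(dums, start, end)
ts <- seq(as.POSIXct("2013-04-01 00:00:00", tz = "UTC"),
as.POSIXct("2013-04-04 00:00:00", tz = "UTC"),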
by = "10 min")
map <- data.table(start = head(ts, -1L), end = tail(ts, -1L),
key = c("start", "end"))
Answer 1 (score: 0)
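Good catch on the POSIXct time being off in one row. I feel really silly for having missed an error like that in the input data.
The end goal is to have 3 column variables: YYYY-DD-MM; start time (POSIXct); end time (POSIXct). The start and end times form 10-minute windows. There are 365 days, so effectively I am looking at 365 * 144 (10-minute slices per day) = 52,560 rows. The problem is that I have 450,000 rows of "dums" data, and min10 is not evenly spaced discrete intervals; it is continuous data. If I have to aggregate (sum, mean, sd, etc.), is there a way to combine dcast + aggregation + foverlaps with grouping? I could use a for loop and just spread the min10 values from start to end, but that looks very time-consuming and inefficient.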
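A sketch of the full-year calendar described above, assuming calendar year 2013 and UTC throughout (both are assumptions; a local time zone with DST would give some days 138 or 150 slices). It builds the same kind of map as in the answer, one row per 10-minute slice, and adds a date column.

```r
library(data.table)

# 10-minute breakpoints covering all of 2013, in UTC (assumed year and zone)
ts <- seq(as.POSIXct("2013-01-01 00:00:00", tz = "UTC"),
          as.POSIXct("2014-01-01 00:00:00", tz = "UTC"),
          by = "10 min")

# Consecutive breakpoints form the [start, end] slices
map <- data.table(start = head(ts, -1L), end = tail(ts, -1L),
                  key = c("start", "end"))

# Calendar date of each slice, taken in UTC, placed first
map[, date := as.Date(start, tz = "UTC")]
setcolorder(map, c("date", "start", "end"))
nrow(map)
```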
with output such as:
5: 2013-04-01 00:40:00 2013-04-01 00:50:00 0.15
---
311: 2013-04-03 04:40:00 2013-04-03 04:50:00 0.80
map <- data.table(start = head(ts, -1L), end = tail(ts, -1L),
key = c("start", "end"))
# plus something along the lines of
dums[, .(count=.N, sum=sum(min10)), by = ID1]
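One possible approach, sketched on the corrected data from Answer 0: group the long foverlaps() result by block instead of reshaping it, so neither dcast nor a for loop is needed. The utc() helper and the choice of type = "any" (matching Answer 0) are assumptions; the per-block sums reproduce the 0.15 and 0.80 in the output above. One caveat: the joined table has one row per (block, interval) pair, so long intervals in a 450,000-row dums can make it large.

```r
library(data.table)

# Hypothetical helper: parse timestamps explicitly as UTC
utc <- function(x) as.POSIXct(x, tz = "UTC")

# Corrected dums (row 4 starts on 2013-04-01)
dums <- data.table(
  start = utc(c("2013-04-01 00:00:54", "2013-04-01 00:40:26", "2013-04-01 02:13:20",
                "2013-04-01 02:22:00", "2013-04-01 02:26:23", "2013-04-01 02:42:47",
                "2013-04-01 02:53:12", "2013-04-02 02:54:08", "2013-04-03 02:57:16")),
  end   = utc(c("2013-04-01 01:00:10", "2013-04-01 01:00:00", "2013-04-01 04:53:42",
                "2013-04-01 04:33:12", "2013-04-01 04:05:12", "2013-04-01 04:34:33",
                "2013-04-03 05:27:05", "2013-04-02 05:31:15", "2013-04-03 05:29:32")),
  min10 = c(0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45),
  key = c("start", "end")
)

# 10-minute blocks from 2013-04-01 to 2013-04-04
ts  <- seq(utc("2013-04-01 00:00:00"), utc("2013-04-04 00:00:00"), by = "10 min")
map <- data.table(start = head(ts, -1L), end = tail(ts, -1L),
                  key = c("start", "end"))

# Long result (one row per block/interval pair), then aggregate per block
agg <- foverlaps(map, dums, nomatch = 0L)[
  , .(count = .N, sum = sum(min10), mean = mean(min10), sd = sd(min10)),
  by = .(i.start, i.end)]
agg
```

Blocks with a single overlapping interval get sd = NA, since sd() needs at least two values.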