I have generated a sequence of hourly timestamps:
intervals <- seq(as.POSIXct("2018-01-20 00:00:00", tz = 'America/Los_Angeles'), as.POSIXct("2018-01-20 03:00:00", tz = 'America/Los_Angeles'), by="hour")
> intervals
[1] "2018-01-20 00:00:00 PST" "2018-01-20 01:00:00 PST" "2018-01-20 02:00:00 PST"
[4] "2018-01-20 03:00:00 PST"
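Given a dataset with messy, unevenly spaced timestamps, how can I match the time values in that dataset to the nearest hourly timestamp per id, and drop the other timestamps that fall in between? For example: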
> test
time id amount
312 2018-01-20 00:02:14 PST 1 54.9508346
8652 2018-01-20 00:54:41 PST 2 30.5557992
13809 2018-01-20 01:19:27 PST 3 90.5459248
586 2018-01-20 00:03:35 PST 1 79.7635973
9077 2018-01-20 00:56:37 PST 2 75.5356406
21546 2018-01-20 02:25:05 PST 3 36.6017705
7275 2018-01-20 00:47:45 PST 1 12.7618139
12768 2018-01-20 01:15:30 PST 2 72.4465838
1172 2018-01-20 00:08:01 PST 3 81.0468155
24106 2018-01-20 03:04:10 PST 1 0.8615881
14464 2018-01-20 01:25:04 PST 2 49.8718743
15344 2018-01-20 01:29:30 PST 3 85.0054113
14255 2018-01-20 01:23:22 PST 1 34.5093891
21565 2018-01-20 02:25:40 PST 2 69.0175725
15602 2018-01-20 01:31:32 PST 3 61.8602426
would produce:
> output
interval id amount
1 2018-01-20 01:00:00 1 12.7618139
2 2018-01-20 1 54.9508346
3 2018-01-20 03:00:00 1 0.8615881
4 2018-01-20 01:00:00 2 75.5356400
5 2018-01-20 02:00:00 2 69.0175700
6 2018-01-20 3 81.0468200
7 2018-01-20 01:00:00 3 90.5459200
8 2018-01-20 02:00:00 3 36.6017700
I am aware of data.table's roll = "nearest", as in
setDT(reference)[data, refvalue, roll = "nearest", on = "datetime"]
but how do I find the nearest match in intervals for each id in test while retaining the amount column?
Any advice would be greatly appreciated! Here is the sample data:
dput(test)
structure(list(time = c("2018-01-20 00:02:14 PST", "2018-01-20 00:54:41 PST",
"2018-01-20 01:19:27 PST", "2018-01-20 00:03:35 PST", "2018-01-20 00:56:37 PST",
"2018-01-20 02:25:05 PST", "2018-01-20 00:47:45 PST", "2018-01-20 01:15:30 PST",
"2018-01-20 00:08:01 PST", "2018-01-20 03:04:10 PST", "2018-01-20 01:25:04 PST",
"2018-01-20 01:29:30 PST", "2018-01-20 01:23:22 PST", "2018-01-20 02:25:40 PST",
"2018-01-20 01:31:32 PST"), id = c(1, 2, 3, 1, 2, 3, 1, 2, 3,
1, 2, 3, 1, 2, 3), amount = c(54.9508346011862, 30.5557992309332,
90.5459248460829, 79.763597343117, 75.5356406327337, 36.6017704829574,
12.7618139144033, 72.4465838400647, 81.0468154959381, 0.861588073894382,
49.8718742514029, 85.0054113194346, 34.5093891490251, 69.0175724914297,
61.8602426256984)), .Names = c("time", "id", "amount"), row.names = c(312L,
8652L, 13809L, 586L, 9077L, 21546L, 7275L, 12768L, 1172L, 24106L,
14464L, 15344L, 14255L, 21565L, 15602L), class = "data.frame")
Answer 0 (score: 5)
Another option is a join inside j with data.table:
# convert 'test' to a 'data.table' first with 'setDT'
# and convert the 'time' column to a datetime format
setDT(test)[, time := as.POSIXct(time)][]
# perform the join
test[, .SD[.(time = intervals), on = .(time), roll = 'nearest'], by = id]
which gives:
    id                time     amount
 1:  1 2018-01-20 00:00:00 54.9508346
 2:  1 2018-01-20 01:00:00 12.7618139
 3:  1 2018-01-20 02:00:00 34.5093891
 4:  1 2018-01-20 03:00:00  0.8615881
 5:  2 2018-01-20 00:00:00 30.5557992
 6:  2 2018-01-20 01:00:00 75.5356406
 7:  2 2018-01-20 02:00:00 69.0175725
 8:  2 2018-01-20 03:00:00 69.0175725
 9:  3 2018-01-20 00:00:00 81.0468155
10:  3 2018-01-20 01:00:00 90.5459248
11:  3 2018-01-20 02:00:00 36.6017705
12:  3 2018-01-20 03:00:00 36.6017705
In the above approach, some amount values are assigned to several time values per id. If you don't want that and only want to keep the ones closest to the time, you can refine the approach as follows:
test[, r := rowid(id)
     ][, .SD[.(time = intervals)
             , on = .(time)
             , roll = 'nearest'
             , .(time, amount, r, time_diff = abs(x.time - i.time))
             ][, .SD[which.min(time_diff)], by = r]
       , by = id
       ][, c('r','time_diff') := NULL][]
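To see why the plain rolling join repeats amount values, here is a minimal sketch on toy numbers. The tables obs and grid and their values are made up for illustration and are not part of the question's data. Every row of grid gets a match, however far away the nearest observation is:

```r
library(data.table)

# Toy data (hypothetical): two observations, three grid points
obs  <- data.table(time = c(2, 9), amount = c(10, 20))
grid <- data.table(time = c(0, 10, 20))

# For every row of 'grid', take the row of 'obs' whose 'time' is nearest.
# The 'time' column of the result holds the grid values.
res <- obs[grid, on = .(time), roll = "nearest"]
res
# grid point 20 has no close observation, so it reuses the one at time 9
```

This is the same effect as rows 7 and 8 for id 2 in the output above, where both 02:00 and 03:00 receive the amount observed at 02:25:40.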
Answer 1 (score: 1)
Inspired by @DavidAurenburg's solution, a condensed version:
test[,
.(amount=amount[which.min(abs(time - round(time, "hour")))]),
keyby=.(id, as.character(round(time, "hour")))]
Perhaps you want to include id in your join. Note that with roll = "nearest" you may get matches from observations that are hours away:
output <- test[intervals, on=c("id","time"), roll="nearest"]
setorder(output, id, time)
output
# time id amount
# 1: 2018-01-20 00:00:00 1 54.9508346
# 2: 2018-01-20 01:00:00 1 12.7618139
# 3: 2018-01-20 02:00:00 1 34.5093891
# 4: 2018-01-20 03:00:00 1 0.8615881
# 5: 2018-01-20 00:00:00 2 30.5557992
# 6: 2018-01-20 01:00:00 2 75.5356406
# 7: 2018-01-20 02:00:00 2 69.0175725
# 8: 2018-01-20 03:00:00 2 69.0175725
# 9: 2018-01-20 00:00:00 3 81.0468155
# 10: 2018-01-20 01:00:00 3 90.5459248
# 11: 2018-01-20 02:00:00 3 36.6017705
# 12: 2018-01-20 03:00:00 3 36.6017705
I would like to see a more elegant use of data.table for this problem.
Data:
intervals <- CJ(time=seq(as.POSIXct("2018-01-20 00:00:00"),
as.POSIXct("2018-01-20 03:00:00"),
by="hour"), id=1:3)
test <- fread("time,id,amount
2018-01-20 00:02:14 PST,1,54.9508346
2018-01-20 00:54:41 PST,2,30.5557992
2018-01-20 01:19:27 PST,3,90.5459248
2018-01-20 00:03:35 PST,1,79.7635973
2018-01-20 00:56:37 PST,2,75.5356406
2018-01-20 02:25:05 PST,3,36.6017705
2018-01-20 00:47:45 PST,1,12.7618139
2018-01-20 01:15:30 PST,2,72.4465838
2018-01-20 00:08:01 PST,3,81.0468155
2018-01-20 03:04:10 PST,1,0.8615881
2018-01-20 01:25:04 PST,2,49.8718743
2018-01-20 01:29:30 PST,3,85.0054113
2018-01-20 01:23:22 PST,1,34.5093891
2018-01-20 02:25:40 PST,2,69.0175725
2018-01-20 01:31:32 PST,3,61.8602426")[,
time:=as.POSIXct(time)]
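One way to deal with the caveat about matches from hours away is to keep the original observation time through the join and discard matches beyond a cutoff. The following is only a sketch under assumptions: the small obs and grid tables, the helper columns obs_time and gap_mins, the 30-minute cutoff, and the use of tz = "UTC" are all chosen here for illustration.

```r
library(data.table)

# Hypothetical subset of the observations, in UTC for reproducibility
obs <- data.table(
  id     = c(1L, 1L, 2L),
  time   = as.POSIXct(c("2018-01-20 00:02:14",
                        "2018-01-20 00:47:45",
                        "2018-01-20 02:25:40"), tz = "UTC"),
  amount = c(54.95, 12.76, 69.02)
)

# Every id crossed with every hourly timestamp
grid <- CJ(id   = 1:2,
           time = seq(as.POSIXct("2018-01-20 00:00:00", tz = "UTC"),
                      as.POSIXct("2018-01-20 03:00:00", tz = "UTC"),
                      by = "hour"))

# The join overwrites 'time' with the grid value, so keep a copy first
obs[, obs_time := time]

res <- obs[grid, on = .(id, time), roll = "nearest"]

# Distance between the hourly timestamp and the matched observation
res[, gap_mins := abs(as.numeric(difftime(obs_time, time, units = "mins")))]

# Assumed cutoff: drop matches more than 30 minutes away
res <- res[gap_mins <= 30]
res
```

With these toy rows, only the hourly timestamps that have an observation within half an hour survive, so no amount is carried over to a distant hour.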
Answer 2 (score: 1)
Something like this, using lubridate?
library(lubridate);library(dplyr)
test$time<-ymd_hms(test$time)
test$HTime=round_date(test$time,unit="hour")
test$DiffTime=abs(test$time-test$HTime)
result=test%>%group_by(id,HTime)%>%summarize(amount=amount[DiffTime==min(DiffTime)])
result
# A tibble: 8 x 3
# Groups: id [?]
id HTime amount
<dbl> <dttm> <dbl>
1 1.00 2018-01-20 00:00:00 55.0
2 1.00 2018-01-20 01:00:00 12.8
3 1.00 2018-01-20 03:00:00 0.862
4 2.00 2018-01-20 01:00:00 75.5
5 2.00 2018-01-20 02:00:00 69.0
6 3.00 2018-01-20 00:00:00 81.0
7 3.00 2018-01-20 01:00:00 90.5
8 3.00 2018-01-20 02:00:00 36.6
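A small caveat on the selection step, shown with made-up vectors d and amount: subsetting with == min(...) returns every tied element, whereas which.min() returns only the first. If two observations for the same id were equally close to an hour, the first form would yield more than one value per group.

```r
# Hypothetical distances and amounts with a tie for the minimum
d      <- c(5, 2, 2)
amount <- c(1, 2, 3)

tied  <- amount[d == min(d)]    # all elements at the minimum distance
first <- amount[which.min(d)]   # only the first element at the minimum

tied
first
```

Using which.min(), as in the data.table answers, guarantees a single value per group.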