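Background: For my thesis I have a few hundred large CSV files. They contain time series of a weather parameter covering 23,800 half-hourly periods between 1 November 2016 and 1 March 2012. In addition, each period has 20 or 21 runtimes (the number is not fixed, which is essentially my problem). The runtime marks the time at which the forecast for a given forecast period was calculated. Forecasts are therefore normally calculated before the forecast period, which naturally makes sense. However, for whatever reason, this is not always the case: for some periods (mostly, but not always, between 09:00 and 12:00) there is one runtime per period that lies in the future relative to the period itself. I do not want this "future" runtime (and I cannot see why it is included).
Example extract of the data: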
+-----------------------+-----------------------+-------------+
| ForecastPeriod | Runtime | Value |
+-----------------------+-----------------------+-------------+
| … | … | … |
| 02.11.2016 11:30+0000 | 31.10.2016 00:00+0000 | 5.544368776 |
| 02.11.2016 11:30+0000 | 31.10.2016 12:00+0000 | 4.71684533 |
| 02.11.2016 11:30+0000 | 01.11.2016 00:00+0000 | 5.374274986 |
| 02.11.2016 11:30+0000 | 01.11.2016 12:00+0000 | 5.892114875 |
| 02.11.2016 11:30+0000 | 02.11.2016 00:00+0000 | 6.18387462 | <-i want this row
| 02.11.2016 11:30+0000 | 02.11.2016 12:00+0000 | 5.852306909 | <- don't make sense
| 02.11.2016 12:00+0000 | 23.10.2016 12:00+0000 | 14.81608444 |
| 02.11.2016 12:00+0000 | 24.10.2016 00:00+0000 | 3.637574565 |
| … | … | ... |
| 02.11.2016 12:00+0000 | 01.11.2016 12:00+0000 | 5.541325144 |
| 02.11.2016 12:00+0000 | 02.11.2016 00:00+0000 | 5.745831136 | <- i want this row
| 02.11.2016 12:00+0000 | 02.11.2016 12:00+0000 | 5.347949883 | <- don't make sense
| 02.11.2016 12:30+0000 | 24.10.2016 00:00+0000 | 3.80366064 |
| 02.11.2016 12:30+0000 | 24.10.2016 12:00+0000 | 5.533042696 |
| … | … | … |
| 02.11.2016 12:30+0000 | 01.11.2016 12:00+0000 | 5.429153394 |
| 02.11.2016 12:30+0000 | 02.11.2016 00:00+0000 | 5.580232543 |
| 02.11.2016 12:30+0000 | 02.11.2016 12:00+0000 | 5.266140403 | <- i want this row
| 02.11.2016 13:00+0000 | 24.10.2016 00:00+0000 | 3.969746715 | <- here is no "future" runtime
| 02.11.2016 13:00+0000 | 24.10.2016 12:00+0000 | 5.704328337 |
| … | … | … |
+-----------------------+-----------------------+-------------+
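My current working solution: I loop over the large data frame and keep the rows that match my expectation. It works, but it is very slow on my laptop (it took almost an hour to get through 500.000 rows), and I have a large number of CSV files to go through. Is there a faster way to do this? I am happy to use additional R packages if they are faster. I am also considering uploading the data to a faster SQL server; would a date-comparison task like this run faster in SQL?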
# Some preliminary transformations to get comparable date-time objects:
LA_Date_EC$Forecast.Time <- as.POSIXlt(LA_Date_EC$Forecast.Time, format = "%d.%m.%Y %H:%M+%S", tz = "UTC")
LA_Date_EC$Runtime.Forecast <- as.POSIXlt(LA_Date_EC$Runtime.Forecast, format = "%d.%m.%Y %H:%M+%S", tz = "UTC")
# Result container; rbind() takes the column names from the first row bound
# (assigning names() to an empty data frame would raise an error)
test.df2 <- data.frame()
for (l in 2:469501) {
  # Last row of a forecast period: the next row belongs to a different period
  if (LA_Date_EC[l, 1] != LA_Date_EC[l + 1, 1]) {
    # print(l)
    if (LA_Date_EC[l, 2] >= LA_Date_EC[l, 1]) {
      # "Future" runtime: take the previous row instead
      test.df2 <- rbind.data.frame(test.df2, LA_Date_EC[l - 1, ])
    } else {
      test.df2 <- rbind.data.frame(test.df2, LA_Date_EC[l, ])
    }
  }
}
Edit: example extract as dput output in R:
structure(list(Forecast.Time = structure(list(sec = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), min = c(30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 30L, 30L, 30L, 30L), hour = c(10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L), mday = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), mon = c(10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L
), year = c(116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L), wday = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
yday = c(306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L, 306L,
306L, 306L, 306L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("sec", "min",
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt",
"POSIXt"), tzone = "UTC"), Runtime.Forecast = structure(list(
sec = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 12L,
0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 12L,
0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L,
12L, 0L, 12L, 0L, 12L, 0L, 12L, 12L, 0L, 12L, 0L, 12L, 0L,
12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L,
0L, 12L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L,
0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L,
12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L,
0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L,
12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L, 0L, 12L,
0L, 12L), mday = c(27L, 27L, 28L, 28L, 29L, 29L, 30L, 30L,
31L, 31L, 1L, 1L, 2L, 2L, 23L, 24L, 24L, 25L, 25L, 26L, 26L,
27L, 27L, 28L, 28L, 29L, 29L, 30L, 30L, 31L, 31L, 1L, 1L,
2L, 2L, 23L, 24L, 24L, 25L, 25L, 26L, 26L, 27L, 27L, 28L,
28L, 29L, 29L, 30L, 30L, 31L, 31L, 1L, 1L, 2L, 2L, 23L, 24L,
24L, 25L, 25L, 26L, 26L, 27L, 27L, 28L, 28L, 29L, 29L, 30L,
30L, 31L, 31L, 1L, 1L, 2L, 2L, 24L, 24L, 25L, 25L, 26L, 26L,
27L, 27L, 28L, 28L, 29L, 29L, 30L, 30L, 31L, 31L, 1L, 1L,
2L, 2L, 24L, 24L, 25L, 25L, 26L, 26L, 27L, 27L, 28L, 28L,
29L, 29L, 30L, 30L, 31L, 31L, 1L, 1L, 2L, 2L, 24L, 24L, 25L,
25L), mon = c(9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L,
10L, 10L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 9L, 9L, 9L, 9L), year = c(116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L,
116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L, 116L
), wday = c(4L, 4L, 5L, 5L, 6L, 6L, 0L, 0L, 1L, 1L, 2L, 2L,
3L, 3L, 0L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L,
0L, 0L, 1L, 1L, 2L, 2L, 3L, 3L, 0L, 1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 0L, 0L, 1L, 1L, 2L, 2L, 3L, 3L, 0L,
1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 0L, 0L, 1L,
1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L,
6L, 6L, 0L, 0L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L,
3L, 4L, 4L, 5L, 5L, 6L, 6L, 0L, 0L, 1L, 1L, 2L, 2L, 3L, 3L,
1L, 1L, 2L, 2L), yday = c(300L, 300L, 301L, 301L, 302L, 302L,
303L, 303L, 304L, 304L, 305L, 305L, 306L, 306L, 296L, 297L,
297L, 298L, 298L, 299L, 299L, 300L, 300L, 301L, 301L, 302L,
302L, 303L, 303L, 304L, 304L, 305L, 305L, 306L, 306L, 296L,
297L, 297L, 298L, 298L, 299L, 299L, 300L, 300L, 301L, 301L,
302L, 302L, 303L, 303L, 304L, 304L, 305L, 305L, 306L, 306L,
296L, 297L, 297L, 298L, 298L, 299L, 299L, 300L, 300L, 301L,
301L, 302L, 302L, 303L, 303L, 304L, 304L, 305L, 305L, 306L,
306L, 297L, 297L, 298L, 298L, 299L, 299L, 300L, 300L, 301L,
301L, 302L, 302L, 303L, 303L, 304L, 304L, 305L, 305L, 306L,
306L, 297L, 297L, 298L, 298L, 299L, 299L, 300L, 300L, 301L,
301L, 302L, 302L, 303L, 303L, 304L, 304L, 305L, 305L, 306L,
306L, 297L, 297L, 298L, 298L), isdst = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("sec",
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt"), tzone = "UTC"), Wind.Speed = c(12.0889469204481,
8.1534483762018, 11.229031832199, 9.51623004872928, 7.99700924410322,
8.06420698869646, 7.46190421726437, 7.95691440205356, 8.19089703425263,
7.50023772800533, 7.46471349405832, 7.87503218264228, 8.10704533381368,
8.25997087655172, 12.9641999142878, 5.95739166070848, 8.48709144265445,
12.3686489749888, 3.27377438788927, 3.8132283355639, 5.40513611081943,
12.3699466614361, 7.91484229489558, 11.0188269744693, 9.56301437212706,
7.91921747636113, 7.86903553214633, 7.4208161449472, 7.7049451673898,
8.02618971449148, 7.32764074016071, 7.26610021373866, 7.70408467708526,
7.90262085370489, 8.1065215773556, 13.5223226992998, 6.23422083753905,
8.45254447734072, 12.3148462884236, 3.00095427388565, 4.11192170847009,
5.35120820775642, 12.6509464024241, 7.67623621358937, 10.8086221167397,
9.60979869552483, 7.84142570861904, 7.67386407559621, 7.37972807263003,
7.45297593272604, 7.86148239473033, 7.15504375231609, 7.06748693341901,
7.53313717152826, 7.69819637359609, 7.95307227815948, 14.0804454843119,
6.51105001436962, 8.41799751202699, 12.2610436018583, 2.72813415988203,
4.41061508137628, 5.2972803046934, 12.9319461434121, 7.43763013228316,
10.59841725901, 9.65658301892261, 7.76363394087695, 7.47869261904608,
7.33864000031286, 7.20100669806228, 7.69677507496918, 6.98244676447147,
6.86887365309936, 7.36218966597125, 7.4937718934873, 7.79962297896336,
6.47610742140243, 8.44809907428991, 12.1754295175459, 2.80289574128868,
4.43071887689015, 5.25901387681356, 12.8048270636345, 7.77257529660677,
10.6689406707837, 9.67371178278272, 7.71232463800448, 7.639287068313,
7.37678432847625, 7.39386920787284, 7.65056355621861, 7.0961073828294,
6.94340806177623, 7.41655132109855, 7.53010844435008, 7.89628470472931,
6.44116482843524, 8.47820063655284, 12.0898154332335, 2.87765732269532,
4.450822672404, 5.22074744893371, 12.6777079838569, 8.10752046093037,
10.7394640825575, 9.69084054664283, 7.66101533513201, 7.79988151757991,
7.41492865663964, 7.58673171768341, 7.60435203746804, 7.20976800118733,
7.01794247045311, 7.47091297622585, 7.56644499521287, 7.99294643049527,
6.40622223546805, 8.50830219881576, 12.0042013489211, 2.95241890410197
)), .Names = c("Forecast.Time", "Runtime.Forecast", "Wind.Speed"
), row.names = 1400:1520, class = "data.frame")
Answer 0 (score: 2)
You can do this with dplyr or data.table. data.table should be the fastest solution for you.
dplyr
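library(dplyr)
# dplyr does not work with POSIXlt columns, so convert to POSIXct first
df$Forecast.Time <- as.POSIXct(df$Forecast.Time)
df$Runtime.Forecast <- as.POSIXct(df$Runtime.Forecast)
# Keep runtimes strictly before the forecast time, then take the last row per
# forecast time (assumes rows are sorted by runtime within each forecast time)
filtered <- df %>%
  filter(Forecast.Time > Runtime.Forecast) %>%
  group_by(Forecast.Time) %>%
  summarise_all(last)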
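The pipeline above relies on the rows already being sorted by runtime within each forecast time. As a hedged variant, the sketch below picks the maximum runtime explicitly with slice_max() (available in dplyr 1.0.0 and later), so row order no longer matters. The toy data frame and the helper parse_ts() are made up for illustration; the values are copied from the example extract in the question and deliberately shuffled.

```r
library(dplyr)

# Hypothetical helper for this sketch: parse "dd.mm.yyyy HH:MM" as UTC POSIXct
parse_ts <- function(x) as.POSIXct(x, format = "%d.%m.%Y %H:%M", tz = "UTC")

# Toy rows taken from the example extract, in shuffled order
toy <- data.frame(
  Forecast.Time    = parse_ts(c("02.11.2016 12:00", "02.11.2016 12:00", "02.11.2016 12:00",
                                "02.11.2016 12:30", "02.11.2016 12:30")),
  Runtime.Forecast = parse_ts(c("02.11.2016 12:00", "01.11.2016 12:00", "02.11.2016 00:00",
                                "02.11.2016 12:00", "02.11.2016 00:00")),
  Wind.Speed       = c(5.347949883, 5.541325144, 5.745831136, 5.266140403, 5.580232543)
)

latest <- toy %>%
  filter(Runtime.Forecast < Forecast.Time) %>%               # drop "future" runtimes
  group_by(Forecast.Time) %>%
  slice_max(Runtime.Forecast, n = 1, with_ties = FALSE) %>%  # latest remaining runtime
  ungroup()
```

For the 12:00 forecast the 12:00 runtime is dropped and the 00:00 runtime of the same day is kept; for the 12:30 forecast the 12:00 runtime is kept.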
data.table
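library(data.table)
# Assumes the date-time columns of df are already POSIXct (see the dplyr code above)
df_dt <- as.data.table(df)
# data.table::last() returns the last element of each column within a group
filtered_dt <- df_dt[Forecast.Time > Runtime.Forecast, lapply(.SD, last), by = Forecast.Time]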
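Since the question mentions a few hundred CSV files, the data.table approach could be wrapped in a small function and applied per file. The following is only a sketch under assumptions: the helper name keep_latest_past_runtime() is hypothetical, the column names ForecastPeriod, Runtime and Value follow the example table in the question, and the files are assumed to be comma-separated with a header row. The demo writes a tiny temporary file so that the block runs on its own.

```r
library(data.table)

# Hypothetical helper: read one CSV and keep, per forecast period, the latest
# runtime that lies strictly before the forecast period.
keep_latest_past_runtime <- function(path, fmt = "%d.%m.%Y %H:%M%z") {
  dt <- fread(path)
  # "%z" parses the "+0000" UTC offset at the end of each timestamp
  dt[, ForecastPeriod := as.POSIXct(ForecastPeriod, format = fmt, tz = "UTC")]
  dt[, Runtime := as.POSIXct(Runtime, format = fmt, tz = "UTC")]
  setorder(dt, ForecastPeriod, Runtime)
  # Drop "future" runtimes, then take the last remaining row of each group
  dt[Runtime < ForecastPeriod, .SD[.N], by = ForecastPeriod]
}

# Demo file with rows copied from the example extract
demo_file <- tempfile(fileext = ".csv")
writeLines(c(
  "ForecastPeriod,Runtime,Value",
  "02.11.2016 11:30+0000,01.11.2016 12:00+0000,5.892114875",
  "02.11.2016 11:30+0000,02.11.2016 00:00+0000,6.18387462",
  "02.11.2016 11:30+0000,02.11.2016 12:00+0000,5.852306909",
  "02.11.2016 12:30+0000,02.11.2016 00:00+0000,5.580232543",
  "02.11.2016 12:30+0000,02.11.2016 12:00+0000,5.266140403"
), demo_file)

# In practice this vector would come from something like list.files()
files <- c(demo_file)
result <- rbindlist(lapply(files, keep_latest_past_runtime), idcol = "file_id")
```

Filtering each file separately keeps memory use bounded by the largest single file; rbindlist() then stacks the much smaller filtered results.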
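Answer 1 (score: 1)
Here is a possible solution using the dplyr package. Using the lead/lag functions and group_by eliminates the loop.
As mentioned in my comment above, I convert the dates/times to POSIXct objects.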
library(dplyr)
# df is a copy of the original data
df <- LA_Date_EC
# Find all future values and remove them from the data
future <- LA_Date_EC[, 2] >= lag(LA_Date_EC[, 1])
future[1] <- FALSE
df <- df[!future, ]
# Group by the forecast time and then take the last row of each group
answer <- df %>%
  group_by(Forecast.Time) %>%
  summarize(Runtime.Forecast = last(Runtime.Forecast), Wind.Speed = last(Wind.Speed))
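The same two steps (drop the future rows, then keep the last row per forecast time) can also be written as vectorised base R without any packages. This is a different technique from the dplyr code above, shown only as a sketch: it compares each runtime with the forecast time of its own row and uses duplicated(fromLast = TRUE) to find the last row of each group. The toy data and the helper parse_ts() are assumptions made for illustration, with values copied from the example extract in the question.

```r
# Hypothetical helper for this sketch: parse "dd.mm.yyyy HH:MM" as UTC POSIXct
parse_ts <- function(x) as.POSIXct(x, format = "%d.%m.%Y %H:%M", tz = "UTC")

# Toy rows: one period with a "future" runtime (12:00) and one without (13:00)
toy <- data.frame(
  Forecast.Time    = parse_ts(c("02.11.2016 12:00", "02.11.2016 12:00", "02.11.2016 12:00",
                                "02.11.2016 13:00", "02.11.2016 13:00")),
  Runtime.Forecast = parse_ts(c("01.11.2016 12:00", "02.11.2016 00:00", "02.11.2016 12:00",
                                "24.10.2016 00:00", "24.10.2016 12:00")),
  Wind.Speed       = c(5.541325144, 5.745831136, 5.347949883, 3.969746715, 5.704328337)
)

# 1. Sort so that the latest runtime is the last row of each forecast period
toy <- toy[order(toy$Forecast.Time, toy$Runtime.Forecast), ]
# 2. Drop runtimes that are not strictly before the forecast period
past <- toy[toy$Runtime.Forecast < toy$Forecast.Time, ]
# 3. Keep only the last remaining row of each forecast period
answer <- past[!duplicated(past$Forecast.Time, fromLast = TRUE), ]
```

Every step here is a single vectorised operation over whole columns, which avoids the row-by-row rbind() that makes the original loop slow.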