Identify records in one dataset that fall between times in another dataset

Time: 2016-06-17 19:09:24

Tags: r gpx

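I have GPX data in one data frame, and a second data frame, which I call info_access, holding other information that I want to "merge" with the GPX data. The two data frames have no variable in common. I want to use the TowStartDate, InclinometerStart (time), TowEndDate and InclinometerEnd (time) columns of the second data frame (info_access) to identify the rows of the GPX data frame whose date/time falls between TowStartDate & InclinometerStart and TowEndDate & InclinometerEnd. I then want to assign the Tow value from the second data frame to those GPX times. The GPX dataset is large, so the way I first tried to do this runs into problems.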

Example GPX data:

example_gpx<-data.frame(Long=c(-70.92108,-70.92108,-70.92108, -70.92108, -70.92108 ),
    Lat=c(41.64437,41.64437,41.64437 ,41.64437,41.64437),
    Date=c("2016-06-04","2016-06-04","2016-06-04","2016-06-04","2016-06-04"),
    Time=c("19:15:08","19:15:09","19:15:10","19:15:11","19:15:12"))

Example info_access:

example_access<-structure(list(Tow = 201604001:201604005, TowStartDate = structure(c(1465012800, 
1465012800, 1465012800, 1465012800, 1465012800), class = c("POSIXct", 
"POSIXt"), tzone = ""), TowEndDate = structure(c(1465012800, 
1465012800, 1465012800, 1465012800, 1465012800), class = c("POSIXct", 
"POSIXt"), tzone = ""), InclinometerStart = c("14:06:00", "15:05:00", 
"15:51:20", "16:52:10", "17:27:50"), InclinometerEnd = c("14:22:10", 
"15:20:20", "16:06:20", "17:07:00", "17:43:00"), date_time_start = c("2016-06-04 14:06:00", 
"2016-06-04 15:05:00", "2016-06-04 15:51:20", "2016-06-04 16:52:10", 
"2016-06-04 17:27:50"), date_time_end = c("2016-06-04 14:22:10", 
"2016-06-04 15:20:20", "2016-06-04 16:06:20", "2016-06-04 17:07:00", 
"2016-06-04 17:43:00")), .Names = c("Tow", "TowStartDate", "TowEndDate", 
"InclinometerStart", "InclinometerEnd", "date_time_start", "date_time_end"
), row.names = 181:185, class = "data.frame")
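
Matching on time is simpler once each date/time pair is a single POSIXct value. A minimal base-R sketch, using small made-up vectors shaped like the Date and Time columns above. The `tz = "UTC"` choice is an assumption, since the question does not say which time zone the GPX unit and the inclinometer log were recorded in:

```r
# Made-up values shaped like the Date and Time columns of example_gpx.
gpx_date <- c("2016-06-04", "2016-06-04")
gpx_time <- c("19:15:08", "19:15:09")

# Paste date and time together, then parse with an explicit time zone
# (UTC is an assumption; use whatever zone the devices recorded in).
gpx_date_time <- as.POSIXct(paste(gpx_date, gpx_time),
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# POSIXct values compare like numbers, which is what interval matching needs.
gpx_date_time[2] > gpx_date_time[1]
```

Parsing both data frames with the same explicit time zone keeps the comparisons consistent regardless of the machine's locale.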

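I have been trying to use expand.grid to create a dataset containing every combination of the Tow numbers from the info_access dataset and the times from the GPX dataset. I run into memory problems because my original datasets are large.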
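
A rough row count shows why the full grid gets expensive. The GPX size (88,287 observations) is the figure given later in the question. The number of tows (200) is purely an assumed value for illustration, since the real count is not stated:

```r
n_gpx  <- 88287   # GPX observations, as stated in the question
n_tows <- 200     # assumed for illustration; the real count is not given

# expand.grid() builds one row per (Tow, Time) pair.
n_pairs <- n_gpx * n_tows
n_pairs
```

Each later merge() copies that grid and adds more columns to it, so the memory needed grows well beyond the grid itself.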

Example code for identifying which date/times in example_gpx fall within example_access:

#use expand.grid function 
Tow<-unique( example_access$Tow)
Time<-example_gpx$Time

a<-expand.grid(Tow,Time)
names(a)<-c("Tow","Time")
head(a)

b<-merge(a,example_gpx,"Time")
head(b)
length(b[,1])

c<-merge(b,example_access,by="Tow")
head(c)
str(c)
length(c[,1])

head(c)

        Tow     Time      Long      Lat       Date TowStartDate TowEndDate
1 201604001 19:15:09 -70.92108 41.64437 2016-06-04   2016-06-04 2016-06-04
2 201604001 19:15:08 -70.92108 41.64437 2016-06-04   2016-06-04 2016-06-04
3 201604001 19:15:10 -70.92108 41.64437 2016-06-04   2016-06-04 2016-06-04
4 201604001 19:15:12 -70.92108 41.64437 2016-06-04   2016-06-04 2016-06-04
5 201604001 19:15:11 -70.92108 41.64437 2016-06-04   2016-06-04 2016-06-04
6 201604002 19:15:12 -70.92108 41.64437 2016-06-04   2016-06-04 2016-06-04
  InclinometerStart InclinometerEnd     date_time_start       date_time_end
1          14:06:00        14:22:10 2016-06-04 14:06:00 2016-06-04 14:22:10
2          14:06:00        14:22:10 2016-06-04 14:06:00 2016-06-04 14:22:10
3          14:06:00        14:22:10 2016-06-04 14:06:00 2016-06-04 14:22:10
4          14:06:00        14:22:10 2016-06-04 14:06:00 2016-06-04 14:22:10
5          14:06:00        14:22:10 2016-06-04 14:06:00 2016-06-04 14:22:10
6          15:05:00        15:20:20 2016-06-04 15:05:00 2016-06-04 15:20:20

This works for my small example, but any ideas on how to do this with a GPX file of 88,287 observations, or a more elegant solution, would be helpful.

 sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.3     gmt_1.2-0       RODBC_1.3-12    lubridate_1.5.0
[5] rgdal_1.0-6     maptools_0.8-36 sp_1.1-1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6     lattice_0.20-31 assertthat_0.1  grid_3.2.1     
 [5] R6_2.1.0        DBI_0.3.1       magrittr_1.5    stringi_0.5-5  
 [9] tools_3.2.1     stringr_1.0.0   foreign_0.8-63  parallel_3.2.1 
> 

1 answer:

Answer 0 (score: 0)

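Here is a solution I came up with using the lubridate package. Note that I changed your example data so that the intervals line up and Tow values can actually be assigned to the GPX data. example_access is unchanged.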

library(lubridate)
library(data.table)  # provides %between%, which lubridate does not export

example_gpx<-data.frame(Long=c(-70.92108,-70.92108,-70.92108, -70.92108, -70.92108 ),
                        Lat=c(41.64437,41.64437,41.64437 ,41.64437,41.64437),
                        Date=c("2016-06-04","2016-06-04","2016-06-04","2016-06-04","2016-06-04"),
                        Time=c("14:15:08","15:15:09","15:59:10","17:06:11","17:30:12"))

example_access$date_time_start<-ymd_hms(example_access$date_time_start)
example_access$date_time_end<-ymd_hms(example_access$date_time_end)

example_gpx$date_time<-paste(example_gpx$Date, example_gpx$Time)
example_gpx$date_time<-ymd_hms(example_gpx$date_time)

example_gpx$Tow<-sapply(1:nrow(example_gpx), function(x)
  if(example_gpx[x,]$date_time %between%  
     c(example_access[x,]$date_time_start,example_access[x,]$date_time_end)) example_access[x,]$Tow
      else NA)

example_gpx

Long   Lat       Date     Time           date_time       Tow
-70.92 41.64 2016-06-04 14:15:08 2016-06-04 14:15:08 201604001
-70.92 41.64 2016-06-04 15:15:09 2016-06-04 15:15:09 201604002
-70.92 41.64 2016-06-04 15:59:10 2016-06-04 15:59:10 201604003
-70.92 41.64 2016-06-04 17:06:11 2016-06-04 17:06:11 201604004
-70.92 41.64 2016-06-04 17:30:12 2016-06-04 17:30:12 201604005

This approach may or may not work if data are missing or messy.

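Edit: the approach above only works if the rows of your two datasets line up with each other, which is almost certainly not the case. The code below works whether or not the two sets line up. It may be a bit slower, but I think the robustness is worth it: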

library(lubridate)
library(data.table)

example_gpx<-data.frame(Long=c(-70.92108,-70.92108,-70.92108, -70.92108, -70.92108 ),
                        Lat=c(41.64437,41.64437,41.64437 ,41.64437,41.64437),
                        Date=c("2016-06-04","2016-06-04","2016-06-04","2016-06-04","2016-06-04"),
                        Time=c("14:15:08","15:15:09","15:59:10","17:06:11","17:30:12"))

example_access$date_time_start<-ymd_hms(example_access$date_time_start)
example_access$date_time_end<-ymd_hms(example_access$date_time_end)

example_gpx$date_time<-paste(example_gpx$Date, example_gpx$Time)
example_gpx$date_time<-ymd_hms(example_gpx$date_time)

example_gpx$Tow<-NA

for(x in 1:nrow(example_access)){
  example_gpx[which(example_gpx$date_time %between% 
                      c(example_access[x,]$date_time_start,
                        example_access[x,]$date_time_end)),]$Tow<-example_access[x,]$Tow
}
example_gpx

Long   Lat       Date     Time           date_time       Tow
-70.92 41.64 2016-06-04 14:15:08 2016-06-04 14:15:08 201604001
-70.92 41.64 2016-06-04 15:15:09 2016-06-04 15:15:09 201604002
-70.92 41.64 2016-06-04 15:59:10 2016-06-04 15:59:10 201604003
-70.92 41.64 2016-06-04 17:06:11 2016-06-04 17:06:11 201604004
-70.92 41.64 2016-06-04 17:30:12 2016-06-04 17:30:12 201604005
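
For a large GPX file, a fully vectorised lookup avoids the loop altogether. The following is a sketch, not part of the answer above, built on base R's `findInterval()`. It assumes the tow intervals do not overlap and that the times contain no missing values. The helper name `assign_tow` and the small test vectors are made up for illustration:

```r
# Hypothetical helper: for each time, return the Tow whose [start, end]
# interval contains it, or NA when the time falls in no interval.
# Assumes the intervals do not overlap.
assign_tow <- function(times, starts, ends, tows) {
  ord <- order(starts)                 # findInterval() needs sorted breaks
  s   <- as.numeric(starts)[ord]
  e   <- as.numeric(ends)[ord]
  tw  <- tows[ord]
  x   <- as.numeric(times)

  idx <- findInterval(x, s)            # index of the last start <= time, 0 if none
  idx[which(idx == 0L)] <- NA          # times before the first start match nothing
  inside <- !is.na(idx) & x <= e[idx]  # also require time <= that interval's end
  ifelse(inside, tw[idx], NA)
}

# Made-up data mirroring the first two tows of example_access.
starts <- as.POSIXct(c("2016-06-04 14:06:00", "2016-06-04 15:05:00"), tz = "UTC")
ends   <- as.POSIXct(c("2016-06-04 14:22:10", "2016-06-04 15:20:20"), tz = "UTC")
tows   <- c(201604001L, 201604002L)
times  <- as.POSIXct(c("2016-06-04 14:15:08", "2016-06-04 15:15:09",
                       "2016-06-04 19:15:08"), tz = "UTC")

res <- assign_tow(times, starts, ends, tows)
res   # first two times get a Tow; the 19:15:08 fix falls in no interval
```

With real data this would be called as something like `example_gpx$Tow <- assign_tow(example_gpx$date_time, example_access$date_time_start, example_access$date_time_end, example_access$Tow)`. Because `findInterval()` uses a binary search, the cost grows with the number of GPX rows rather than with rows times tows.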