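I have GPX data in one data frame and, in another data frame that I am calling info_access, other information that I would like to "merge" with the GPX data. The two data frames have no variables in common. I would like to use the TowStartDate, InclinometerStart (time), TowEndDate, and InclinometerEnd (time) columns in the second data frame (info_access) to identify the rows in the GPX data frame whose date/time falls between TowStartDate & InclinometerStart and TowEndDate & InclinometerEnd. I would then like to assign the Tow value from the second data frame to those GPX times. The GPX data set is large, so the way I first tried to do this runs into problems.
Example GPX data: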
example_gpx<-data.frame(Long=c(-70.92108,-70.92108,-70.92108, -70.92108, -70.92108 ),
Lat=c(41.64437,41.64437,41.64437 ,41.64437,41.64437),
Date=c("2016-06-04","2016-06-04","2016-06-04","2016-06-04","2016-06-04"),
Time=c("19:15:08","19:15:09","19:15:10","19:15:11","19:15:12"))
Example info_access:
example_access<-structure(list(Tow = 201604001:201604005, TowStartDate = structure(c(1465012800,
1465012800, 1465012800, 1465012800, 1465012800), class = c("POSIXct",
"POSIXt"), tzone = ""), TowEndDate = structure(c(1465012800,
1465012800, 1465012800, 1465012800, 1465012800), class = c("POSIXct",
"POSIXt"), tzone = ""), InclinometerStart = c("14:06:00", "15:05:00",
"15:51:20", "16:52:10", "17:27:50"), InclinometerEnd = c("14:22:10",
"15:20:20", "16:06:20", "17:07:00", "17:43:00"), date_time_start = c("2016-06-04 14:06:00",
"2016-06-04 15:05:00", "2016-06-04 15:51:20", "2016-06-04 16:52:10",
"2016-06-04 17:27:50"), date_time_end = c("2016-06-04 14:22:10",
"2016-06-04 15:20:20", "2016-06-04 16:06:20", "2016-06-04 17:07:00",
"2016-06-04 17:43:00")), .Names = c("Tow", "TowStartDate", "TowEndDate",
"InclinometerStart", "InclinometerEnd", "date_time_start", "date_time_end"
), row.names = 181:185, class = "data.frame")
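I have been trying to use expand.grid to create a data set containing every combination of the Tow numbers from the info_access data set and the times from the GPX data set. I run into memory-size problems because my original data sets are large.
Example code for identifying the date/times in example_gpx that fall within the intervals in example_access: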
#use expand.grid function
Tow<-unique( example_access$Tow)
Time<-example_gpx$Time
a<-expand.grid(Tow,Time)
names(a)<-c("Tow","Time")
head(a)
b<-merge(a,example_gpx,"Time")
head(b)
length(b[,1])
c<-merge(b,example_access,by="Tow")
head(c)
str(c)
length(b[,1])
head(c)
Tow Time Long Lat Date TowStartDate TowEndDate
1 201604001 19:15:09 -70.92108 41.64437 2016-06-04 2016-06-04 2016-06-04
2 201604001 19:15:08 -70.92108 41.64437 2016-06-04 2016-06-04 2016-06-04
3 201604001 19:15:10 -70.92108 41.64437 2016-06-04 2016-06-04 2016-06-04
4 201604001 19:15:12 -70.92108 41.64437 2016-06-04 2016-06-04 2016-06-04
5 201604001 19:15:11 -70.92108 41.64437 2016-06-04 2016-06-04 2016-06-04
6 201604002 19:15:12 -70.92108 41.64437 2016-06-04 2016-06-04 2016-06-04
InclinometerStart InclinometerEnd date_time_start date_time_end
1 14:06:00 14:22:10 2016-06-04 14:06:00 2016-06-04 14:22:10
2 14:06:00 14:22:10 2016-06-04 14:06:00 2016-06-04 14:22:10
3 14:06:00 14:22:10 2016-06-04 14:06:00 2016-06-04 14:22:10
4 14:06:00 14:22:10 2016-06-04 14:06:00 2016-06-04 14:22:10
5 14:06:00 14:22:10 2016-06-04 14:06:00 2016-06-04 14:22:10
6 15:05:00 15:20:20 2016-06-04 15:05:00 2016-06-04 15:20:20
This works for my small example, but any ideas on how to do this with a GPX file of 88,287 observations, or a more elegant solution, would be helpful.
sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.4.3 gmt_1.2-0 RODBC_1.3-12 lubridate_1.5.0
[5] rgdal_1.0-6 maptools_0.8-36 sp_1.1-1
loaded via a namespace (and not attached):
[1] Rcpp_0.11.6 lattice_0.20-31 assertthat_0.1 grid_3.2.1
[5] R6_2.1.0 DBI_0.3.1 magrittr_1.5 stringi_0.5-5
[9] tools_3.2.1 stringr_1.0.0 foreign_0.8-63 parallel_3.2.1
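Answer 0 (score: 0)
Here is a solution I came up with using the lubridate package. Note that I changed your example data so that the intervals line up and Tow values can actually be assigned to the GPX data. example_access is unchanged.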
library(lubridate)
library(data.table) # provides %between%, which lubridate does not export
example_gpx<-data.frame(Long=c(-70.92108,-70.92108,-70.92108, -70.92108, -70.92108 ),
Lat=c(41.64437,41.64437,41.64437 ,41.64437,41.64437),
Date=c("2016-06-04","2016-06-04","2016-06-04","2016-06-04","2016-06-04"),
Time=c("14:15:08","15:15:09","15:59:10","17:06:11","17:30:12"))
example_access$date_time_start<-ymd_hms(example_access$date_time_start)
example_access$date_time_end<-ymd_hms(example_access$date_time_end)
example_gpx$date_time<-paste(example_gpx$Date, example_gpx$Time)
example_gpx$date_time<-ymd_hms(example_gpx$date_time)
example_gpx$Tow<-sapply(1:nrow(example_gpx), function(x)
if(example_gpx[x,]$date_time %between%
c(example_access[x,]$date_time_start,example_access[x,]$date_time_end)) example_access[x,]$Tow
else NA)
example_gpx
Long Lat Date Time date_time Tow
-70.92 41.64 2016-06-04 14:15:08 2016-06-04 14:15:08 201604001
-70.92 41.64 2016-06-04 15:15:09 2016-06-04 15:15:09 201604002
-70.92 41.64 2016-06-04 15:59:10 2016-06-04 15:59:10 201604003
-70.92 41.64 2016-06-04 17:06:11 2016-06-04 17:06:11 201604004
-70.92 41.64 2016-06-04 17:30:12 2016-06-04 17:30:12 201604005
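This approach may or may not work if data are missing or out of order.
Edit: The approach above only works if your two data sets line up row for row, which is almost certainly not the case. The code below works whether or not the two sets line up. It may be a bit slower, but I think the robustness is worth it: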
library(lubridate)
library(data.table)
example_gpx<-data.frame(Long=c(-70.92108,-70.92108,-70.92108, -70.92108, -70.92108 ),
Lat=c(41.64437,41.64437,41.64437 ,41.64437,41.64437),
Date=c("2016-06-04","2016-06-04","2016-06-04","2016-06-04","2016-06-04"),
Time=c("14:15:08","15:15:09","15:59:10","17:06:11","17:30:12"))
example_access$date_time_start<-ymd_hms(example_access$date_time_start)
example_access$date_time_end<-ymd_hms(example_access$date_time_end)
example_gpx$date_time<-paste(example_gpx$Date, example_gpx$Time)
example_gpx$date_time<-ymd_hms(example_gpx$date_time)
example_gpx$Tow<-NA
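for(x in seq_len(nrow(example_access))){
# TRUE for GPX rows whose timestamp falls inside the interval of tow x
in_tow<-example_gpx$date_time %between%
c(example_access$date_time_start[x],
example_access$date_time_end[x])
# index the column directly so a tow with no matching GPX rows is a no-op
example_gpx$Tow[in_tow]<-example_access$Tow[x]
}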
example_gpx
Long Lat Date Time date_time Tow
-70.92 41.64 2016-06-04 14:15:08 2016-06-04 14:15:08 201604001
-70.92 41.64 2016-06-04 15:15:09 2016-06-04 15:15:09 201604002
-70.92 41.64 2016-06-04 15:59:10 2016-06-04 15:59:10 201604003
-70.92 41.64 2016-06-04 17:06:11 2016-06-04 17:06:11 201604004
-70.92 41.64 2016-06-04 17:30:12 2016-06-04 17:30:12 201604005
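If the loop above turns out to be too slow on the full GPX file, one possible alternative is a vectorized lookup with base R's findInterval. This is only a sketch under two assumptions that the question does not confirm: the tow intervals do not overlap, and both time columns are already POSIXct in the same time zone. The data frames `access` and `gpx` and the columns `start` and `end` are made-up names for illustration, not the original data.

```r
# Hypothetical toy data (assumption: tow intervals never overlap)
access <- data.frame(
  Tow   = c(201604001L, 201604002L, 201604003L),
  start = as.POSIXct(c("2016-06-04 14:06:00", "2016-06-04 15:05:00",
                       "2016-06-04 15:51:20"), tz = "UTC"),
  end   = as.POSIXct(c("2016-06-04 14:22:10", "2016-06-04 15:20:20",
                       "2016-06-04 16:06:20"), tz = "UTC")
)
gpx <- data.frame(
  date_time = as.POSIXct(c("2016-06-04 14:15:08", "2016-06-04 14:30:00",
                           "2016-06-04 15:59:10", "2016-06-04 13:00:00"),
                         tz = "UTC")
)

# findInterval needs the break points in increasing order
access <- access[order(access$start), ]

# For each GPX time: index of the last tow that started at or before it
# (0 means the time precedes every tow)
idx <- findInterval(as.numeric(gpx$date_time), as.numeric(access$start))
idx[idx == 0] <- NA

# Keep the candidate tow only if the GPX time is also at or before its end
inside <- !is.na(idx) & gpx$date_time <= access$end[idx]
gpx$Tow <- ifelse(inside, access$Tow[idx], NA_integer_)

gpx
```

Because each GPX time is matched to at most one candidate tow, no cross join is ever built, so memory use stays proportional to the number of GPX rows. If tows can overlap, this sketch would silently pick the latest-starting one, so an interval join would be the safer choice in that case.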