Distance between simultaneous point locations in R

Time: 2013-03-26 20:05:03

Tags: r datetime distance sequential

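I am calculating the distance (in metres) between UTM locations recorded at the "same" time, but I have run into a problem. As currently written, my script computes the distance only to the one individual that is "closest in time". I would like it to compute the distances to all individuals whose locations are "close" in time.

In my example I have 3 individual moose and 3 wolves. I want to take moose 1 and calculate the distance to the simultaneously recorded location of wolf 1, then wolf 2, then wolf 3. Right now the script just searches for the absolute minimum time difference across all wolves and calculates the distance to that 1 wolf, not to all the others.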

Here is my test data:

Moose location data:

structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("F07001", 
"F07010", "M07012"), class = "factor"), x = c(1482445L, 1481274L, 
1481279L, 1481271L, 1480849L, 1480881L, 1480883L, 1480880L, 1482448L, 
1482494L, 1482534L, 1482534L, 1482553L, 1482555L, 1482414L, 1482852L, 
1476120L, 1476104L, 1476101L), y = c(6621768L, 6619628L, 6619630L, 
6619700L, 6620321L, 6620427L, 6620438L, 6620423L, 6616403L, 6616408L, 
6616395L, 6616408L, 6616406L, 6616418L, 6616755L, 6616312L, 6623655L, 
6623646L, 6623652L), date = structure(c(1173088800, 1173096000, 
1173103260, 1173110400, 1173117600, 1173211200, 1173218400, 1173139200, 
1173088800, 1173096000, 1173103260, 1173110400, 1173117600, 1173211200, 
1173218400, 1173139200, 1173270600, 1173277800, 1173282960), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), .Names = c("id", "x", "y", "date"
), row.names = c(NA, -19L), class = "data.frame")

Wolf location data:

structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("HF7572", 
"Htest", "UM1347"), class = "factor"), x = c(1480610L, 1480640L, 
1480613L, 1480613L, 1480555L, 1480567L, 1480627L, 1480532L, 1480593L, 
1484394L, 1484394L, 1483940L, 1483933L, 1483935L, 1483930L, 1483855L, 
1483793L, 1483802L, 1484392L, 1483855L), y = c(6619853L, 6619739L, 
6619759L, 6619862L, 6619838L, 6619772L, 6619902L, 6619899L, 6619887L, 
6619589L, 6619602L, 6619899L, 6619907L, 6619905L, 6619896L, 6619834L, 
6619702L, 6619672L, 6619558L, 6619834L), date = structure(c(1173088800, 
1173096060, 1173103440, 1173111600, 1173117780, 1173213600, 1173218400, 
1173141120, 1173266100, 1173095940, 1173099600, 1173103200, 1173106920, 
1173110400, 1173208800, 1173211200, 1173222000, 1173266100, 1173362100, 
1173211200), class = c("POSIXct", "POSIXt"), tzone = "UTC")), .Names = c("id", 
"x", "y", "date"), row.names = c(NA, -20L), class = "data.frame")

Here is my script so far:

mloc=read.csv("moose.csv", head = T)
wloc=read.csv("wolf.csv", head=T)
mloc$date<-as.POSIXct(strptime(mloc$date,"%Y-%m-%d %H:%M"),tz="UTC")
wloc$date<-as.POSIXct(strptime(wloc$date,"%Y-%m-%d %H:%M"),tz="UTC")

#sort the data sequentially by date time then convert to number
Sortmoose = mloc[order(mloc$date),]
Sortwolf = wloc[order(wloc$date),]
m <- as.numeric(Sortmoose$date)
w <- as.numeric(Sortwolf$date)

#Creates index of the time intervals
id <- findInterval(m, w, all.inside=TRUE)
id_min <- ifelse(abs(m-w[id])<abs(m-w[id+1]), id, id+1)
Sortmoose$wolfID = Sortwolf$id[id_min]
Sortmoose$wolfdate =Sortwolf$date[id_min]
Sortmoose$wolfx = Sortwolf$x[id_min]
Sortmoose$wolfy = Sortwolf$y[id_min]
Sortmoose$dist= sqrt((Sortmoose$wolfx-Sortmoose$x)^2+(Sortmoose$wolfy-Sortmoose$y)^2)

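I want to calculate the distance between every moose/wolf pair, as long as the locations were recorded at the "same" time. I want the output to contain the moose information and the associated wolf information, plus the distance (in metres) between the two points. I would also like the time difference, so that I can filter out pairs that are more than 45 minutes apart or something like that, but I think that is something I can do later. Basically something like this:

mooseID mooseDate mooseX mooseY wolfID wolfDate wolfX wolfY Distance(m) TimeDiff(min)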
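To make the target table concrete, here is a brute-force sketch in base R. It pairs every moose fix with every wolf fix via `merge(..., by = NULL)` (a cross join) and then keeps the pairs that fall inside a time window. The toy rows, the `window_min` threshold and the output column names are assumptions for illustration, and a full cross join is only practical for small tables.

```r
# Toy data: two moose fixes and three wolf fixes (UTM metres, UTC times).
moose <- data.frame(
  mooseID   = c("F07001", "M07012"),
  mooseX    = c(1482445, 1482448),
  mooseY    = c(6621768, 6616403),
  mooseDate = as.POSIXct(c("2007-03-05 10:00:00", "2007-03-05 10:00:00"),
                         tz = "UTC")
)
wolf <- data.frame(
  wolfID   = c("HF7572", "UM1347", "HF7572"),
  wolfX    = c(1480610, 1484394, 1480640),
  wolfY    = c(6619853, 6619589, 6619739),
  wolfDate = as.POSIXct(c("2007-03-05 10:00:00", "2007-03-05 11:59:00",
                          "2007-03-05 12:01:00"), tz = "UTC")
)

window_min <- 45  # assumed threshold for "simultaneous", in minutes

# Cross join: every moose row paired with every wolf row.
pairs <- merge(moose, wolf, by = NULL)

# Euclidean distance in metres (UTM coordinates are planar).
pairs$dist_m <- sqrt((pairs$wolfX - pairs$mooseX)^2 +
                     (pairs$wolfY - pairs$mooseY)^2)

# Signed time difference in minutes (moose time minus wolf time).
pairs$timediff_min <- as.numeric(difftime(pairs$mooseDate, pairs$wolfDate,
                                          units = "mins"))

# Keep only the pairs recorded close enough together in time.
close_pairs <- pairs[abs(pairs$timediff_min) <= window_min, ]
print(close_pairs)
```

With these toy rows, only the two pairs recorded at exactly 10:00 fall inside the 45-minute window.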

2 Answers:

Answer 0: (score: 2)

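New solution. Here is code that does what you want (approximate matching). The key idea is to create a new data table with a new column date1, such that for every date = 05:17:13 in the original data there are rows with date1 = 04:00:00, 05:00:00 and 06:00:00 (with all other columns duplicated), and then to merge on this new column. This guarantees that every two events in the original data that are within an hour of each other will be merged.

After that we just compute the distance and the time difference.

Note that using data.table is crucial for speed, because your data frame is so large; a regular data.frame would be too slow.

The result contains all pairs of events within 1 hour of each other (and sometimes pairs that are about 2 hours apart, but you can easily filter those out).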

library(data.table)
library(lubridate)

mloc <- data.table(mloc)
wloc <- data.table(wloc)

# Returns a new data table with one new column (date1) and length(range)
# rows for each row in the initial data table, duplicating all other fields.
# Example: for row with date = '2013-01-15 05:17:23' and for the default range
# argument it will add rows with date1 = '2013-01-15 04:00:00', '2013-01-15 05:00:00'
# and '2013-01-15 06:00:00'
AddTimeBoundaries <- function(dt, range = -1:1) {
  dt1 <- rbindlist(lapply(range, 
             function(x) data.table(id = dt$id, date = dt$date, 
                        date1 = floor_date(dt$date, 'hour') +
                        hours(x))))
  setkey(dt1, id, date)
  setkey(dt, id, date)
  result <- dt[dt1]
  setkey(result, date1)
  result
}

mloc.1 <- AddTimeBoundaries(mloc)
wloc.1 <- AddTimeBoundaries(wloc)

x <- mloc.1[wloc.1, allow.cartesian = TRUE][!is.na(id)]
result <- unique(x[, list(id, date, x, y, id.1, date.1, x.1, y.1, 
              distance = sqrt((x-x.1)^2 + (y-y.1)^2),
              time.diff = date - date.1)])
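
Filtering out the more distant pairs could look like the following minimal sketch. It uses a small hypothetical stand-in table (`result_demo`) with a `time.diff` column like the one computed above, and an assumed cutoff of 60 minutes (`max_gap_min`); both names are illustrative only.

```r
# Hypothetical stand-in for `result`: only the columns needed for the filter.
result_demo <- data.frame(
  id        = c("F07001", "F07001", "F07001"),
  id.1      = c("HF7572", "UM1347", "HF7572"),
  time.diff = as.difftime(c(0, -7140, -60), units = "secs")
)

max_gap_min <- 60  # assumed cutoff, in minutes

# Convert the difftime to minutes explicitly, so the filter does not depend
# on whichever unit the difftime happens to be stored in.
gap_min <- abs(as.numeric(result_demo$time.diff, units = "mins"))

filtered <- result_demo[gap_min <= max_gap_min, ]
print(filtered)
```

Here the row with a difference of -7140 seconds (119 minutes) is dropped and the other two rows are kept.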

> head(result, 10)
        id                date       x       y   id.1              date.1     x.1     y.1  distance  time.diff
 1: F07001 2007-03-05 10:00:00 1482445 6621768 HF7572 2007-03-05 10:00:00 1480610 6619853 2652.2538     0 secs
 2: M07012 2007-03-05 10:00:00 1482448 6616403 HF7572 2007-03-05 10:00:00 1480610 6619853 3909.0592     0 secs
 3: F07001 2007-03-05 10:00:00 1482445 6621768 UM1347 2007-03-05 11:59:00 1484394 6619589 2923.4640 -7140 secs
 4: M07012 2007-03-05 10:00:00 1482448 6616403 UM1347 2007-03-05 11:59:00 1484394 6619589 3733.2977 -7140 secs
 5: F07001 2007-03-05 12:00:00 1481274 6619628 HF7572 2007-03-05 10:00:00 1480610 6619853  701.0856  7200 secs
 6: M07012 2007-03-05 12:00:00 1482494 6616408 HF7572 2007-03-05 10:00:00 1480610 6619853 3926.5100  7200 secs
 7: F07001 2007-03-05 10:00:00 1482445 6621768 HF7572 2007-03-05 12:01:00 1480640 6619739 2715.6705 -7260 secs
 8: F07001 2007-03-05 12:00:00 1481274 6619628 HF7572 2007-03-05 12:01:00 1480640 6619739  643.6435   -60 secs
 9: M07012 2007-03-05 10:00:00 1482448 6616403 HF7572 2007-03-05 12:01:00 1480640 6619739 3794.4380 -7260 secs
10: M07012 2007-03-05 12:00:00 1482494 6616408 HF7572 2007-03-05 12:01:00 1480640 6619739 3812.2011   -60 secs

Old solution. This does not work, because the OP needs approximate matching of dates (within 1 hour), not exact matching.

Assuming I interpreted your question correctly, here is a solution using the data.table package. I called the first structure in your test data mloc and the second wloc.

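Step 1. Convert both data frames to data.table and set the key on date:

library(data.table)
mloc <- data.table(mloc)
wloc <- data.table(wloc)
setkey(mloc, date)
setkey(wloc, date)

Step 2. Merge the two tables on the date key, creating a "Cartesian product", and compute the distance.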
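
The exact-match merge described in Step 2 can be sketched in base R as follows. This is an illustration rather than the answer's own code: it swaps in `merge()` on the date column for the keyed data.table join, and the toy rows, the `_demo` table names and the `.moose`/`.wolf` suffixes are assumptions.

```r
# Toy moose and wolf tables with the same columns as the test data.
mloc_demo <- data.frame(
  id   = c("F07001", "M07012", "F07001"),
  x    = c(1482445, 1482448, 1481274),
  y    = c(6621768, 6616403, 6619628),
  date = as.POSIXct(c("2007-03-05 10:00:00", "2007-03-05 10:00:00",
                      "2007-03-05 12:00:00"), tz = "UTC")
)
wloc_demo <- data.frame(
  id   = c("HF7572", "HF7572"),
  x    = c(1480610, 1480640),
  y    = c(6619853, 6619739),
  date = as.POSIXct(c("2007-03-05 10:00:00", "2007-03-05 12:01:00"),
                    tz = "UTC")
)

# Many-to-many merge on identical timestamps: every moose row is paired
# with every wolf row that has exactly the same date.
exact <- merge(mloc_demo, wloc_demo, by = "date",
               suffixes = c(".moose", ".wolf"))

# Euclidean distance in metres between the paired locations.
exact$distance <- sqrt((exact$x.moose - exact$x.wolf)^2 +
                       (exact$y.moose - exact$y.wolf)^2)
print(exact)
```

The 12:00 moose fix and the 12:01 wolf fix are not paired, which is the limitation of exact matching that the old solution runs into.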

Answer 1: (score: 1)

I think I have a partial solution, one that lets you adjust the "closeness" window as needed.

library(data.table)

# Convert to data.table:
mloc <- as.data.table(mloc)
wloc <- as.data.table(wloc)

# Rename columns to make them less ambiguous:
setnames(mloc,paste0("m",names(mloc)))
setnames(wloc,paste0("w",names(wloc)))

# Adjustable rounding factor:
r <- 45 /60/24 # Need to convert minutes to days

# Add the rounded date column to the two tables:
mloc[,rdate:=round(as.numeric(mdate-as.POSIXct("1970-01-01", tz="GMT"))/r)*r*60*60*24+as.POSIXct("1970-01-01", tz="GMT")]
wloc[,rdate:=round(as.numeric(wdate-as.POSIXct("1970-01-01", tz="GMT"))/r)*r*60*60*24+as.POSIXct("1970-01-01", tz="GMT")]

# Set the keys:
setkey(mloc,rdate)
setkey(wloc,rdate)

# Join the wolf and moose tables on the rounded date:
wloc[mloc, allow.cartesian=T,nomatch=0]

##                  rdate    wid      wx      wy               wdate    mid      mx      my               mdate
## 1: 2007-03-05 09:45:00 HF7572 1480610 6619853 2007-03-05 10:00:00 F07001 1482445 6621768 2007-03-05 10:00:00
## 2: 2007-03-05 09:45:00 HF7572 1480610 6619853 2007-03-05 10:00:00 M07012 1482448 6616403 2007-03-05 10:00:00
## 3: 2007-03-05 12:00:00 UM1347 1484394 6619589 2007-03-05 11:59:00 F07001 1481274 6619628 2007-03-05 12:00:00
## 4: 2007-03-05 12:00:00 HF7572 1480640 6619739 2007-03-05 12:01:00 F07001 1481274 6619628 2007-03-05 12:00:00
## 5: 2007-03-05 12:00:00 UM1347 1484394 6619589 2007-03-05 11:59:00 M07012 1482494 6616408 2007-03-05 12:00:00
## 6: 2007-03-05 12:00:00 HF7572 1480640 6619739 2007-03-05 12:01:00 M07012 1482494 6616408 2007-03-05 12:00:00
## 7: 2007-03-05 14:15:00 UM1347 1483940 6619899 2007-03-05 14:00:00 F07001 1481279 6619630 2007-03-05 14:01:00
## 8: 2007-03-05 14:15:00 HF7572 1480613 6619759 2007-03-05 14:04:00 F07001 1481279 6619630 2007-03-05 14:01:00
## 9: 2007-03-05 14:15:00 UM1347 1483940 6619899 2007-03-05 14:00:00 M07012 1482534 6616395 2007-03-05 14:01:00
##10: 2007-03-05 14:15:00 HF7572 1480613 6619759 2007-03-05 14:04:00 M07012 1482534 6616395 2007-03-05 14:01:00
##11: 2007-03-05 15:45:00 UM1347 1483935 6619905 2007-03-05 16:00:00 F07001 1481271 6619700 2007-03-05 16:00:00
##12: 2007-03-05 15:45:00 UM1347 1483935 6619905 2007-03-05 16:00:00 M07012 1482534 6616408 2007-03-05 16:00:00
##13: 2007-03-05 18:00:00 HF7572 1480555 6619838 2007-03-05 18:03:00 F07001 1480849 6620321 2007-03-05 18:00:00
##14: 2007-03-05 18:00:00 HF7572 1480555 6619838 2007-03-05 18:03:00 M07012 1482553 6616406 2007-03-05 18:00:00
##15: 2007-03-06 20:15:00 UM1347 1483855 6619834 2007-03-06 20:00:00 F07001 1480881 6620427 2007-03-06 20:00:00
##16: 2007-03-06 20:15:00  Htest 1483855 6619834 2007-03-06 20:00:00 F07001 1480881 6620427 2007-03-06 20:00:00
##17: 2007-03-06 20:15:00 UM1347 1483855 6619834 2007-03-06 20:00:00 M07012 1482555 6616418 2007-03-06 20:00:00
##18: 2007-03-06 20:15:00  Htest 1483855 6619834 2007-03-06 20:00:00 M07012 1482555 6616418 2007-03-06 20:00:00
##19: 2007-03-06 21:45:00 HF7572 1480627 6619902 2007-03-06 22:00:00 F07001 1480883 6620438 2007-03-06 22:00:00
##20: 2007-03-06 21:45:00 HF7572 1480627 6619902 2007-03-06 22:00:00 M07012 1482414 6616755 2007-03-06 22:00:00

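I call this partial because it misses approximate matches when one value is rounded up and the other is rounded down. For example, a wdate of 2007-03-05 16:20:00 rounds up to 2007-03-05 16:30:00, while an mdate of 2007-03-05 16:00:00 rounds down to 2007-03-05 15:45:00, so the two fail to match in the join, even though the events are only 20 minutes apart and the window is 45 minutes.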
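
The boundary effect can be reproduced with a short base R sketch. The helper `round_to_window()` is a hypothetical name introduced here; it rounds in seconds rather than days, which should be equivalent to the rounding used in the join above.

```r
window_sec <- 45 * 60  # 45-minute rounding window, in seconds

# Round a POSIXct time to the nearest multiple of `window` seconds
# since the Unix epoch (hypothetical helper, for illustration only).
round_to_window <- function(t, window = window_sec) {
  as.POSIXct(round(as.numeric(t) / window) * window,
             origin = "1970-01-01", tz = "UTC")
}

wdate <- as.POSIXct("2007-03-05 16:20:00", tz = "UTC")
mdate <- as.POSIXct("2007-03-05 16:00:00", tz = "UTC")

w_bin <- round_to_window(wdate)  # rounds up to the next boundary
m_bin <- round_to_window(mdate)  # rounds down to the previous boundary

# True gap between the two events, in minutes.
gap_min <- as.numeric(difftime(wdate, mdate, units = "mins"))

# The join only matches rows whose rounded times are identical.
same_bin <- w_bin == m_bin

cat("wolf bin: ", format(w_bin, "%Y-%m-%d %H:%M:%S"), "\n")
cat("moose bin:", format(m_bin, "%Y-%m-%d %H:%M:%S"), "\n")
cat("gap (min):", gap_min, " same bin:", same_bin, "\n")
```

The two fixes are 20 minutes apart but land in different bins, so an equality join on the rounded time cannot pair them.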

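I have another partial data.table solution that does not round, but instead uses roll=-45*60 plus a second roll setting (the two results are then combined). It picks up this example record, but there still seem to be some issues that I need to investigate...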