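I have a data frame containing data on drivers and the routes they followed. I want to work out the total mileage travelled. I am using the geosphere package, but I cannot work out the correct way to apply it and get an answer in miles.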
> head(df1)
id routeDateTime driverId lat lon
1 1 2012-11-12 02:08:41 123 76.57169 -110.8070
2 2 2012-11-12 02:09:41 123 76.44325 -110.7525
3 3 2012-11-12 02:10:41 123 76.90897 -110.8613
4 4 2012-11-12 03:18:41 123 76.11152 -110.2037
5 5 2012-11-12 03:19:41 123 76.29013 -110.3838
6 6 2012-11-12 03:20:41 123 76.15544 -110.4506
So far I have tried
spDists(cbind(df1$lon,df1$lat))
and several other functions, but I can't seem to get a reasonable answer.
Any suggestions?
> dput(df1)
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), routeDateTime = c("2012-11-12 02:08:41",
"2012-11-12 02:09:41", "2012-11-12 02:10:41", "2012-11-12 03:18:41",
"2012-11-12 03:19:41", "2012-11-12 03:20:41", "2012-11-12 03:21:41",
"2012-11-12 12:08:41", "2012-11-12 12:09:41", "2012-11-12 12:10:41",
"2012-11-12 02:08:41", "2012-11-12 02:09:41", "2012-11-12 02:10:41",
"2012-11-12 03:18:41", "2012-11-12 03:19:41", "2012-11-12 03:20:41",
"2012-11-12 03:21:41", "2012-11-12 12:08:41", "2012-11-12 12:09:41",
"2012-11-12 12:10:41", "2012-11-12 02:08:41", "2012-11-12 02:09:41",
"2012-11-12 02:10:41", "2012-11-12 03:18:41", "2012-11-12 03:19:41",
"2012-11-12 03:20:41", "2012-11-12 03:21:41", "2012-11-12 12:08:41",
"2012-11-12 12:09:41", "2012-11-12 12:10:41", "2012-11-12 02:08:41",
"2012-11-12 02:09:41", "2012-11-12 02:10:41", "2012-11-12 03:18:41",
"2012-11-12 03:19:41", "2012-11-12 03:20:41", "2012-11-12 03:21:41",
"2012-11-12 12:08:41", "2012-11-12 12:09:41", "2012-11-12 12:10:41"
), driverId = c(123, 123, 123, 123, 123, 123, 123, 123, 123,
123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 789, 789,
789, 789, 789, 789, 789, 789, 789, 789, 246, 246, 246, 246, 246,
246, 246, 246, 246, 246), lat = c(76.5716897079255, 76.4432530414779,
76.9089707506355, 76.1115217276383, 76.2901271982118, 76.155437662499,
76.4115052509587, 76.8397977722343, 76.3357809444424, 76.032417796785,
76.5716897079255, 76.4432530414779, 76.9089707506355, 76.1115217276383,
76.2901271982118, 76.155437662499, 76.4115052509587, 76.8397977722343,
76.3357809444424, 76.032417796785, 76.5716897079255, 76.4432530414779,
76.9089707506355, 76.1115217276383, 76.2901271982118, 76.155437662499,
76.4115052509587, 76.8397977722343, 76.3357809444424, 76.032417796785,
76.5716897079255, 76.4432530414779, 76.9089707506355, 76.1115217276383,
76.2901271982118, 76.155437662499, 76.4115052509587, 76.8397977722343,
76.3357809444424, 76.032417796785), lon = c(-110.80701574916,
-110.75247172825, -110.861284852726, -110.203674311982, -110.383751512505,
-110.450569844106, -110.22185564111, -110.556956546381, -110.24483308522,
-110.217355202651, -110.80701574916, -110.75247172825, -110.861284852726,
-110.203674311982, -110.383751512505, -110.450569844106, -110.22185564111,
-110.556956546381, -110.24483308522, -110.217355202651, -110.80701574916,
-110.75247172825, -110.861284852726, -110.203674311982, -110.383751512505,
-110.450569844106, -110.22185564111, -110.556956546381, -110.24483308522,
-110.217355202651, -110.80701574916, -110.75247172825, -110.861284852726,
-110.203674311982, -110.383751512505, -110.450569844106, -110.22185564111,
-110.556956546381, -110.24483308522, -110.217355202651)), .Names = c("id",
"routeDateTime", "driverId", "lat", "lon"), row.names = c(NA,
-40L), class = "data.frame")
Answer 0: (score: 6)
How about this?
## Setup
library(geosphere)
metersPerMile <- 1609.34
pts <- df1[c("lon", "lat")]
## Pass in two derived data.frames that are lagged by one point
segDists <- distVincentyEllipsoid(p1 = pts[-nrow(pts),],
                                  p2 = pts[-1,])
sum(segDists)/metersPerMile
# [1] 1013.919
(To use one of the faster distance algorithms, just substitute distCosine, distVincentySphere, or distHaversine for distVincentyEllipsoid in the call above.)
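Note that the call above sums every consecutive pair of rows in df1, so it also counts the jump from the last point of one driver to the first point of the next. If a separate total per driver is what is wanted, the following is a minimal base R sketch of one way to do it. The helper names haversine_m and route_miles are made up for illustration, the Earth radius is an assumed spherical value, and the sample rows are the six shown by head(df1) in the question. Rows are assumed to be in travel order already.

```r
## Great-circle (haversine) distance in metres between paired lon/lat points.
## r is an assumed spherical Earth radius in metres.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

## Sum of consecutive segment lengths along one route, in miles.
route_miles <- function(lon, lat, metersPerMile = 1609.34) {
  n <- length(lon)
  if (n < 2) return(0)
  sum(haversine_m(lon[-n], lat[-n], lon[-1], lat[-1])) / metersPerMile
}

## First six rows from head(df1) in the question.
sample_df <- data.frame(
  driverId = rep(123, 6),
  lat = c(76.57169, 76.44325, 76.90897, 76.11152, 76.29013, 76.15544),
  lon = c(-110.8070, -110.7525, -110.8613, -110.2037, -110.3838, -110.4506)
)

## One total per driver: split the rows by driverId, then sum each route.
milesByDriver <- sapply(split(sample_df, sample_df$driverId),
                        function(d) route_miles(d$lon, d$lat))
milesByDriver
```

With the full data, replacing sample_df with df1 should give one named value per driverId. The haversine formula treats the Earth as a sphere, so totals will differ slightly from the ellipsoidal result above.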
Answer 1: (score: 1)
Be very careful with missing data, because distVincentyEllipsoid() returns 0 for the distance between any two points with missing coordinates c(NA, NA), c(NA, NA).
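One way to guard against this is to filter out incomplete points before computing segment distances. This is only a sketch: the three-row route below is hypothetical (two points from the question with a missing fix placed between them), and whether dropping points is acceptable depends on the data.

```r
## Hypothetical route with one missing fix in the middle.
route <- data.frame(
  lon = c(-110.8070, NA, -110.8613),
  lat = c(76.57169, NA, 76.90897)
)

## Keep only rows where both coordinates are present.
ok <- complete.cases(route[c("lon", "lat")])
clean <- route[ok, ]

## Report how many points were dropped so the gap is not silent.
nDropped <- sum(!ok)
message(nDropped, " point(s) dropped because of missing coordinates")

## clean can now be passed to the segment-distance call. Note that the
## neighbours of a dropped point end up joined by a single direct segment.
```

Dropping a point shortens the measured route wherever the true path was not a direct line between its neighbours, so it is worth checking nDropped before trusting the total.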
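Answer 2: (score: 0)
library(geodist)
geodist(df1, sequential = TRUE, measure = "geodesic") # sequence of distance increments
sum(geodist(df1, sequential = TRUE, measure = "geodesic")) # total distance in metres
sum(geodist(df1, sequential = TRUE, measure = "geodesic")) * 0.00062137 # total distance in miles
Geodesic distances are necessary here because the distances involved are quite long. The result is 1013.915, slightly different from the less accurate Vincenty distances of geosphere. Street network distances can also be calculated with
library(dodgr)
dodgr_dists(from = df1)
...but that requires a street network, which does not exist at (lat = 76, lon = -110). Where there is a street network, this will by default give you all pairwise distances routed through it, and the sequential increments are then the entries just off the diagonal.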