Removing spatial outliers in R (latitude and longitude coordinates)

Time: 2014-06-26 19:52:38

Tags: r list latitude-longitude distance

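I have done my best to read up on this, and I think I have found the procedure that fits best, but if anyone has other ideas, functions, or different approaches, I would appreciate hearing them. I have a list of small data frames with differing numbers of rows, each containing several latitude and longitude coordinates in separate columns. For each item in the list I need to remove any coordinate pair that is likely an outlier and then find the mean center of the remaining coordinates (so in the end each item in the list should have a single coordinate pair).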

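From what I have read, the way to do this is to compute the mean center by averaging all latitudes and all longitudes separately, then calculate the Euclidean distance from that mean center to each coordinate pair and remove the points that lie farther away than a desired distance (say, 100 meters). The mean center of the remaining points is then the final result. This seems a bit convoluted to me, so again, if anyone has suggestions for coordinate outlier removal that might work better, I would welcome them.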

Here is some of the code I have so far:

dfList <- structure(list(`43` = structure(list(date = c("43 2011-04-06", "43 2011-04-07", "43 2011-04-08"), identifier = c(43, 43, 43), lon = c(-117.23041303, -117.23040817, -117.23039471), lat = c(32.81217294, 32.81218158, 32.81218645)), .Names = c("date", "identifier", "lon", "lat"), row.names = 13:15, class = "data.frame"), `44` = structure(list(date = c("44 2011-04-06", "44 2011-04-07", "44 2011-04-08"), identifier = c(44, 44, 44), lon = c(-117.22864227, -117.22861559, -117.22862265), lat = c(32.81257756, 32.81257089, 32.81257197)), .Names = c("date", "identifier", "lon", "lat"), row.names = 19:21, class = "data.frame"), `46` = structure(list(date = c("46 2011-04-06", "46 2011-04-07", "46 2011-04-08", "46 2011-04-09", "46 2011-04-10", "46 2011-04-11"), identifier = c(46, 46, 46, 46, 46, 46), lon = c(-117.22992617, -117.2289396895, -117.22965116, -117.23003928, -117.229922602, -117.22969664), lat = c(32.81295118, 32.8128226975, 32.81317299, 32.81224457, 32.813018734, 32.81276993)), .Names = c("date", "identifier", "lon", "lat"), row.names = 25:30, class = "data.frame"), `47` = structure(list(date = c("47 2011-04-06", "47 2011-04-07"), identifier = c(47, 47), lon = c(-117.2274484, -117.22747116), lat = c(32.81205838, 32.81207607)), .Names = c("date", "identifier", "lon", "lat"), row.names = 31:32, class = "data.frame")), .Names = c("43", "44", "46", "47"))

lonMean <- lapply(dfList, function(x) mean(x$lon)) #taking mean for longs
latMean <- lapply(dfList, function(x) mean(x$lat)) #taking mean for lats
latLon <- mapply(c, lonMean, latMean, SIMPLIFY=FALSE)#combining coord lists into one

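EDIT: What I need now is to compute the distance between every coordinate in each item of the first list and the matching mean coordinate in the second list, and to remove from the first list any point that is farther away than the chosen distance, thereby dropping the possible outliers. I have used dist and geodist (from the 'gmt' package) before, but I am not sure how to use them across these two lists. Thank you very much for your help; I am not the savviest at this, so any help is greatly appreciated!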

1 answer:

Answer 0: (score: 3)

Try this.

df <- do.call("rbind", dfList) # Flattens list into data frame, preserving 
                               # group identifier

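# This function calculates the great-circle distance in kilometers between
# two points given in decimal degrees (haversine formula)
earth.dist <- function (long1, lat1, long2, lat2)
{
  rad <- pi/180            # degrees-to-radians conversion factor
  a1 <- lat1 * rad         # latitude of point 1 in radians
  a2 <- long1 * rad        # longitude of point 1 in radians
  b1 <- lat2 * rad         # latitude of point 2 in radians
  b2 <- long2 * rad        # longitude of point 2 in radians
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))   # central angle in radians
  R <- 6378.145            # Earth radius in km (approximately equatorial)
  d <- R * c
  return(d)
}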
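
For reference, the steps in earth.dist correspond to the haversine formula. Written out, with \varphi for latitude and \lambda for longitude, both in radians (these symbols are chosen here for illustration and do not appear in the answer's code):

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
     + \cos\varphi_1 \,\cos\varphi_2 \,
       \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right), \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right), \\
d &= R\,c, \qquad R = 6378.145\ \text{km}.
\end{aligned}
```

Here a, c, R and d match the variables of the same names in the function, and d is returned in kilometers.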

# Distance (km) from each point to the mean center of its own group
df$dist <- earth.dist(df$lon, df$lat, ave(df$lon, df$identifier), ave(df$lat, df$identifier))

df[df$dist >= 0.1, ]  # Rows 100 m or more from the mean center (candidate outliers); use df$dist < 0.1 to keep the rest
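
To finish the workflow the question describes (drop the distant points, then average what remains), a minimal sketch along the following lines may help. It works on each list item separately rather than on the flattened data frame. The names `haversine_km`, `mean_center`, `max_km` and `toyList` are made up for this illustration, and the toy coordinates are invented, not taken from the question's data.

```r
# Compact restatement of the haversine distance (km); hypothetical helper name
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6378.145) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * R * atan2(sqrt(a), sqrt(1 - a))
}

# Hypothetical helper: drop points max_km or farther from the item's mean
# center, then return the mean center of the points that remain.
mean_center <- function(d, max_km = 0.1) {
  dist_km <- haversine_km(d$lon, d$lat, mean(d$lon), mean(d$lat))
  keep <- d[dist_km < max_km, , drop = FALSE]
  data.frame(identifier = d$identifier[1],
             n_kept     = nrow(keep),
             lon        = mean(keep$lon),
             lat        = mean(keep$lat))
}

# Invented toy data: item "a" has one point roughly 190 m east of the others
toyList <- list(
  a = data.frame(identifier = 1,
                 lon = c(-117.2300, -117.2301, -117.2299, -117.2280),
                 lat = c(32.8120, 32.8121, 32.8119, 32.8120)),
  b = data.frame(identifier = 2,
                 lon = c(-117.2270, -117.2271),
                 lat = c(32.8125, 32.8126))
)

# One row (one coordinate pair) per list item
centers <- do.call(rbind, lapply(toyList, mean_center))
print(centers)
```

In this toy data the fourth point of item `a` is dropped and the other three are averaged. One caveat: a distant outlier pulls the initial mean toward itself, so with a tight threshold the good points can also end up beyond the cutoff. Measuring distance from the per-item median instead of the mean is one possible alternative if that becomes a problem.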