Get the centroid of Lat and Lon in a data frame

Asked: 2016-08-01 13:08:01

Tags: r dataframe

I have a data frame (df) with three columns, like this (all numbers are random):

ID  Lat    Lon
1   25.32 -63.32
1   25.29 -64.21
1   24.12 -62.43
2   12.42  54.64
2   12.11  53.43
.   ....   ....

Basically, I want the centroid for each ID, like this:

ID  Lat    Lon    Cent_lat   Cent_lon
1   25.32 -63.32  25.31      -63.25
1   25.29 -64.21  25.31      -63.25
1   24.12 -62.43  25.31      -63.25
2   12.42  54.64  12.20       53.60
2   12.11  53.43  12.20       53.60

I tried the following:

library(geosphere)
library(rgeos)
library(dplyr)

df1 <- by(df,df$ID,centroid(df$Lat, df$Long))

But this gives me this error:

Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function 'centroid' for signature '"numeric"'

I even tried

df1 <- by(df,df$ID,centroid(as.numeric(df$Lat), as.numeric(df$Long)))

But this gives me this error:

Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function 'centroid' for signature '"function"'

5 answers:

Answer 0: (score: 2)

library(geosphere)
library(ggplot2)
library(dplyr)

states <- map_data("state")

head(states)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>

cntrd <- function(x) {
  data.frame(centroid(as.matrix(x[,c("long", "lat")])))
}

by(states, states$group, cntrd) %>% head()
## $`1`
##         lon      lat
## 1 -86.82976 32.82735
## 
## $`2`
##         lon      lat
## 1 -111.6698 34.34309
## 
## $`3`
##         lon      lat
## 1 -92.43826 34.92167
## 
## $`4`
##         lon      lat
## 1 -119.6713 37.40289
## 
## $`5`
##         lon      lat
## 1 -105.5526 39.02653
## 
## $`6`
##         lon      lat
## 1 -72.72553 41.62706

group_by(states, group) %>%
  do(cntrd(.))
## Source: local data frame [63 x 3]
## Groups: group [63]
## 
##    group        lon      lat
##    <dbl>      <dbl>    <dbl>
## 1      1  -86.82976 32.82735
## 2      2 -111.66978 34.34309
## 3      3  -92.43826 34.92167
## 4      4 -119.67130 37.40289
## 5      5 -105.55264 39.02653
## 6      6  -72.72553 41.62706
## 7      7  -75.51543 39.00879
## 8      8  -77.03411 38.91083
## 9      9  -82.51260 28.69498
## 10    10  -83.46361 32.67562
## # ... with 53 more rows
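
The output above has one row per group, whereas the question asks for the centroid repeated on every row of its ID. A minimal sketch of one way to get that shape with dplyr, assuming each ID's rows are polygon vertices in order; `cent_xy` is a hypothetical helper, not part of geosphere, and the sample values are taken from elsewhere in this thread:

```r
library(geosphere)
library(dplyr)

# Sample data (six rows, three vertices per ID)
df <- data.frame(
  ID  = c(1, 1, 1, 2, 2, 2),
  Lat = c(25.32, 25.29, 24.12, 12.42, 12.11, 12.22),
  Lon = c(-63.32, -64.21, -62.43, 54.64, 53.43, 53.23)
)

# Hypothetical helper: returns c(lon, lat) of the polygon centroid.
# centroid() expects a two-column matrix with longitude first.
cent_xy <- function(lon, lat) as.numeric(centroid(cbind(lon, lat)))

res <- df %>%
  group_by(ID) %>%
  mutate(
    Cent_lon = cent_xy(Lon, Lat)[1],  # same value for every row of the ID
    Cent_lat = cent_xy(Lon, Lat)[2]
  ) %>%
  ungroup()

res
```

`mutate()` on a grouped data frame recycles the single centroid value across the rows of each group, so no separate join is needed. The helper is called twice per group here for readability; for large data it may be worth computing it once.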

Answer 1: (score: 2)

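Here is a data.table approach. As @czeinerb mentioned, Lon is the first argument of the centroid function and Lat is the second. We wrap centroid in a new function below so that, inside the data.table aggregation, it receives a matrix with 2 columns (Lon | Lat), which is the input that geosphere's centroid function requires.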

# Import packages
library(geosphere)
library(data.table) # Using a data.table approach

# Sample data
df = data.frame("ID" = c(1, 1, 1, 2, 2, 2), "Lat" = c(25.32, 25.29, 24.12, 12.42, 12.11, 12.22), "Lon" = c(-63.32, -64.21, -62.43, 54.64, 53.43, 53.23))

df

  ID   Lat    Lon
1  1 25.32 -63.32
2  1 25.29 -64.21
3  1 24.12 -62.43
4  2 12.42  54.64
5  2 12.11  53.43
6  2 12.22  53.23

# Convert to data.table
setDT(df)

# Re-define centroid function - Lon is first argument and Lat is second
# Geosphere takes a matrix with two columns: Lon|Lat, so we use cbind to coerce the data to this form
findCentroid <- function(Lon, Lat, ...){
  centroid(cbind(Lon, Lat), ...)
}

# Find centroid Lon and Lat by ID, as required
df[, c("Cent_lon", "Cent_lat") := as.list(findCentroid(Lon, Lat)), by = ID]
df

   ID   Lat    Lon  Cent_lon Cent_lat
1:  1 25.32 -63.32 -63.32000 24.91126
2:  1 25.29 -64.21 -63.32000 24.91126
3:  1 24.12 -62.43 -63.32000 24.91126
4:  2 12.42  54.64  53.76667 12.25003
5:  2 12.11  53.43  53.76667 12.25003
6:  2 12.22  53.23  53.76667 12.25003

Answer 2: (score: 1)

To use centroid, you need a polygon whose vertices are given in order, with longitude first and latitude second. See this example:

df<-rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20),
c(-100,-50), c(-160,-60), c(-180, -10), c(-160,10), c(-60,0),c(-100,-50))
df<-data.frame(ID=rep(c(1,2),times=c(5,6)),Lon=df[,1],Lat=df[,2])
df1 <- by(df[,c("Lon", "Lat")],df$ID,centroid)
df1
df[,c("Cent_lon","Cent_lat")]<-NA
for (i in names(df1)) {
  idx <- df$ID == i
  df$Cent_lon[idx] <- df1[[i]][1]  # centroid() returns (lon, lat)
  df$Cent_lat[idx] <- df1[[i]][2]
}
df

   ID  Lon Lat   Cent_lon  Cent_lat
1   1 -180 -20 -133.33333 -23.89340
2   1 -160   5 -133.33333 -23.89340
3   1  -60   0 -133.33333 -23.89340
4   1 -160 -60 -133.33333 -23.89340
5   1 -180 -20 -133.33333 -23.89340
6   2 -100 -50 -127.66065 -26.10686
7   2 -160 -60 -127.66065 -26.10686
8   2 -180 -10 -127.66065 -26.10686
9   2 -160  10 -127.66065 -26.10686
10  2  -60   0 -127.66065 -26.10686
11  2 -100 -50 -127.66065 -26.10686

You can use plotArrows to view the polygon:

pol<-split(df[,2:3],df$ID)
#plotArrows(pol[[1]])
plotArrows(as.matrix(pol[[1]]))
points(df1[[1]],col=4)
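
Because centroid treats the rows as ordered polygon vertices, it is not the same thing as averaging the points. If the rows for an ID are just a scatter of locations rather than a polygon outline, a plain per-ID mean may be closer to what is wanted. The sketch below uses only base R and the five rows shown in the question. It is a swapped-in technique (arithmetic mean of the coordinates), not what centroid computes, and it assumes the points of one ID lie close together and do not straddle the antimeridian:

```r
# The five rows shown in the question
df <- data.frame(
  ID  = c(1, 1, 1, 2, 2),
  Lat = c(25.32, 25.29, 24.12, 12.42, 12.11),
  Lon = c(-63.32, -64.21, -62.43, 54.64, 53.43)
)

# ave() applies mean() within each ID and repeats the result on every row
df$Cent_lat <- ave(df$Lat, df$ID)
df$Cent_lon <- ave(df$Lon, df$ID)

df
```

This keeps one row per original observation, which matches the layout requested in the question.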


Answer 3: (score: 1)

The centroid function of the geosphere package takes a matrix as its data argument: "Arguments: x  a 2-column matrix (longitude/latitude)"

https://cran.r-project.org/web/packages/geosphere/geosphere.pdf

Also, longitude is the first column and latitude is the second, not the other way round :)

So the code in your case could look like this:

library(geosphere)

df <- data.frame(ID = c(1,1,1,2,2,2,2)
                , Lon = c(-63.32, -64.43, -62.43, 54.64, 53.43, 54.64, 53.43)
                , Lat = c(25.32, 25.29, 24.12, 12.42, 12.11, 11.11, 10.55))
mx <- as.matrix(df)

(mx1 <- by(mx[,2:3], mx[,1], centroid))

Output:

INDICES: 1
           lon      lat
[1,] -63.39333 24.91126
-----------------------------------------------------------------
INDICES: 2
     lon lat
[1,] Inf  90
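
Since the column order is easy to get wrong, it can help to build the matrix in one place. Below is a minimal sketch of a hypothetical helper, `as_lonlat`, which is not part of geosphere. It selects the two coordinate columns with longitude first and stops if a latitude falls outside [-90, 90]. The column names `Lon` and `Lat` are assumed from the question:

```r
# Hypothetical helper: build the two-column (lon, lat) matrix that
# centroid() expects, dropping any other columns such as ID
as_lonlat <- function(d, lon = "Lon", lat = "Lat") {
  m <- as.matrix(d[, c(lon, lat)])
  colnames(m) <- c("lon", "lat")
  # A latitude beyond +/-90 usually means the columns were swapped
  if (any(abs(m[, "lat"]) > 90, na.rm = TRUE)) {
    stop("latitude outside [-90, 90]; are the columns swapped?")
  }
  m
}

df <- data.frame(
  ID  = c(1, 1, 1),
  Lat = c(25.32, 25.29, 24.12),
  Lon = c(-63.32, -64.21, -62.43)
)

m <- as_lonlat(df)
m
```

The resulting matrix can be passed straight to centroid. Note that the range check only catches a swap when some longitude exceeds 90 in absolute value, so it would not flag the question's own data.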

Answer 4: (score: 0)

?centroid says it takes only a 2-column matrix as its argument. The ID information you have turns the matrix into three columns.

df <- rbind(c(25.32,-63.32), c(25.29,-64.32), c(24.12,-62.43), c(12.42,54.64), c(12.11,53.43))
centroid(df)

          lon       lat
[1,] 24.27109 -60.37098
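
To keep the ID out of the matrix while still getting one centroid per ID, the coordinate columns can be split by ID first. This is a sketch under the assumption that each ID's rows are polygon vertices in order; the sample values are taken from elsewhere in this thread:

```r
library(geosphere)

df <- data.frame(
  ID  = c(1, 1, 1, 2, 2, 2),
  Lon = c(-63.32, -64.21, -62.43, 54.64, 53.43, 53.23),
  Lat = c(25.32, 25.29, 24.12, 12.42, 12.11, 12.22)
)

# Split only the coordinate columns (Lon first), so ID never reaches centroid()
groups <- split(df[, c("Lon", "Lat")], df$ID)

# One (lon, lat) pair per ID, collected into a matrix with IDs as row names
cents <- t(vapply(groups,
                  function(g) centroid(as.matrix(g))[1, ],
                  numeric(2)))
cents
```

Using the ID only as the grouping key, and never as a data column, avoids the three-column matrix described above.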