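I am using the geosphere library in R to compute the Haversine distance between two countries. The code below uses the geonames API to get the latitude/longitude of each country, then passes them to distHaversine to get the distance: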
for (i in 1:nrow(df)) {
  row <- df[i, ]
  start <- GNsearch(q = row$Address.Country, maxRows = 1, adminCode1 = '00')
  start_lng <- as.numeric(start$lng)
  start_lat <- as.numeric(start$lat)
  end <- GNsearch(q = row$Country.Value, maxRows = 1, adminCode1 = '00')
  end_lng <- as.numeric(end$lng)
  end_lat <- as.numeric(end$lat)
  print(i)
  print(c(row$Address.Country, start_lng, start_lat))
  print(c(row$Country.Value, end_lng, end_lat))
  df[i, ]$dist <- distHaversine(p1 = c(start_lng, start_lat), p2 = c(end_lng, end_lat))
  print(df[i, ]$dist)
}
However, distHaversine is returning NULL, and the output I get looks like this:
[1] 224
[1] "United States" "-98.5" "39.76"
[1] "Mexico" "-102" "23"
NULL
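Where am I going wrong?
Update
I think the problem is that the longitude/latitude pairs are not being converted to numbers, even though I use as.numeric. I looked for a solution and even tried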
as.numeric(as.character(end$lng))
...
However, that does not seem to work either!
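One thing worth checking before blaming as.numeric: the quoted values in the printed output do not by themselves show that the conversion failed. c() coerces a mix of strings and numbers to character, so printing the country name together with the coordinates always shows quotes. A minimal sketch, assuming the API hands back the coordinates as text (the literal values are copied from the output above):

```r
# Assumption: the coordinate arrives as text, as it might from a web API.
lng <- "-98.5"
start_lng <- as.numeric(lng)
start_lat <- as.numeric("39.76")

# The conversion itself works.
print(is.numeric(start_lng))

# Mixing a string with numbers in c() coerces everything to character,
# so this prints quoted values even though the inputs were numeric.
mixed <- c("United States", start_lng, start_lat)
print(class(mixed))
print(mixed)

# Inspecting the numbers on their own shows their real type.
str(c(start_lng, start_lat))
```

If str() reports num for the coordinates inside the loop as well, the conversion is fine and the NULL has to come from somewhere else.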
Update 2
In response to SymbolixAU, here is a sample of the dataset:
dput(head(df))
structure(list(Country.Value = c("United States", "Brasil",
"United States", "India", "China", "Denmark"), Address.Country = c("United States",
"United States", "United States", "Romania", "United States",
"France"), .Names = c("Country.Value", "Address.Country",
"Award.Date"), class = c("data.table", "data.frame"), row.names = c(NA,
-6L), .internal.selfref = <pointer: 0x00000000001f0788>)
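Update 3
I have updated the code further to try the Google Maps geocoder instead of geonames, because geonames is very inaccurate for some places. I also use if-else statements to try to stay within the geocoding limit of the free Google Maps geocoder. The code now looks like this: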
for (i in q:nrow(df)) {
  row <- df.cont.long[i, ]
  src_lon <- 0.0
  src_lat <- 0.0
  trgt_lon <- 0.0
  trgt_lat <- 0.0
  if ((row$Country.Value == 'United States')) { # Reduce geocoding requirements
    trgt_lon <- -95.7129
    trgt_lat <- 37.0902
  } else if ((row$Address.Country == 'United States')) { # Reduce geocoding requirements
    src_lon <- -95.7129
    src_lat <- 37.0902
  } else if ((row$Country.Value == 'Canada')) { # Reduce geocoding requirements
    trgt_lon <- -106.3468
    trgt_lat <- 56.1304
  } else if ((row$Primary.Address.Country == 'Canada')) { # Reduce geocoding requirements
    src_lon <- -106.3468
    src_lat <- 56.1304
  } else if (row$Country.Value == row$Address.Country) { # Reduce geocoding requirements
    # trgt <- geocode(row$Country.Value)
    # trgt_lon <- as.numeric(trgt$lon)
    # trgt_lat <- as.numeric(trgt$lat)
    # src_lon <- as.numeric(trgt$lon)
    # src_lat <- as.numeric(trgt$lat)
  } else {
    trgt <- geocode(row$Country.Value, output = c("latlon"))
    trgt_lon <- as.numeric(trgt$lon)
    trgt_lat <- as.numeric(trgt$lat)
    src <- geocode(row$Address.Country)
    src_lon <- as.numeric(src$lon)
    src_lat <- as.numeric(src$lat)
  }
  print(i)
  print(c(row$Address.Country, src_lon, src_lat))
  print(c(row$Country.Value, trgt_lon, trgt_lat))
  print(distHaversine(p1 = c(as.numeric(src$lon), as.numeric(src$lat)), p2 = c(as.numeric(trgt$lon), as.numeric(trgt$lat))))
}
The output is now even more erratic:
[1] 993
[1] "United States" "0" "0"
[1] "United States" "-95.7129" "37.0902"
[1] NA
[1] 994
[1] "United States" "-95.7129" "37.0902"
[1] "Brazil" "0" "0"
[1] NA
[1] 995
[1] "United States" "-95.7129" "37.0902"
[1] "Mexico" "0" "0"
[1] NA
[1] 996
[1] "United States" "-95.7129" "37.0902"
[1] "Bolivia" "0" "0"
[1] NA
[1] TRUE
[1] 997
[1] "South Africa" "0" "0"
[1] "South Africa" "0" "0"
[1] NA
[1] 998
[1] "United States" "-95.7129" "37.0902"
[1] "Costa Rica" "0" "0"
[1] NA
[1] 999
[1] "United States" "-95.7129" "37.0902"
[1] "Tanzania" "0" "0"
[1] NA
[1] 1000
[1] "United States" "-95.7129" "37.0902"
[1] "Greece" "0" "0"
[1] NA
[1] 1001
[1] "United States" "-95.7129" "37.0902"
[1] "Namibia" "0" "0"
[1] NA
.Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Colombia&sensor=false
.Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=United%20Kingdom&sensor=false
[1] 1002
[1] "United Kingdom" "-3.435973" "55.378051"
[1] "Colombia" "-74.297333" "4.570868"
[1] 8398813
[1] 1003
[1] "United States" "-95.7129" "37.0902"
[1] "Guatemala" "0" "0"
[1] 8398813
From this output I cannot tell where the code is going wrong.
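For reference, the goal of the if-else chain in Update 3 (fewer geocoding requests) could also be reached by geocoding each distinct country once and looking the coordinates up afterwards. The following is only a sketch under stated assumptions: fake_geocode is a hypothetical offline stand-in for a real geocoder, haversine_m is a base-R stand-in for distHaversine (assuming its default radius of 6378137 m), and the pairs data frame is made up from country names and coordinates that appear in the output above.

```r
# Hypothetical stand-in for a real geocoder; returns fixed coordinates
# (taken from the output above) so the sketch runs offline.
fake_geocode <- function(country) {
  known <- list(
    "United States"  = c(lon = -95.7129,   lat = 37.0902),
    "United Kingdom" = c(lon = -3.435973,  lat = 55.378051),
    "Colombia"       = c(lon = -74.297333, lat = 4.570868)
  )
  known[[country]]
}

# Base-R haversine distance in metres; points are c(lon, lat) in degrees.
# Stand-in for geosphere::distHaversine, assumed default r = 6378137.
haversine_m <- function(p1, p2, r = 6378137) {
  to_rad <- pi / 180
  lon1 <- p1[[1]] * to_rad
  lat1 <- p1[[2]] * to_rad
  lon2 <- p2[[1]] * to_rad
  lat2 <- p2[[2]] * to_rad
  a <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  2 * r * asin(min(1, sqrt(a)))
}

# Made-up sample rows using the column names from the question.
pairs <- data.frame(
  Address.Country = c("United Kingdom", "United States"),
  Country.Value   = c("Colombia", "United States"),
  stringsAsFactors = FALSE
)

# Geocode every distinct country exactly once.
countries <- unique(c(pairs$Address.Country, pairs$Country.Value))
coords <- lapply(countries, fake_geocode)
names(coords) <- countries

# Look up both endpoints per row and compute the distance.
pairs$dist <- vapply(seq_len(nrow(pairs)), function(i) {
  haversine_m(coords[[pairs$Address.Country[i]]],
              coords[[pairs$Country.Value[i]]])
}, numeric(1))

print(pairs)
```

With this layout every row has both endpoints set before the distance is computed, and the number of geocoding requests equals the number of distinct countries rather than the number of rows.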