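I have address data and a shapefile containing polygons. I'm trying to determine the nearest distance (in miles) from each address to each polygon, and then build a nested dictionary holding all of that information, in the form:
nested_dict = {poly_1: {address1: distance, address2: distance},
               poly_2: {address1: distance, address2: distance}, etc}
The full applicable code I'm using is: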
import pandas as pd
from shapely.geometry import mapping, Polygon, LinearRing, Point
import geopandas as gpd
from math import radians, cos, sin, asin, sqrt

address_dict = {k: [] for k in addresses_geo.input_string}
sludge_dtc = {k: [] for k in sf_geo.unique_name}

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 3956  # Radius of earth in miles. Use 6371 for kilometers
    return c * r

# Here's the key loop that isn't working correctly
for unique_name, i in zip(sf_geo.unique_name, sf_geo.index):
    for address, pt in zip(addresses_geo.input_string, addresses_geo.index):
        pol_ext = LinearRing(sf_geo.iloc[i].geometry.exterior.coords)
        d = pol_ext.project(addresses_geo.iloc[pt].geometry)
        p = pol_ext.interpolate(d)
        closest_point_coords = list(p.coords)[0]
        # print(closest_point_coords)
        dist = haversine(addresses_geo.iloc[pt].geometry.x,
                         addresses_geo.iloc[pt].geometry.y,
                         closest_point_coords[0], closest_point_coords[1])
        address_dict[address] = dist
    sludge_dtc[unique_name] = address_dict

# Test results on a single address
addresses_with_sludge_distance = pd.DataFrame(sludge_dtc)
print(addresses_with_sludge_distance.iloc[[1]].T)
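If I break this code apart and compute the distances for a single polygon, it seems to work correctly. However, when I build the DataFrame and inspect an address, it lists the same distance for every polygon.
So the inner dict key '123 Main Street' has 5.25 miles for every polygon key in the outer dict, and '456 South Street' has 6.13 miles for every polygon key in the outer dict. (Made-up example.)
I realize I must be doing something silly in how I've set up the for loops, but I can't figure it out. I've reversed the order of the for statements and fiddled with the indentation, and the result is always the same.
To be clear, what I'm trying to do is:
Any ideas what I'm missing?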
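Answer 0 (score: 1)
The problem is quite simple: you are always using the same address_dict instance. Every key in sludge_dtc ends up pointing at that one dictionary, so each pass of the outer loop overwrites the previous polygon's distances, and every polygon shows the values computed for the last one.
You just need to recreate it inside the outer loop, once per polygon: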
import pandas as pd
from shapely.geometry import mapping, Polygon, LinearRing, Point
import geopandas as gpd
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 3956  # Radius of earth in miles. Use 6371 for kilometers
    return c * r

sludge_dtc = {k: [] for k in sf_geo.unique_name}

# Key loop, fixed: a new address_dict is created for each polygon
for unique_name, i in zip(sf_geo.unique_name, sf_geo.index):
    address_dict = {k: [] for k in addresses_geo.input_string}
    for address, pt in zip(addresses_geo.input_string, addresses_geo.index):
        pol_ext = LinearRing(sf_geo.iloc[i].geometry.exterior.coords)
        d = pol_ext.project(addresses_geo.iloc[pt].geometry)
        p = pol_ext.interpolate(d)
        closest_point_coords = list(p.coords)[0]
        # print(closest_point_coords)
        dist = haversine(addresses_geo.iloc[pt].geometry.x,
                         addresses_geo.iloc[pt].geometry.y,
                         closest_point_coords[0], closest_point_coords[1])
        address_dict[address] = dist
    # Each polygon key now gets its own dictionary of distances
    sludge_dtc[unique_name] = address_dict

# Test results on a single address
addresses_with_sludge_distance = pd.DataFrame(sludge_dtc)
print(addresses_with_sludge_distance.iloc[[1]].T)
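To see the effect in isolation, here is a minimal sketch that needs neither geopandas nor shapely. The polygon names, address names, and the fake_distance function are made up for illustration; fake_distance only stands in for the real project/interpolate/haversine computation so that each pair gets a distinct value.

```python
# Hypothetical polygon and address names, used only to show the aliasing.
polygons = ["poly_1", "poly_2"]
addresses = ["123 Main Street", "456 South Street"]

def fake_distance(poly, address):
    # Stand-in for the real distance computation: a distinct value per pair.
    return polygons.index(poly) * 10 + addresses.index(address)

# Buggy pattern: one dict created once and shared by every polygon key.
shared = {}
buggy = {}
for poly in polygons:
    for address in addresses:
        shared[address] = fake_distance(poly, address)
    buggy[poly] = shared  # stores a reference to the same dict, not a copy

# Fixed pattern: a fresh dict is created for each polygon.
fixed = {}
for poly in polygons:
    inner = {}
    for address in addresses:
        inner[address] = fake_distance(poly, address)
    fixed[poly] = inner

print(buggy["poly_1"] is buggy["poly_2"])  # True: both keys hold one object
print(fixed["poly_1"] is fixed["poly_2"])  # False: separate objects
```

In the buggy version both outer keys hold the values computed for the last polygon, which matches the symptom in the question.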
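One more note:
You are creating dictionaries with empty lists as values, but then you assign the values directly, which replaces the empty lists. If you need to collect a list of values, you should append each value to the existing list instead, for example:
address_dict[address].append(dist)
and
sludge_dtc[unique_name].append(address_dict)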
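The difference between assigning and appending can be checked with a small sketch. The address names and distance values below are made up for illustration and do not come from the real data.

```python
# Hypothetical address names, for illustration only.
addresses = ["123 Main Street", "456 South Street"]

# Both dicts start out the same way: every key maps to an empty list.
assigned = {k: [] for k in addresses}
appended = {k: [] for k in addresses}

# Assignment throws the empty list away and stores a single float.
assigned["123 Main Street"] = 5.25

# append keeps the list, so several values can accumulate under one key.
appended["123 Main Street"].append(5.25)
appended["123 Main Street"].append(6.13)

print(assigned["123 Main Street"])  # 5.25
print(appended["123 Main Street"])  # [5.25, 6.13]
```

If each address only ever needs one distance per polygon, plain assignment is fine and the initial empty lists are unnecessary.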