For each node (latitude, longitude), I want to count how many rents occurred within a distance of 100 m.
I have two dataframes. One is called `nodes_df`:
       id  geocode                      title   lng         lat
    0  1   POINT(127.036077 37.490958)  place1  19.036077   67.490958
    1  2   POINT(127.03103 37.491231)   place2  167.031030  37.491231
    2  3   POINT(127.030428 37.4925)    place3  147.630428  27.492500
    3  4   POINT(127.029558 37.494329)  place4  117.029558  17.494329
    4  5   POINT(127.029326 37.495018)  place5  147.529326  57.495018
and the other is called `rents_df`:
       geocode                                 lng         lat
    0  POINT(127.03580515559 37.493864399152)  127.035805  37.493864
    1  POINT(127.03580515559 37.493864399152)  127.035805  37.493864
    2  POINT(127.03580515559 37.493864399152)  127.035805  37.493864
    3  POINT(127.03580515559 37.493864399152)  127.035805  37.493864
    4  POINT(127.03580515559 37.493864399152)  127.035805  37.493864
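What I want to do is take each (latitude, longitude) pair in `nodes_df` in turn, compare it with all (latitude, longitude) pairs in `rents_df`, and find how many of them lie within a distance of 100 m.
Here is my code: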
    # Assumed import: the `haversine` package, which takes (lat, lng) pairs and returns km by default
    from haversine import haversine

    def count_per_node(node_geocode, title):
        # Compare the node with all rents and keep those within the 100 m boundary (0.1 km)
        within_df = rents_df.loc[rents_df[['lat', 'lng']].apply(lambda x: haversine(x, node_geocode), axis=1) <= 0.1]
        return len(within_df)

    # For each node's geocode, count the rents in range
    data = {}
    for node in nodes_df["title"]:
        lat_lng_df = nodes_df.loc[nodes_df["title"] == node][["lat", "lng"]]
        node_geocode = (lat_lng_df.values[0][0], lat_lng_df.values[0][1])
        data[node] = count_per_node(node_geocode, node)
    print(data)
This gets the job done, but my data is very large and it crashes after an hour or so. Any help?
**Desired output:**
       title   number_of_rents_within_range
    0  place1  355
    1  place2  1000
    2  place3  3043
    3  place4  3094
    4  place5  230823
and so on...
The code I am currently running is:
    rents_geocode = list(zip(rents_df.lat, rents_df.lng))
    nodes_geocode = list(zip(nodes_df.lat, nodes_df.lng))
    counts = []
    for n in nodes_geocode:
        count = 0
        for r in rents_geocode:
            if haversine(n, r) <= 0.1:
                count += 1
        counts.append(count)
But the time complexity is still O(n^2), since every node is compared against every rent...
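One possible way to avoid comparing every node with every rent is to sort the rents by latitude once and then, for each node, run the haversine check only on rents inside a narrow latitude band. This works because two points within 100 m of each other can differ in latitude by at most 0.1 km divided by the Earth's radius. The sketch below is only an illustration under assumptions: `haversine_km` and `count_within` are hypothetical helper names, the radius 6371.0088 km is an assumed mean Earth radius, and the sample coordinates are taken from the `geocode` columns above rather than from real data.

```python
import bisect
import math

EARTH_RADIUS_KM = 6371.0088  # assumption: mean Earth radius in km


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) pairs given in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def count_within(nodes, rents, radius_km=0.1):
    """For each (lat, lng) in nodes, count the rents within radius_km."""
    rents_sorted = sorted(rents)  # tuples sort by latitude first
    lats = [lat for lat, _ in rents_sorted]
    # Largest latitude difference (in degrees) two points within radius_km can have,
    # with a tiny margin so rounding never excludes a valid candidate
    band = math.degrees(radius_km / EARTH_RADIUS_KM) * 1.000001
    counts = []
    for node in nodes:
        lo = bisect.bisect_left(lats, node[0] - band)
        hi = bisect.bisect_right(lats, node[0] + band)
        # Exact distance check, but only on rents inside the latitude band
        counts.append(sum(1 for r in rents_sorted[lo:hi]
                          if haversine_km(node, r) <= radius_km))
    return counts


# Illustrative input only: (lat, lng) pairs in the same order as
# list(zip(nodes_df.lat, nodes_df.lng)) and list(zip(rents_df.lat, rents_df.lng))
nodes = [(37.490958, 127.036077), (37.491231, 127.03103)]
rents = [(37.493864, 127.035805)] * 5 + [(37.4912, 127.0311)]
print(count_within(nodes, rents))
```

How much this helps depends on how widely the rents are spread in latitude: if they all fall inside one narrow band, each node still checks most of them. In that case a proper spatial index may be a better fit, for example scikit-learn's `BallTree` with `metric='haversine'`, which expects coordinates in radians.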