I'm looking for a way to add a new column that reports the minimum distance (in km) subject to a condition.
An example should make this clearer:
Ser_Numb  LAT        LONG       VALUE  MIN
1         74.166061  30.512811  1
2         72.249672  33.427724  1
3         67.499828  37.937264  0
4         84.253715  69.328767  1
5         72.104828  33.823462  0
6         63.989462  51.918173  0
7         80.209112  33.530778  0
8         68.954132  35.981256  1
9         83.378214  40.619652  1
10        68.778571  6.607066   0
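So when VALUE = 0, I need to find the nearest other city (by lat/long) that has VALUE = 1 and compute the distance to that city.
This Stack Overflow post gives the formula, but how can I adapt it to get the minimum distance?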
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    # Radius of earth in kilometers is 6371
    km = 6371 * c
    return km
Edit: here is what I tried:
df['dist_VALUE'] = 0
for i in range(len(df[df['VALUE'] < 1])):
    for j in range(len(df[df['VALUE'] > 0])):
        df[df['VALUE'] < 1].reset_index(drop=True).loc[i, 'dist_VALUE'] = min(
            haversine(df[df['VALUE'] < 1].reset_index(drop=True).loc[i, 'LONG'],
                      df[df['VALUE'] < 1].reset_index(drop=True).loc[i, 'LAT'],
                      df[df['VALUE'] > 0].reset_index(drop=True).loc[j, 'LONG'],
                      df[df['VALUE'] > 0].reset_index(drop=True).loc[j, 'LAT']))
VALUE is an integer, while LAT and LONG are floats.
Answer 0 (score: 1)
Maybe this can help you:
import pandas as pd

# haversine() is the function defined in the question

df = pd.DataFrame(
    data=[
        [74.166061, 30.512811, 1],
        [72.249672, 33.427724, 1],
        [67.499828, 37.937264, 0],
        [84.253715, 69.328767, 1],
        [72.104828, 33.823462, 0],
        [63.989462, 51.918173, 0],
        [80.209112, 33.530778, 0],
        [68.954132, 35.981256, 1],
        [83.378214, 40.619652, 1],
        [68.778571, 6.607066, 0],
    ],
    columns=['lat', 'long', 'val'])
df['min'] = 0.0  # float column; stays 0 for cities with val == 1
print(df)

# cities with val == 1 are the candidate destinations
destination_cities = [
    {
        'i': i,
        'lat': row['lat'],
        'long': row['long'],
    }
    for i, row in df.iterrows()
    if row['val'] == 1]
print('destination_cities')
print(destination_cities)

for i in df.index:
    row = df.loc[i]
    if row['val'] == 0:
        # distance from this city to every destination city
        target_distances = [
            {
                'destination_i': destination['i'],
                'distance': haversine(
                    lon1=row['long'],
                    lat1=row['lat'],
                    lon2=destination['long'],
                    lat2=destination['lat']),
            }
            for destination in destination_cities]
        # keep the closest destination and store its distance
        elem = min(target_distances, key=lambda x: x['distance'])
        df.loc[i, 'min'] = elem['distance']
print(df)
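If you also want to know which destination city is the closest, not just how far away it is, a small helper can return both. This is only a sketch: nearest_destination is a hypothetical name of mine, and it assumes the destinations are a list of dicts with 'i', 'lat' and 'long' keys, like destination_cities above. haversine is repeated here so the snippet runs on its own.

```python
from math import radians, cos, sin, asin, sqrt


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))


def nearest_destination(lat, lon, destinations):
    """Return (index, distance_km) of the destination closest to (lat, lon).

    Hypothetical helper: `destinations` is assumed to be a non-empty list of
    dicts with the keys 'i', 'lat' and 'long'.
    """
    best = min(
        destinations,
        key=lambda d: haversine(lon, lat, d['long'], d['lat']))
    return best['i'], haversine(lon, lat, best['long'], best['lat'])


# The val == 1 cities from the sample data, keyed by their 0-based row index
destinations = [
    {'i': 0, 'lat': 74.166061, 'long': 30.512811},
    {'i': 1, 'lat': 72.249672, 'long': 33.427724},
    {'i': 3, 'lat': 84.253715, 'long': 69.328767},
    {'i': 7, 'lat': 68.954132, 'long': 35.981256},
    {'i': 8, 'lat': 83.378214, 'long': 40.619652},
]

# Nearest val == 1 city for the sample row at index 4
idx, dist = nearest_destination(72.104828, 33.823462, destinations)
print(idx, round(dist, 1))
```

The index that comes back can be stored in a second column next to the distance if you need it.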
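Another approach might be to precompute the minimum distance for each city and then assign the values with df.apply(); this may be a little faster for you: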
df = pd.DataFrame(
    data=[
        [ 1, 74.166061, 30.512811, 1],
        [ 2, 72.249672, 33.427724, 1],
        [ 3, 67.499828, 37.937264, 0],
        [ 4, 84.253715, 69.328767, 1],
        [ 5, 72.104828, 33.823462, 0],
        [ 6, 63.989462, 51.918173, 0],
        [ 7, 80.209112, 33.530778, 0],
        [ 8, 68.954132, 35.981256, 1],
        [ 9, 83.378214, 40.619652, 1],
        [10, 68.778571, 6.607066, 0],
    ],
    columns=['i', 'lat', 'long', 'val'])

# precompute closest distance for each city with val=0 to all cities with val=1
distances = {}
for _, row_orig in df.iterrows():
    if row_orig['val'] == 0:
        distances[row_orig['i']] = min(
            haversine(
                lon1=row_orig['long'],
                lat1=row_orig['lat'],
                lon2=row_dest['long'],
                lat2=row_dest['lat'])
            for _, row_dest in df.iterrows()
            if row_dest['val'] == 1)

# cities with val=1 have no entry in distances and get 0
df['min'] = df.apply(lambda row: distances.get(row['i'], 0), axis=1)
print(df)
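Both versions above loop over rows in Python, which can get slow on a large DataFrame. As a possible alternative, here is a minimal sketch that computes all val=0 to val=1 distances at once with NumPy broadcasting and then takes the row-wise minimum. The names haversine_matrix, add_min_distance and EARTH_RADIUS_KM are my own and not from the question, and the sketch assumes the same 'lat', 'long' and 'val' columns as above. Note that it builds a full (sources x destinations) matrix, so memory use grows with the product of the two counts.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # same radius as in the question's haversine()


def haversine_matrix(lat1, lon1, lat2, lon2):
    """Pairwise great-circle distances in km (inputs in decimal degrees).

    Hypothetical helper: returns an array of shape (len(lat1), len(lat2)).
    """
    # Column vectors for the first set, row vectors for the second set,
    # so that subtraction broadcasts to every pair of points.
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against tiny floating-point overshoots outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def add_min_distance(df, lat='lat', lon='long', val='val', out='min'):
    """Return a copy of df with `out` = distance to the nearest val == 1 row.

    Rows with val == 1 keep 0.0, matching the loops above.
    """
    result = df.copy()
    result[out] = 0.0
    is_source = result[val] == 0
    is_dest = result[val] == 1
    if is_source.any() and is_dest.any():
        dist = haversine_matrix(
            result.loc[is_source, lat], result.loc[is_source, lon],
            result.loc[is_dest, lat], result.loc[is_dest, lon])
        result.loc[is_source, out] = dist.min(axis=1)
    return result


sample = pd.DataFrame(
    data=[
        [74.166061, 30.512811, 1],
        [72.249672, 33.427724, 1],
        [67.499828, 37.937264, 0],
        [84.253715, 69.328767, 1],
        [72.104828, 33.823462, 0],
        [63.989462, 51.918173, 0],
        [80.209112, 33.530778, 0],
        [68.954132, 35.981256, 1],
        [83.378214, 40.619652, 1],
        [68.778571, 6.607066, 0],
    ],
    columns=['lat', 'long', 'val'])
print(add_min_distance(sample))
```

For very large inputs, a spatial index would probably scale better than a full distance matrix, but for a few thousand cities the broadcasting version should be enough.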