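I'm new to Python and want to recreate this example. I have longitude and latitude data for NYC taxi pickups and drop-offs, but I need to convert the data to Web Mercator format (which the example above does not show). I found a function that takes a pair of longitude and latitude values and converts them to Web Mercator, taken from here. It looks like this: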
import math

def toWGS84(xLon, yLat):
    # Check if coordinate out of range for Latitude/Longitude
    if (abs(xLon) < 180) and (abs(yLat) > 90):
        return
    # Check if coordinate out of range for Web Mercator
    # 20037508.3427892 is full extent of Web Mercator
    if (abs(xLon) > 20037508.3427892) or (abs(yLat) > 20037508.3427892):
        return
    semimajorAxis = 6378137.0  # WGS84 spheroid semimajor axis
    latitude = (1.5707963267948966 - (2.0 * math.atan(math.exp((-1.0 * yLat) / semimajorAxis)))) * (180/math.pi)
    longitude = ((xLon / semimajorAxis) * 57.295779513082323) - ((math.floor((((xLon / semimajorAxis) * 57.295779513082323) + 180.0) / 360.0)) * 360.0)
    return [longitude, latitude]

def toWebMercator(xLon, yLat):
    # Check if coordinate out of range for Latitude/Longitude
    # ("or", not "and": either value being out of range makes the pair invalid)
    if (abs(xLon) > 180) or (abs(yLat) > 90):
        return
    semimajorAxis = 6378137.0  # WGS84 spheroid semimajor axis
    east = xLon * 0.017453292519943295   # degrees to radians
    north = yLat * 0.017453292519943295  # degrees to radians
    northing = 3189068.5 * math.log((1.0 + math.sin(north)) / (1.0 - math.sin(north)))
    easting = semimajorAxis * east
    return [easting, northing]

def main():
    print(toWebMercator(-105.816001, 40.067633))
    print(toWGS84(-11779383.349100526, 4875775.395628653))

if __name__ == '__main__':
    main()
How can I apply this to every long/lat coordinate pair in my pandas DataFrame and save the output in the same DataFrame?
df.tail()
| longitude | latitude
____________|__________________|______________
11135465 | -73.986893 | 40.761093
1113546 | -73.979645 | 40.747814
11135467 | -74.001244 | 40.743172
11135468 | -73.997818 | 40.726055
...
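Answer 0: (score: 1)
For a dataset of a decent size, what helps most is understanding how to do things the pandas way. Compared with the built-in vectorized methods, iterating over rows gives terrible performance.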
import pandas as pd
import numpy as np
df = pd.read_csv('/yellow_tripdata_2016-06.csv')
df.head(5)
VendorID tpep_pickup_datetime tpep_dropoff_datetime passenger_count trip_distance pickup_longitude pickup_latitude RatecodeID store_and_fwd_flag dropoff_longitude dropoff_latitude payment_type fare_amount extra mta_tax tip_amount tolls_amount improvement_surcharge total_amount
0 2 2016-06-09 21:06:36 2016-06-09 21:13:08 2 0.79 -73.983360 40.760937 1 N -73.977463 40.753979 2 6.0 0.5 0.5 0.00 0.0 0.3 7.30
1 2 2016-06-09 21:06:36 2016-06-09 21:35:11 1 5.22 -73.981720 40.736668 1 N -73.981636 40.670242 1 22.0 0.5 0.5 4.00 0.0 0.3 27.30
2 2 2016-06-09 21:06:36 2016-06-09 21:13:10 1 1.26 -73.994316 40.751072 1 N -74.004234 40.742168 1 6.5 0.5 0.5 1.56 0.0 0.3 9.36
3 2 2016-06-09 21:06:36 2016-06-09 21:36:10 1 7.39 -73.982361 40.773891 1 N -73.929466 40.851540 1 26.0 0.5 0.5 1.00 0.0 0.3 28.30
4 2 2016-06-09 21:06:36 2016-06-09 21:23:23 1 3.10 -73.987106 40.733173 1 N -73.985909 40.766445 1 13.5 0.5 0.5 2.96 0.0 0.3 17.76
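This dataset has 11,135,470 rows, which isn't "big data" but isn't small either. Rather than writing a function and applying it to every row, you get much better performance by carrying out each part of the function on a whole column at once. I would change this function: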
def toWebMercator(xLon, yLat):
    # Check if coordinate out of range for Latitude/Longitude
    # ("or", not "and": either value being out of range makes the pair invalid)
    if (abs(xLon) > 180) or (abs(yLat) > 90):
        return
    semimajorAxis = 6378137.0  # WGS84 spheroid semimajor axis
    east = xLon * 0.017453292519943295
    north = yLat * 0.017453292519943295
    northing = 3189068.5 * math.log((1.0 + math.sin(north)) / (1.0 - math.sin(north)))
    easting = semimajorAxis * east
    return [easting, northing]
into this:
SEMIMAJORAXIS = 6378137.0  # typed in all caps since this is a constant
# Takes all pickup longitude values, multiplies them by pi/180 (degrees to radians),
# then saves the result as a new column named pickup_east.
df['pickup_east'] = df['pickup_longitude'] * 0.017453292519943295
df['pickup_north'] = df['pickup_latitude'] * 0.017453292519943295
# numpy functions let you calculate an entire column's worth of values by simply passing in the column.
df['pickup_northing'] = 3189068.5 * np.log((1.0 + np.sin(df['pickup_north'])) / (1.0 - np.sin(df['pickup_north'])))
df['pickup_easting'] = SEMIMAJORAXIS * df['pickup_east']
You then have pickup_easting and pickup_northing columns containing the computed values.
On my laptop, this takes:
CPU times: user 1.01 s, sys: 286 ms, total: 1.3 s
Wall time: 763 ms
for all 11 million rows. 15 minutes -> seconds.
I got rid of the range checks on the values; you can do them like this instead:
df = df[(df['pickup_longitude'].abs() <= 180) & (df['pickup_latitude'].abs() <= 90)]
This uses boolean indexing, which is again orders of magnitude faster than looping.
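The column-wise steps above can be bundled into one reusable function. The following is only a sketch under a few assumptions: `to_web_mercator` is a hypothetical helper name (not part of pandas or numpy), out-of-range rows are turned into NaN rather than dropped, and the small demo frame stands in for the real taxi data. `np.deg2rad` replaces the literal 0.017453292519943295 (pi/180), and `SEMIMAJOR_AXIS / 2` replaces 3189068.5.

```python
import numpy as np
import pandas as pd

SEMIMAJOR_AXIS = 6378137.0  # WGS84 semimajor axis in metres


def to_web_mercator(lon, lat):
    """Vectorized lon/lat in degrees -> (easting, northing) in metres.

    Hypothetical helper for illustration. Rows outside the valid
    longitude/latitude range come back as NaN instead of being dropped.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    # Latitude of exactly +/-90 is excluded too, because the formula
    # divides by (1 - sin(lat)), which is zero at the pole.
    valid = (np.abs(lon) <= 180) & (np.abs(lat) < 90)
    lon_rad = np.deg2rad(np.where(valid, lon, np.nan))
    sin_lat = np.sin(np.deg2rad(np.where(valid, lat, np.nan)))
    easting = SEMIMAJOR_AXIS * lon_rad
    northing = (SEMIMAJOR_AXIS / 2) * np.log((1.0 + sin_lat) / (1.0 - sin_lat))
    return easting, northing


# Small stand-in for the taxi data; the last row is deliberately invalid.
df = pd.DataFrame({
    "pickup_longitude": [-73.986893, -73.979645, 0.0],
    "pickup_latitude": [40.761093, 40.747814, 95.0],
})
df["pickup_easting"], df["pickup_northing"] = to_web_mercator(
    df["pickup_longitude"], df["pickup_latitude"]
)
print(df)
```

The same call works for the drop-off columns by passing `dropoff_longitude` and `dropoff_latitude` instead. Keeping invalid rows as NaN preserves the original row count; filter with boolean indexing first if you would rather drop them.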
Answer 1: (score: 0)
Try:
df[['longitude', 'latitude']].apply(
    lambda x: pd.Series(toWebMercator(*x), ['xLon', 'yLat']),
    axis=1
)
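To store the result in the same DataFrame, as the question asks, the row-wise output can be joined back on the index. This is a sketch, assuming every row is within range (the scalar function returns None otherwise) and using the hypothetical column names `easting` and `northing`. It calls a Python function once per row, so expect it to be far slower than a vectorized approach on millions of rows.

```python
import math

import pandas as pd


def toWebMercator(xLon, yLat):
    # Scalar formula from the question; returns None when out of range
    if abs(xLon) > 180 or abs(yLat) > 90:
        return None
    semimajorAxis = 6378137.0
    east = xLon * 0.017453292519943295
    north = yLat * 0.017453292519943295
    northing = 3189068.5 * math.log((1.0 + math.sin(north)) / (1.0 - math.sin(north)))
    easting = semimajorAxis * east
    return [easting, northing]


# Two rows from the question's df.tail() output
df = pd.DataFrame({
    "longitude": [-73.986893, -73.979645],
    "latitude": [40.761093, 40.747814],
})

# result_type="expand" turns each returned list into separate columns
merc = df.apply(
    lambda row: toWebMercator(row["longitude"], row["latitude"]),
    axis=1,
    result_type="expand",
)
merc.columns = ["easting", "northing"]  # hypothetical output names
df = df.join(merc)  # aligns on the index, so rows stay matched
print(df)
```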
Answer 2: (score: 0)
If you want to keep a readable mathematical form and convert your current function with little effort, use eval:
df.eval("""
northing = 3189068.5 * log((1.0 + sin(latitude * 0.017453292519943295)) / (1.0 - sin(latitude * 0.017453292519943295)))
easting = 6378137.0 * longitude * 0.017453292519943295""", inplace=False)
Out[51]:
id longitude latitude northing easting
0 11135465 -73.986893 40.761093 4.977167e+06 -8.236183e+06
1 1113546 -73.979645 40.747814 4.975215e+06 -8.235376e+06
2 11135467 -74.001244 40.743172 4.974533e+06 -8.237781e+06
3 11135468 -73.997818 40.726055 4.972018e+06 -8.237399e+06
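Since you can't use if statements, you have to rework the syntax a little, but you can easily filter out the out-of-range data before calling eval. If you want to assign the new columns directly, you can also use inplace=True.
If you care more about raw speed than about keeping the mathematical syntax, the numpy answer will probably run faster.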
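Filtering before eval, as described above, might look like the following. This is a sketch with a made-up two-row frame in which the second row has an invalid longitude. The expression is the same as in the answer, passed as a multi-line string with one assignment per line.

```python
import pandas as pd

# Demo data; the second row is deliberately out of range
df = pd.DataFrame({
    "longitude": [-73.986893, 200.0],
    "latitude": [40.761093, 40.0],
})

# Boolean mask replaces the if statement that eval cannot express
in_range = (df["longitude"].abs() <= 180) & (df["latitude"].abs() <= 90)

expr = "\n".join([
    "northing = 3189068.5 * log((1.0 + sin(latitude * 0.017453292519943295))"
    " / (1.0 - sin(latitude * 0.017453292519943295)))",
    "easting = 6378137.0 * longitude * 0.017453292519943295",
])

# With the default inplace=False, eval returns a new DataFrame
result = df[in_range].eval(expr)
print(result)
```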