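This code performs a Haversine distance calculation and is part of a larger project.
The Java implementation appears to be about 60 times faster than the Python one, although the two implementations are almost identical.
Both were measured on the same machine and operating system. I have tried various combinations,
but the difference in execution speed is consistent.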

    from math import radians, sin, cos, sqrt, asin
    import time

    def haversine(lat1, lon1, lat2, lon2):
        R = 6372.8  # Earth radius in kilometers

        dLat = radians(lat2 - lat1)
        dLon = radians(lon2 - lon1)
        lat1 = radians(lat1)
        lat2 = radians(lat2)

        a = sin(dLat/2)**2 + cos(lat1)*cos(lat2)*sin(dLon/2)**2
        c = 2*asin(sqrt(a))

        return R * c

    start_time = time.time()
    #START: Performance critical part
    for x in range(1, 1*1000*1000*10):
        haversine(36.12, -86.67, 33.94, -118.40)
    #END: Performance critical part
    elapsed_time = time.time() - start_time
    print("Python elapsed time = " + str(elapsed_time)+" seconds")

    import java.util.Date;

    public class HaversineTest {
        public static final double R = 6372.8; // In kilometers

        public static double haversine(double lat1, double lon1, double lat2, double lon2) {
            double dLat = Math.toRadians(lat2 - lat1);
            double dLon = Math.toRadians(lon2 - lon1);
            lat1 = Math.toRadians(lat1);
            lat2 = Math.toRadians(lat2);

            double a = Math.pow(Math.sin(dLat / 2),2) + Math.pow(Math.sin(dLon / 2),2) * Math.cos(lat1) * Math.cos(lat2);
            double c = 2 * Math.asin(Math.sqrt(a));
            return R * c;
        }

        public static void main(String[] args) {
            int loopCount = 1*1000*1000*10;
            long start_time = new Date().getTime();
            for(int i=0;i<loopCount;i++){
                haversine(36.12, -86.67, 33.94, -118.40);
            }
            long end_time = new Date().getTime();
            long elapsed_time=end_time-start_time;
            System.out.println("Java elapsed time = "+elapsed_time+" ms");
        }
    }

Versions:
Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] on win32
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
Output:
Java elapsed time = 229 ms
Python elapsed time = 15.028019428253174 seconds
Can I improve the Python performance?
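Answer 0 (score: 4)
There are two main bottlenecks:
- the Python loop
- the function calls (the haversine function and the math functions).
If you use NumPy and operate on arrays, you only need to call the functions once, because the loop is vectorized over the arrays. The result is code that runs about 30 times faster on my computer (the code is adapted from another source so that it works with NumPy arrays):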

    import numpy as np

    R = 6372.8  # Earth radius in kilometers

    def haversine(lat1, lon1, lat2, lon2):
        dLat = np.radians(lat2 - lat1)
        dLon = np.radians(lon2 - lon1)
        lat1 = np.radians(lat1)
        lat2 = np.radians(lat2)
        sinLat = np.sin(dLat/2)
        sinLon = np.sin(dLon/2)
        a = sinLat * sinLat + np.cos(lat1) * np.cos(lat2) * sinLon * sinLon
        c = 2 * np.arcsin(np.sqrt(a))
        return R * c

    haversine(np.ones(10000000) * 36.12,
              np.ones(10000000) * -86.67,
              np.ones(10000000) * 33.94,
              np.ones(10000000) * -118.40)

However, this creates many huge temporary arrays. Following Prune's answer, you can avoid them and still use the math functions:

    from math import radians, sin, cos, sqrt, asin

    import numpy as np  # needed for np.empty below
    from numba import njit

    @njit
    def haversine(lat1, lon1, lat2, lon2):
        R = 6372.8  # Earth radius in kilometers

        dLat = radians(lat2 - lat1)
        dLon = radians(lon2 - lon1)
        lat1 = radians(lat1)
        lat2 = radians(lat2)

        a = sin(dLat/2)**2 + cos(lat1)*cos(lat2)*sin(dLon/2)**2
        c = 2*asin(sqrt(a))

        return R * c

    @njit
    def haversine_loop():
        # np.float64 instead of np.float_, which was removed in NumPy 2.0
        res = np.empty(10*1000*1000, dtype=np.float64)
        for x in range(0, 10*1000*1000):
            res[x] = haversine(36.12, -86.67, 33.94, -118.40)
        return res

    haversine_loop()

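This is actually 50 times faster than the NumPy code (so 150 times faster in the end). However, you need to check whether this approach is viable in your case (numba is not a lightweight dependency!).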
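
To get a feel for the two bottlenecks named at the top of this answer on your own machine, here is a rough standard-library sketch. It times a loop of calls to the pure-Python haversine and a loop of calls to a do-nothing function with the same signature; the second number approximates the loop and call overhead alone. The helper names `noop` and `seconds_per_call` are made up for this illustration, and the numbers will vary by machine and Python version.

```python
import timeit
from math import radians, sin, cos, sqrt, asin

R = 6372.8  # Earth radius in kilometers, as in the question


def haversine(lat1, lon1, lat2, lon2):
    dLat = radians(lat2 - lat1)
    dLon = radians(lon2 - lon1)
    lat1 = radians(lat1)
    lat2 = radians(lat2)
    a = sin(dLat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dLon / 2) ** 2
    return R * 2 * asin(sqrt(a))


def noop(lat1, lon1, lat2, lon2):
    # Hypothetical helper: same signature, no work, so timing it
    # isolates the cost of the loop iteration and the call itself.
    return 0.0


def seconds_per_call(func, number=100_000, repeat=3):
    # Hypothetical helper: timeit executes the statement `number` times
    # per run; keep the best of `repeat` runs to reduce noise.
    timer = timeit.Timer("f(36.12, -86.67, 33.94, -118.40)", globals={"f": func})
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    full = seconds_per_call(haversine)
    overhead = seconds_per_call(noop)
    print("haversine call: %.1f ns" % (full * 1e9))
    print("empty call:     %.1f ns" % (overhead * 1e9))
    print("share spent on loop + call overhead: %.0f%%" % (100 * overhead / full))
```

The remainder of the per-call time is mostly the seven calls into the math module, which is the part that NumPy vectorization or numba compilation removes.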
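Answer 1 (score: 1)
How fast do you need it to be? With a single decorator you can get a 6x speed-up.
If you have a lot of these calculations to do, chances are that numexpr or another NumPy-based approach will be faster still. So it depends on what you need this for.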

    from numba import jit

    @jit(nopython=True)
    def haversine ....
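
Since numba is a heavy dependency, one possible way to try the decorator without committing to it is to make the import optional. The sketch below is an assumption about how you might wire that up, not part of the original answer: the `HAVE_NUMBA` flag and the fallback `jit` are hypothetical names, and the function body is simply the one from the question.

```python
from math import radians, sin, cos, sqrt, asin

try:
    from numba import jit  # compiled path, used only if numba is installed
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

    def jit(*args, **kwargs):
        # Hypothetical fallback: accept the same call shape as
        # jit(nopython=True) and return the function unchanged.
        def decorator(func):
            return func
        return decorator


@jit(nopython=True)
def haversine(lat1, lon1, lat2, lon2):
    R = 6372.8  # Earth radius in kilometers
    dLat = radians(lat2 - lat1)
    dLon = radians(lon2 - lon1)
    lat1 = radians(lat1)
    lat2 = radians(lat2)
    a = sin(dLat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dLon / 2) ** 2
    c = 2 * asin(sqrt(a))
    return R * c


if __name__ == "__main__":
    print("numba available:", HAVE_NUMBA)
    print("distance in km:", haversine(36.12, -86.67, 33.94, -118.40))
```

With this layout the same module runs either way, so you can measure the speed-up on your own data before deciding whether the dependency is worth it. Note that the first call to a jitted function includes compilation time, so time a later call.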