Calculate distance in kilometers from latitude/longitude coordinates with Spark 2 and Scala

Asked: 2019-08-12 14:51:16

Tags: scala apache-spark haversine

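I am trying to use the haversine formula in Spark 2.3 with Scala 2.11.8 to calculate the distance, in kilometers, between two geographic coordinates.

I want to calculate the distance a user travels between two consecutive actions:

I have the longitude and latitude of each action, and the idea is to get the distance between them in km.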
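For reference, the haversine formula is usually stated as below. The symbols are notation introduced here, not taken from the question: \varphi_1, \varphi_2 are the two latitudes and \lambda_1, \lambda_2 the two longitudes, all in radians, and R is the mean Earth radius, commonly taken as about 6371 km.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
   + \cos\varphi_1 \,\cos\varphi_2 \,
     \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
d &= 2R \,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right)
\end{aligned}
```

The coordinates must be converted from degrees to radians before the trigonometric functions are applied; skipping that step is a common cause of distances that are far too large.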

This works fine for me with a Python DataFrame, but I am struggling with it in Scala Spark!

I used the following code, but it does not seem to work correctly.

+-----------+------------------+------------------+-----------------+
|       user|          distance|Longitude_Centroid|Latitude_Centroid|
+-----------+------------------+------------------+-----------------+
|      -2525|              null| 7.038245640847997|39.48919886182785|
|      -2147|12818.567585128396| 7.038245640847997|39.48919886182785|
|      -2147|12818.567585128396| 7.038245640847997|39.48919886182785|
|      -2525|12862.278795753988| 7.050538333095536|39.49362379246508|
+-----------+------------------+------------------+-----------------+

1 Answer:

Answer 0 (score: 2)

I found the solution:

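import org.apache.spark.sql.functions._
// Assumes `import spark.implicits._` (for the $"..." syntax) and a WindowSpec
// named `window`; neither is shown in the original answer.

df4
  // Coordinates of the previous row within `window`
  .withColumn("lat_lag", lag($"Latitude_Centroid", 1).over(window))
  .withColumn("lng_lag", lag($"Longitude_Centroid", 1).over(window))
  .select("imei", "lat_lag", "lng_lag", "date_from", "Longitude_Centroid", "Latitude_Centroid")
  // Haversine term a, with all angles converted from degrees to radians
  .withColumn("a",
    pow(sin(toRadians($"Latitude_Centroid" - $"lat_lag") / 2), 2) +
      cos(toRadians($"lat_lag")) * cos(toRadians($"Latitude_Centroid")) *
      pow(sin(toRadians($"Longitude_Centroid" - $"lng_lag") / 2), 2))
  // Distance in km, using a mean Earth radius of 6371 km
  .withColumn("distance", atan2(sqrt($"a"), sqrt(-$"a" + 1)) * 2 * 6371)
  .select("imei", "lat_lag", "lng_lag", "date_from", "Longitude_Centroid", "Latitude_Centroid", "distance")
  .show()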
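One way to sanity-check the column expression above is to compute the same quantity for a single pair of points in plain Scala, without Spark. The following is a minimal sketch under that assumption; the object name `HaversineCheck` and the function `haversineKm` are hypothetical helpers introduced here, not part of the original answer or of the Spark API.

```scala
object HaversineCheck {
  // Mean Earth radius in km, the same constant the answer uses.
  val EarthRadiusKm = 6371.0

  // Great-circle distance in km between two points given in degrees.
  def haversineKm(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    // Haversine term: the same expression as column "a" in the answer.
    val a = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusKm * math.atan2(math.sqrt(a), math.sqrt(1 - a))
  }
}
```

Applied to the two distinct centroids in the question's table, for example `HaversineCheck.haversineKm(39.48919886182785, 7.038245640847997, 39.49362379246508, 7.050538333095536)`, this should give a distance on the order of one kilometer, which suggests the values of roughly 12,800 km in that table came from an incorrect computation rather than from the data.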