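I'm trying to compute the distance in kilometers between two geographic coordinates using the haversine formula in Spark 2.3 with Scala 2.11.8.
I want to compute, for each user, the distance between two actions.
I have the longitude and latitude; the idea is to get the distance in km.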
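For reference, the haversine formula mentioned above can be written as follows. This is the standard textbook form rather than anything taken from the post; the symbols are assumptions introduced here: \varphi_1, \varphi_2 are the two latitudes and \lambda_1, \lambda_2 the two longitudes (all in radians), and R \approx 6371 km is a mean Earth radius.

```latex
\[
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \, \cos\varphi_2 \,
      \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)
\]
\[
d = 2R \cdot \operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right)
\]
```

The input coordinates are usually given in degrees, so they need converting to radians before the trigonometric functions are applied.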
This works fine for me with a Python DataFrame, but I'm struggling with it in Scala Spark!
I used the following code, but it doesn't seem to work.
+-----------+------------------+------------------+-----------------+
| user| distance |Longitude_Centroid|Latitude_Centroid|
+-----------+------------------+------------------+-----------------+
|-2525 | null| 7.038245640847997|39.48919886182785|
|-2147 |12818.567585128396| 7.038245640847997|39.48919886182785|
|-2147 |12818.567585128396| 7.038245640847997|39.48919886182785|
|-2525 |12862.278795753988| 7.050538333095536|39.49362379246508|
+-----------+------------------+------------------+-----------------+
Answer 0 (score: 2)
Found the solution:
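import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._ // enables the $"col" syntax; assumes a SparkSession named `spark`

// NOTE: `window` is not defined in the original post. This definition is an assumption:
// one partition per device, ordered by time, so lag() returns the previous position.
val window = Window.partitionBy("imei").orderBy("date_from")

df4
  // previous latitude/longitude within the same window partition
  .withColumn("lat_lag", lag($"Latitude_Centroid", 1).over(window))
  .withColumn("lng_lag", lag($"Longitude_Centroid", 1).over(window))
  .select("imei", "lat_lag", "lng_lag", "date_from", "Longitude_Centroid", "Latitude_Centroid")
  // haversine term: a = sin^2(dLat/2) + cos(lat1) * cos(lat2) * sin^2(dLng/2)
  .withColumn("a",
    pow(sin(toRadians($"Latitude_Centroid" - $"lat_lag") / 2), 2) +
      cos(toRadians($"lat_lag")) * cos(toRadians($"Latitude_Centroid")) *
        pow(sin(toRadians($"Longitude_Centroid" - $"lng_lag") / 2), 2))
  // distance in km = 2 * R * atan2(sqrt(a), sqrt(1 - a)), with R = 6371 km
  .withColumn("distance", atan2(sqrt($"a"), sqrt(-$"a" + 1)) * 2 * 6371)
  .select("imei", "lat_lag", "lng_lag", "date_from", "Longitude_Centroid", "Latitude_Centroid", "distance")
  .show()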
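One way to sanity-check the column expression above is to compare a few rows against a plain Scala version of the same formula. The following is a minimal sketch, not part of the original answer; the names `HaversineCheck`, `EarthRadiusKm`, and `haversineKm` are hypothetical, and it uses only `scala.math`, so it runs without Spark.

```scala
object HaversineCheck {
  // Mean Earth radius in km; same constant as in the answer above.
  val EarthRadiusKm = 6371.0

  /** Great-circle distance in km between two points given in decimal degrees. */
  def haversineKm(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    // a = sin^2(dLat/2) + cos(lat1) * cos(lat2) * sin^2(dLon/2)
    val a = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusKm * math.atan2(math.sqrt(a), math.sqrt(1 - a))
  }
}
```

For example, `HaversineCheck.haversineKm(39.48919886182785, 7.038245640847997, 39.49362379246508, 7.050538333095536)`, which uses the two centroids from the table in the question, should come out at roughly 1.2 km. A value in the thousands of kilometers for such nearby points would suggest that latitude and longitude were swapped or that degrees were not converted to radians.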