I'm new to Scala and I'm not sure how to phrase this kind of question (technical terms...). I have a DataFrame:
id VehicleID Longitude Latitude Date Distance
1 12311 55.55431 25.45631 01/02/2020 20
2 12311 55.55432 25.45634 01/02/2020 80
3 12311 55.55433 25.45637 02/02/2020 10
4 12311 55.55431 25.45621 02/02/2020 50
5 12309 55.55427 25.45627 01/02/2020 30
6 12309 55.55436 25.45655 02/02/2020 20
7 12412 55.55441 25.45657 01/02/2020 14
8 12412 55.55442 25.45656 02/02/2020 60
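I want to compute the mean and standard deviation for each block (each combination of VehicleID and Date), for example: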
VehicleID Longitude Latitude Date Distance Mean
12311 55.55431 25.45631 01/02/2020 20 -
12311 55.55432 25.45634 01/02/2020 80 -
VehicleID Longitude Latitude Date Distance Mean
12311 55.55433 25.45637 02/02/2020 10
12311 55.55431 25.45621 02/02/2020 50
VehicleID Longitude Latitude Date Distance Mean
12309 55.55427 25.45627 01/02/2020 30 -
VehicleID Longitude Latitude Date Distance Mean
12309 55.55436 25.45655 02/02/2020 20 -
The same goes for the standard deviation.
I tried the following, but it doesn't work for me:
val w = Window.partitionBy("vehicle_id", "Date").orderBy("id")
val m = dataframe_final.withColumn("mean",col("Distance").over(w).cast("double")).as[Double].rdd.mean()
How can I do this?
Thanks.
Answer 0: (score: 0)
You can do this with groupBy alone:
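// Mean and standard deviation of Distance per (VehicleID, Date) group
val groupedMS = df.groupBy("VehicleID", "Date")
  .agg(("Distance", "mean"), ("Distance", "stddev"))
// Join the per-group statistics back onto the original rows
df.join(groupedMS, Seq("VehicleID", "Date"))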
This gives you:
+---------+----------+---+---------+--------+--------+-------------+------------------+
|VehicleID| Date| id|Longitude|Latitude|Distance|avg(Distance)| stddev(Distance)|
+---------+----------+---+---------+--------+--------+-------------+------------------+
| 12311|01/02/2020| 1| 55.55431|25.45631| 20| 50.0| 42.42640687119285|
| 12311|01/02/2020| 2| 55.55432|25.45634| 80| 50.0| 42.42640687119285|
| 12311|02/02/2020| 3| 55.55433|25.45637| 10| 30.0|28.284271247461902|
| 12311|02/02/2020| 4| 55.55431|25.45621| 50| 30.0|28.284271247461902|
| 12309|01/02/2020| 5| 55.55427|25.45627| 30| 30.0| NaN|
| 12309|02/02/2020| 6| 55.55436|25.45655| 20| 20.0| NaN|
| 12412|01/02/2020| 7| 55.55441|25.45657| 14| 14.0| NaN|
| 12412|02/02/2020| 8| 55.55442|25.45656| 60| 60.0| NaN|
+---------+----------+---+---------+--------+--------+-------------+------------------+
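If you would rather stay with the window approach from the question, the following is a minimal sketch of how it could look. It partitions by the actual column name VehicleID and wraps Distance in the aggregate functions avg and stddev before calling over. The window deliberately has no orderBy: with an ordering and no explicit frame, Spark's default frame runs from the start of the partition to the current row, so you would get running values rather than one value per group. The object name GroupStats, the helper withGroupStats, and the output column names "mean" and "stddev" are my own choices for illustration, and the local SparkSession setup is an assumption about how you run it.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, col, stddev}

object GroupStats {
  // Adds the mean and sample standard deviation of Distance for each
  // (VehicleID, Date) group to every row. Column names are illustrative.
  def withGroupStats(df: DataFrame): DataFrame = {
    // No orderBy here, so each aggregate sees the whole partition.
    val w = Window.partitionBy("VehicleID", "Date")
    df.withColumn("mean", avg(col("Distance")).over(w))
      .withColumn("stddev", stddev(col("Distance")).over(w))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying the sketch out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("group-stats")
      .getOrCreate()
    import spark.implicits._

    // The sample data from the question.
    val df = Seq(
      (1, 12311, 55.55431, 25.45631, "01/02/2020", 20),
      (2, 12311, 55.55432, 25.45634, "01/02/2020", 80),
      (3, 12311, 55.55433, 25.45637, "02/02/2020", 10),
      (4, 12311, 55.55431, 25.45621, "02/02/2020", 50),
      (5, 12309, 55.55427, 25.45627, "01/02/2020", 30),
      (6, 12309, 55.55436, 25.45655, "02/02/2020", 20),
      (7, 12412, 55.55441, 25.45657, "01/02/2020", 14),
      (8, 12412, 55.55442, 25.45656, "02/02/2020", 60)
    ).toDF("id", "VehicleID", "Longitude", "Latitude", "Date", "Distance")

    withGroupStats(df).orderBy("id").show()
    spark.stop()
  }
}
```

This keeps every original row without a separate join. Note that stddev is the sample standard deviation, which is undefined for groups with a single row; depending on the Spark version and configuration, those rows show NaN (as in the table above) or null.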