Hi, I am brand new to Spark Scala. I need an idea or any sample solution. I have data like this:
tagid,timestamp,listner,orgid,suborgid,rssi
[4,1496745915,718,4,3,0.30]
[2,1496745915,3878,4,3,0.20]
[4,1496745918,362,4,3,0.60]
[4,1496745913,362,4,3,0.60]
[2,1496745918,362,4,3,0.10]
[3,1496745912,718,4,3,0.05]
[2,1496745918,718,4,3,0.30]
[4,1496745911,1901,4,3,0.60]
[4,1496745912,718,4,3,0.60]
[2,1496745915,362,4,3,0.30]
[2,1496745912,3878,4,3,0.20]
[2,1496745915,1901,4,3,0.30]
[2,1496745910,1901,4,3,0.30]
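For each tag and each listener, I want to find the data from the last 10 seconds of timestamps. Then, over that 10 seconds of data, I need to find the average of the rssi values. Something like this: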
2,1496745918,718,4,3,0.60
2,1496745917,718,4,3,1.30
2,1496745916,718,4,1,2.20
2,1496745914,718,1,2,3.10
2,1496745911,718,1,2,6.10
4,1496745910,1901,1,2,0.30
4,1496745908,1901,1,2,1.30
..........................
..........................
This is what I need to compute. Any solution or suggestion is appreciated. Note: I am using Spark Scala.
I tried a Spark SQL query, but it did not work correctly:
val filteravg = avg.registerTempTable("avg")
val avgfinal = sqlContext.sql("SELECT tagid,timestamp,listner FROM (SELECT tagid,timestamp,listner,dense_rank() OVER (PARTITION BY _c6 ORDER BY _c5 ASC) as rank FROM avg) tmp WHERE rank <= 10")
avgfinal.collect.foreach(println)
I am also trying to do it with arrays. Any help would be appreciated.
Answer (score: 3)
If you have the dataframe as
+-----+----------+-------+-----+--------+----+
|tagid|timestamp |listner|orgid|suborgid|rssi|
+-----+----------+-------+-----+--------+----+
|4    |1496745915|718    |4    |3       |0.30|
|2    |1496745915|3878   |4    |3       |0.20|
|4    |1496745918|362    |4    |3       |0.60|
|4    |1496745913|362    |4    |3       |0.60|
|2    |1496745918|362    |4    |3       |0.10|
|3    |1496745912|718    |4    |3       |0.05|
|2    |1496745918|718    |4    |3       |0.30|
|4    |1496745911|1901   |4    |3       |0.60|
|4    |1496745912|718    |4    |3       |0.60|
|2    |1496745915|362    |4    |3       |0.30|
|2    |1496745912|3878   |4    |3       |0.20|
|2    |1496745915|1901   |4    |3       |0.30|
|2    |1496745910|1901   |4    |3       |0.30|
+-----+----------+-------+-----+--------+----+
then doing the following should work for you:
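import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, first}
import spark.implicits._ // enables the $"col" syntax; assumes a SparkSession named spark

df.withColumn("firstValue", first("timestamp").over(Window.partitionBy("tagid").orderBy($"timestamp".desc))) // latest timestamp of each tagid
  .filter($"firstValue".cast("long") - $"timestamp".cast("long") < 10) // keep rows less than 10 s older than that
  .withColumn("average", avg("rssi").over(Window.partitionBy("tagid"))) // mean rssi of the remaining rows per tagid
  .drop("firstValue")
  .show(false)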
You should get the following output:
+-----+----------+-------+-----+--------+----+-------------------+
|tagid|timestamp |listner|orgid|suborgid|rssi|average            |
+-----+----------+-------+-----+--------+----+-------------------+
|3    |1496745912|718    |4    |3       |0.05|0.05               |
|4    |1496745918|362    |4    |3       |0.60|0.54               |
|4    |1496745915|718    |4    |3       |0.30|0.54               |
|4    |1496745913|362    |4    |3       |0.60|0.54               |
|4    |1496745912|718    |4    |3       |0.60|0.54               |
|4    |1496745911|1901   |4    |3       |0.60|0.54               |
|2    |1496745918|362    |4    |3       |0.10|0.24285714285714288|
|2    |1496745918|718    |4    |3       |0.30|0.24285714285714288|
|2    |1496745915|3878   |4    |3       |0.20|0.24285714285714288|
|2    |1496745915|362    |4    |3       |0.30|0.24285714285714288|
|2    |1496745915|1901   |4    |3       |0.30|0.24285714285714288|
|2    |1496745912|3878   |4    |3       |0.20|0.24285714285714288|
|2    |1496745910|1901   |4    |3       |0.30|0.24285714285714288|
+-----+----------+-------+-----+--------+----+-------------------+
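To check these numbers without a Spark cluster, the same logic (find the latest timestamp per group, keep rows less than 10 seconds older than it, average rssi) can be mirrored with plain Scala collections. This is a minimal sketch, not Spark code: `Reading`, `LastTenSeconds` and `lastWindowAvg` are hypothetical names, and only the columns used here are kept. The Spark code above partitions by `tagid` only, while the question asks for each tag and each listener, so the sketch takes the grouping key as a parameter and computes both.

```scala
// Plain-Scala mirror of the window logic above (no Spark needed).
// Reading, LastTenSeconds and lastWindowAvg are hypothetical names for this sketch only.
case class Reading(tagid: Int, timestamp: Long, listner: Int, rssi: Double)

object LastTenSeconds {
  // The sample rows from the question (orgid and suborgid omitted).
  val rows: List[Reading] = List(
    Reading(4, 1496745915L, 718, 0.30),
    Reading(2, 1496745915L, 3878, 0.20),
    Reading(4, 1496745918L, 362, 0.60),
    Reading(4, 1496745913L, 362, 0.60),
    Reading(2, 1496745918L, 362, 0.10),
    Reading(3, 1496745912L, 718, 0.05),
    Reading(2, 1496745918L, 718, 0.30),
    Reading(4, 1496745911L, 1901, 0.60),
    Reading(4, 1496745912L, 718, 0.60),
    Reading(2, 1496745915L, 362, 0.30),
    Reading(2, 1496745912L, 3878, 0.20),
    Reading(2, 1496745915L, 1901, 0.30),
    Reading(2, 1496745910L, 1901, 0.30)
  )

  // For each group: take its latest timestamp, keep rows less than `window`
  // seconds older than it, then average the rssi of the kept rows.
  def lastWindowAvg[K](rs: List[Reading], key: Reading => K, window: Long = 10L): Map[K, Double] =
    rs.groupBy(key).map { case (k, group) =>
      val latest = group.map(_.timestamp).max
      val recent = group.filter(r => latest - r.timestamp < window) // never empty: contains the latest row
      k -> recent.map(_.rssi).sum / recent.size
    }

  def main(args: Array[String]): Unit = {
    // Per tagid, as in the Spark output above
    lastWindowAvg(rows, _.tagid).toList.sortBy(_._1).foreach(println)
    // Per (tagid, listner), as the question asks
    lastWindowAvg(rows, r => (r.tagid, r.listner)).toList.sortBy(_._1).foreach(println)
  }
}
```

In the sample data every tag's rows fall within 10 seconds of its latest timestamp, so the filter drops nothing here; it only matters on longer data. For the per-listener case in Spark, the analogous (untested) change would be to partition both windows with `Window.partitionBy("tagid", "listner")` instead of `Window.partitionBy("tagid")`.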