我修改了我的问题,以便于理解。
原始df如下:
val ww = Window.partitionBy().orderBy($"tim")
val step1 = df.withColumn("sequence",sort_array(collect_set(col("price")).over(ww),asc=false))
.withColumn("top1price",col("sequence").getItem(0))
.withColumn("top2price",col("sequence").getItem(1))
.drop("sequence")
然后我运行代码
+---+---------+-------+----+------+---------+---------+
| id| tim| price| qty|qtyChg|top1price|top2price|
+---+---------+-------+----+------+---------+---------+
| 1|31951.509| 0.370| 1| 1| 0.370| null|
| 2|31951.515|145.380| 100| 100| 145.380| 0.370|
| 3|31951.519|149.370| 100| 100| 149.370| 145.380|
| 4|31951.520|149.370| 300| 200| 149.370| 145.380|
| 5|31951.520|144.370| 100| 100| 149.370| 145.380|
| 6|31951.520|119.370| 5| 5| 149.370| 145.380|
| 7|31951.521|149.370| 400| 100| 149.370| 145.380|
| 8|31951.522|109.370| 50| 50| 149.870| 149.370|
| 9|31951.522|144.370| 400| 300| 149.870| 149.370|
| 10|31951.522|149.870| 50| 50| 149.870| 149.370|
| 11|31951.522|149.370| 410| 10| 149.870| 149.370|
| 12|31951.524|149.370| 610| 200| 149.870| 149.370|
| 13|31951.526|135.130| 22| 22| 149.870| 149.370|
| 14|31951.527|149.370| 750| 140| 149.870| 149.370|
| 15|31951.528| 89.370| 100| 100| 149.870| 149.370|
| 16|31951.528|139.370| 100| 100| 149.870| 149.370|
| 17|31951.528|145.870| 50| 50| 149.870| 149.370|
| 18|31951.531|144.370| 410| 10| 149.870| 149.370|
| 19|31951.531|149.370| 769| 19| 149.870| 149.370|
| 20|31951.538|144.880| 200| 200| 149.870| 149.370|
| 21|31951.538|149.370| 869| 100| 149.870| 149.370|
| 22|31951.541|139.370| 221| 121| 149.870| 149.370|
| 23|31951.542|144.370| 510| 100| 149.870| 149.370|
| 24|31951.542|139.370| 236| 15| 149.870| 149.370|
| 25|31951.542|149.370|1199| 330| 149.870| 149.370|
| 26|31951.543|139.370| 381| 145| 149.870| 149.370|
| 27|31951.543|143.820| 100| 100| 149.870| 149.370|
| 28|31951.543|146.250| 50| 50| 149.870| 149.370|
| 29|31951.544|140.470| 10| 10| 150.000| 149.870|
| 30|31951.544|137.870| 300| 300| 150.000| 149.870|
| 31|31951.544|150.000| 50| 50| 150.000| 149.870|
| 32|31951.544|149.370|1266| 67| 150.000| 149.870|
| 33|31951.545|140.000| 25| 25| 150.000| 149.870|
| 34|31951.545|150.000| 53| 3| 150.000| 149.870|
| 35|31951.545|148.310| 8| 8| 150.000| 149.870|
| 36|31951.547|149.000| 20| 20| 150.000| 149.870|
| 37|31951.549|150.110| 75| 75| 150.110| 150.000|
| 38|31951.549|143.820| 102| 2| 150.110| 150.000|
+---+---------+-------+----+------+---------+---------+
新数据框如下所示:
{{1}}
我希望获得两个新列top1priceQty,top2priceQty,它们存储了top1price和top2price的最新更新数量。
例如,在第6行中,top1price = 149.370,基于此值,我想获取其对应的数量为400(而不是100或300)。在第33行中,当top1price = 150.00000000时,我要获取其对应的数量,即来自第32行的53,而不是来自第28行的50。相同的规则适用于top2price
谢谢大家!
答案 0 :(得分:1)
您自己非常接近答案。与其收集仅一列的集合,不如收集“ LMTPRICE”的数组及其对应的“ qty”。然后将getItem(0).getItem(0)用于top1price,将getItem(0).getItem(1)用于top1priceQty。为了在INTEREST_TIME之前保持顺序以获取正确的数量,请在“ LMTPRICE”之后和“ qty”之前也使用INTEREST_TIME。
df.withColumn("sequence",sort_array(collect_set(array("LMTPRICE","INTEREST_TIME","qty")).over(ww),asc=false)).withColumn("top1price",col("sequence").getItem(0).getItem(0)).withColumn("top1priceQty",col("sequence").getItem(0).getItem(2).cast("int")).drop("sequence").show(false)
+-----+-------------+--------+---+------+---------+------------+
|index|INTEREST_TIME|LMTPRICE|qty|qtyChg|top1price|top1priceQty|
+-----+-------------+--------+---+------+---------+------------+
|0 |31951.509 |0.37 |1 |1 |0.37 |1 |
|1 |31951.515 |145.38 |100|100 |145.38 |100 |
|2 |31951.519 |149.37 |100|100 |149.37 |100 |
|3 |31951.52 |119.37 |5 |5 |149.37 |300 |
|4 |31951.52 |144.37 |100|100 |149.37 |300 |
|5 |31951.52 |149.37 |300|200 |149.37 |300 |
|6 |31951.521 |149.37 |400|100 |149.37 |400 |
|7 |31951.522 |149.87 |50 |50 |149.87 |50 |
|8 |31951.522 |149.37 |410|10 |149.87 |50 |
|9 |31951.522 |109.37 |50 |50 |149.87 |50 |
|10 |31951.522 |144.37 |400|300 |149.87 |50 |
|11 |31951.524 |149.87 |610|200 |149.87 |610 |
|12 |31951.526 |135.13 |22 |22 |149.87 |610 |
|13 |31951.527 |149.37 |750|140 |149.87 |610 |
|14 |31951.528 |139.37 |100|100 |149.87 |610 |
|15 |31951.528 |145.87 |50 |50 |149.87 |610 |
|16 |31951.528 |89.37 |100|100 |149.87 |610 |
|17 |31951.531 |144.37 |410|10 |149.87 |610 |
|18 |31951.531 |149.37 |769|19 |149.87 |610 |
|19 |31951.538 |149.37 |869|100 |149.87 |610 |
+-----+-------------+--------+---+------+---------+------------+