Conditionally splitting a sequential df by timestamp in Spark Scala

Asked: 2018-11-19 18:38:35

Tags: scala apache-spark apache-spark-sql

I have a df that is sequential by timestamp, and I want to split it whenever the distance value is greater than 1000.

The df looks like this:

+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541712752000    |1.1990470282994594 |123|
|1541713551000    |1.5804709872862326 |123|
|1541714462000    |0.0                |123|
|1541715475000    |0.53107795768697   |123|
|1541716383000    |0.53107795768697   |123|
|1541716792000    |0.24740321078091282|123|
|1541717695000    |1542.00            |123|
|1541717801000    |2.7767418047706816 |123|
|1541718779000    |13.058715260118664 |123|
|1541719672000    |22.64146251404579  |123|
|1541720581000    |23.861007122654314 |123|
|1541721502000    |16.327504368653443 |123|
|1541722572000    |26.084599108380274 |123|
|1541723500000    |20.630034360787512 |123|
|1541724219000    |1893.00            |123|
|1541725264000    |23.16455204686255  |123|
|1541726037000    |15.911555304774817 |123|
|1541726950000    |20.057274313740784 |123|
|1541727884000    |12.967418789242549 |123|
|1541728085000    |2.720850595301784  |123|
+-----------------+-------------------+---+

Splitting the sequential df wherever distance is greater than 1000, I would like to end up with three dfs like these:

+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541712752000    |1.1990470282994594 |123|
|1541713551000    |1.5804709872862326 |123|
|1541714462000    |0.0                |123|
|1541715475000    |0.53107795768697   |123|
|1541716383000    |0.53107795768697   |123|
|1541716792000    |0.24740321078091282|123|
+-----------------+-------------------+---+

+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541717695000    |1542.00            |123|
|1541717801000    |2.7767418047706816 |123|
|1541718779000    |13.058715260118664 |123|
|1541719672000    |22.64146251404579  |123|
|1541720581000    |23.861007122654314 |123|
|1541721502000    |16.327504368653443 |123|
|1541722572000    |26.084599108380274 |123|
|1541723500000    |20.630034360787512 |123|
+-----------------+-------------------+---+

+-----------------+-------------------+---+
|timestamp        |distance           |id |
+-----------------+-------------------+---+
|1541724219000    |1893.00            |123|
|1541725264000    |23.16455204686255  |123|
|1541726037000    |15.911555304774817 |123|
|1541726950000    |20.057274313740784 |123|
|1541727884000    |12.967418789242549 |123|
|1541728085000    |2.720850595301784  |123|
+-----------------+-------------------+---+

I am using Spark 2.0.0.
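One possible approach (a sketch, not from the question itself): tag each row where distance exceeds 1000 with a 1, take a running sum of that flag over a window ordered by timestamp, and the running sum becomes a segment number that increments at every break row. Filtering on each segment number then yields one DataFrame per segment. The `segment` column name and the `splits` value are my own inventions; `df` is assumed to be the DataFrame shown above.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Running count of "break" rows (distance > 1000) per id, in timestamp order.
// Rows before the first break get segment 0, rows from the first break row
// onward get segment 1, and so on.
val w = Window.partitionBy("id").orderBy("timestamp")

val segmented = df.withColumn(
  "segment",
  sum(when(col("distance") > 1000, 1).otherwise(0)).over(w)
)

// One DataFrame per segment value; drop the helper column afterwards.
val splits: Array[DataFrame] =
  segmented.select("segment").distinct().collect()
    .map(_.getLong(0))
    .sorted
    .map(s => segmented.filter(col("segment") === s).drop("segment"))
```

For the sample data this should produce three DataFrames matching the three tables above. Note that if the number of distinct segment values is large, collecting them to the driver and filtering once per segment is expensive; in that case it may be better to keep everything in a single DataFrame and group by `segment` instead.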

Thanks.

0 Answers:

There are no answers yet.