How do I select the first record from a group?

Time: 2017-06-05 23:54:51

Tags: java apache-spark apache-spark-sql

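I have a list/array of records, and I use explode to extract the data from the list. I want to select the first record from the exploded result using Spark SQL in Java.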

// json is the Dataset<Row> holding the parsed input
Dataset<Row> ds = json.select(
  json.col("*"),
  explode(json.col("records.record.newrecord")).as("newrecord"));
ds = ds.select(ds.col("EVENT_SEQ"), ds.col("newrecord").apply("event").as("EVENTTYPE"));

Current data:

+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+

Required output:

+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+

I have seen documentation that suggests Column.apply for this purpose, but I have not found enough help to get started.

1 answer:

Answer 0: (score: 1)

With the groupBy operator and the first function, of course:

val ds = Seq(
  ("5a694d77-bc65-4bf...", 0),
  ("5a694d77-bc65-4bf...", 0)
).toDF("EVENT_SEQ", "EVENTTYPE")

scala> ds.show
+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+

scala> ds.groupBy("EVENT_SEQ").agg(first("EVENTTYPE")).show
+--------------------+-----------------------+
|           EVENT_SEQ|first(EVENTTYPE, false)|
+--------------------+-----------------------+
|5a694d77-bc65-4bf...|                      0|
+--------------------+-----------------------+
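Since the question asks for Java, here is a minimal sketch of the same groupBy/first aggregation in Java. The class name FirstPerGroup, the helper firstPerGroup, the local[1] master and the hand-built sample rows are assumptions made for illustration; in the question's code the input would be the exploded ds instead.

```java
import static org.apache.spark.sql.functions.first;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FirstPerGroup {

    // Hypothetical helper: collapses each EVENT_SEQ group to a single row,
    // keeping the first EVENTTYPE value that Spark encounters in that group.
    static Dataset<Row> firstPerGroup(Dataset<Row> ds) {
        return ds.groupBy("EVENT_SEQ")
                 .agg(first("EVENTTYPE").as("EVENTTYPE"));
    }

    public static void main(String[] args) {
        // Assumption: local mode, only so the sketch runs on its own.
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("first-per-group")
                .getOrCreate();

        // Sample data mirroring the two duplicate rows from the question.
        StructType schema = new StructType()
                .add("EVENT_SEQ", DataTypes.StringType)
                .add("EVENTTYPE", DataTypes.IntegerType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("5a694d77-bc65-4bf...", 0),
                RowFactory.create("5a694d77-bc65-4bf...", 0));
        Dataset<Row> ds = spark.createDataFrame(rows, schema);

        firstPerGroup(ds).show();
        spark.stop();
    }
}
```

Note that first depends on row order, which is not guaranteed after a shuffle. If the rows within a group can differ and a specific one is needed, an explicit ordering (for example a window with row_number) is the safer choice.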
