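Get the oldest and newest records - SPARK

Time: 2017-03-19 16:41:47

Tags: apache-spark apache-spark-sql spark-dataframe

I am working with the Twitter API. My task is to retrieve the oldest and the newest record for a user. This is the result I have so far: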

+--------+--------------------+
| user.id|          created_at|
+--------+--------------------+
|28688324|Fri Mar 01 05:33:...|
|28688324|Sat Mar 02 04:21:...|
|28688324|Sun Mar 03 02:10:...|
|28688324|Sun Mar 03 02:11:...|
|28688324|Sun Mar 03 02:11:...|
|28688324|Sun Mar 03 02:12:...|
|28688324|Sun Mar 03 02:12:...|
|28688324|Sun Mar 03 02:13:...|
|28688324|Sun Mar 03 02:14:...|
|28688324|Sun Mar 03 02:14:...|
|28688324|Sun Mar 03 02:14:...|
|28688324|Sun Mar 03 02:15:...|
|28688324|Sun Mar 03 02:15:...|
|28688324|Sun Mar 03 02:15:...|
|28688324|Sun Mar 03 02:16:...|
|28688324|Sun Mar 03 02:17:...|
|28688324|Sun Mar 03 02:17:...|
|28688324|Sun Mar 03 02:17:...|
|28688324|Sun Mar 03 02:18:...|
|28688324|Sun Mar 03 02:19:...|
+--------+--------------------+

Code:

dataset.filter("user.id = '28688324'")\
.select(dataset.user.id, dataset.created_at)\
.show()

I can use Spark SQL, Spark DataFrame, or Spark RDD. How can I return just these two records?

Edit: I am dealing with dates, not numbers. I also need to get 2 rows back, like this:

+--------+--------------------+
| user.id|          created_at|
+--------+--------------------+
|28688324|Fri Mar 01 05:33:...|
|28688324|Sat Mar 02 04:21:...|
+--------+--------------------+

These two rows represent the oldest and the newest record.
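Since the values are dates stored as strings, comparing them directly would order them by weekday name rather than by time. A minimal sketch of the idea in plain Python, outside Spark, might look like the following. It assumes `created_at` uses Twitter's usual `Fri Mar 01 05:33:00 +0000 2013` layout; the sample rows, the seconds, the year and the helper name `oldest_and_newest` are hypothetical, because the values in the output above are truncated. In Spark the same parsing step would have to happen before any min/max comparison.

```python
from datetime import datetime

# Assumed layout of Twitter's created_at field, e.g. "Fri Mar 01 05:33:00 +0000 2013".
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def oldest_and_newest(rows):
    """Return the (oldest, newest) rows, comparing parsed dates rather than raw strings.

    `rows` is a non-empty list of (user_id, created_at) tuples.
    This helper is hypothetical and only illustrates the comparison step.
    """
    if not rows:
        raise ValueError("rows must not be empty")

    def parsed(row):
        # Convert the created_at string into a timezone-aware datetime.
        return datetime.strptime(row[1], TWITTER_FORMAT)

    return min(rows, key=parsed), max(rows, key=parsed)


# Hypothetical sample data: the real values are truncated in the question.
sample_rows = [
    ("28688324", "Sun Mar 03 02:19:00 +0000 2013"),
    ("28688324", "Fri Mar 01 05:33:00 +0000 2013"),
    ("28688324", "Sat Mar 02 04:21:00 +0000 2013"),
]

oldest, newest = oldest_and_newest(sample_rows)
print(oldest)
print(newest)
```

The result is exactly two rows, one for each end of the time range, which matches the shape requested in the edit.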

0 answers:

No answers