I am working with the Twitter API. My task is to retrieve the oldest and the newest record for a user. I have this result:
+--------+--------------------+
| user.id|          created_at|
+--------+--------------------+
|28688324|Fri Mar 01 05:33:...|
|28688324|Sat Mar 02 04:21:...|
|28688324|Sun Mar 03 02:10:...|
|28688324|Sun Mar 03 02:11:...|
|28688324|Sun Mar 03 02:11:...|
|28688324|Sun Mar 03 02:12:...|
|28688324|Sun Mar 03 02:12:...|
|28688324|Sun Mar 03 02:13:...|
|28688324|Sun Mar 03 02:14:...|
|28688324|Sun Mar 03 02:14:...|
|28688324|Sun Mar 03 02:14:...|
|28688324|Sun Mar 03 02:15:...|
|28688324|Sun Mar 03 02:15:...|
|28688324|Sun Mar 03 02:15:...|
|28688324|Sun Mar 03 02:16:...|
|28688324|Sun Mar 03 02:17:...|
|28688324|Sun Mar 03 02:17:...|
|28688324|Sun Mar 03 02:17:...|
|28688324|Sun Mar 03 02:18:...|
|28688324|Sun Mar 03 02:19:...|
+--------+--------------------+
Code:
dataset.filter("user.id = '28688324'")\
.select(dataset.user.id, dataset.created_at)\
.show()
I can use Spark SQL, Spark DataFrames, or Spark RDDs. How can I return just these two records?
Edit: I am working with dates, not numbers. I also need the result as 2 rows, like this:
+--------+--------------------+
| user.id|          created_at|
+--------+--------------------+
|28688324|Fri Mar 01 05:33:...|
|28688324|Sat Mar 02 04:21:...|
+--------+--------------------+
representing the oldest and the newest record.
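Because the values are dates stored as strings, comparing them directly sorts by weekday name rather than by time. A small plain-Python sketch of the pitfall, assuming the Twitter-style format `%a %b %d %H:%M:%S %z %Y` (the sample strings and the year are hypothetical, since the question truncates them):

```python
from datetime import datetime

# Assumed Twitter-style created_at format, e.g. "Fri Mar 01 05:33:00 +0000 2019".
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

created_at = [
    "Fri Mar 01 05:33:00 +0000 2019",
    "Sun Mar 03 02:19:00 +0000 2019",
    "Mon Mar 04 01:00:00 +0000 2019",
]

# Raw string comparison orders alphabetically ("Sun" > "Mon" > "Fri"), not by time.
lexical_newest = max(created_at)


def parse(s):
    """Turn a created_at string into a timezone-aware datetime."""
    return datetime.strptime(s, TWITTER_FORMAT)


# Comparing parsed datetimes gives the real chronological extremes.
oldest = min(created_at, key=parse)
newest = max(created_at, key=parse)

print(lexical_newest)
print(oldest)
print(newest)
```

Here the string maximum is the Sunday record, while the actual newest record is the Monday one; the same reasoning is why the column needs to be converted to a timestamp before sorting in Spark.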