I have a DataFrame (df1) with more than 1000 rows and the following schema:
+--------+-------------+-------------------+
|      id|       pro_id|           datetime|
+--------+-------------+-------------------+
|11304569|      8195360|2015-01-23 15:21:51|
|11334963|      8060212|2015-01-28 22:49:17|
+--------+-------------+-------------------+
After applying limit:
val df2 = df1.limit(10)
println(" no of df2 : "+ df2.count())
df2.show()
this gives the result:
no of df2 : 10
+--------+-------------+-------------------+
|      id|       pro_id|           datetime|
+--------+-------------+-------------------+
|11304569|      8195360|2015-01-23 15:21:51|
|11334963|      8060212|2015-01-28 22:49:17|
|11334963|      8060212|2015-01-28 22:49:17|
|11334963|      8060212|2015-01-28 23:20:43|
|11304569|      8143638|2015-02-03 14:34:48|
|11336154|      8060212|2015-02-03 19:25:24|
|11304569|      8173052|2015-02-05 08:15:12|
|11398902|      8173052|2015-02-05 08:18:50|
|11349129|      8097653|2015-02-05 08:29:33|
|11349129|      8027845|2015-02-05 08:29:33|
+--------+-------------+-------------------+
Then I apply a filter:
val v = df2.filter($"datetime" >= "2015-02-03 00:00:00")
println(" no of v : "+ v.count())
v.show()
This should give me only the last 6 rows, but instead it gives:
no of v : 10
+--------+-------------+-------------------+
|      id|       pro_id|           datetime|
+--------+-------------+-------------------+
|11304569|      8143638|2015-02-03 14:34:48|
|11336154|      8060212|2015-02-03 19:25:24|
|11304569|      8173052|2015-02-05 08:15:12|
|11398902|      8173052|2015-02-05 08:18:50|
|11349129|      8097653|2015-02-05 08:29:33|
|11349129|      8027845|2015-02-05 08:29:33|
|11349129|      8105806|2015-02-05 08:29:33|
|11349187|      8197725|2015-02-05 09:00:32|
|11349188|      8134473|2015-02-05 08:01:50|
|11349187|      8132574|2015-02-05 09:09:07|
+--------+-------------+-------------------+
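How am I getting 4 extra rows from the original df1 when the filter is applied to df2 and does not reference df1 at all?
Does limit work in some other way than I expect?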