Why does isNull() behave differently in the following cases?
Definitions of the two DataFrames:
df_t1 = sqlContext.sql("select 1 id, 9 num union all select 1 id, 2 num union all select 2 id, 3 num")
df_t2 = sqlContext.sql("select 1 id, 1 start, 3 stop union all select 3 id, 1 start, 9 stop")
Scenario 1:
df_t1.join(df_t2, (df_t1.id == df_t2.id) & (df_t1.num >= df_t2.start) & (df_t1.num <= df_t2.stop), "left").select([df_t2.start, df_t2.start.isNull()]).show()
Output 1:
+-----+-------------+
|start|isnull(start)|
+-----+-------------+
| null|        false|
|    1|        false|
| null|        false|
+-----+-------------+
Scenario 2:
df_new = df_t1.join(df_t2, (df_t1.id == df_t2.id) & (df_t1.num >= df_t2.start) & (df_t1.num <= df_t2.stop), "left")
df_new.select([df_new.start, df_new.start.isNull()]).show()
Output 2:
+-----+-------------+
|start|isnull(start)|
+-----+-------------+
| null|         true|
|    1|        false|
| null|         true|
+-----+-------------+
Scenario 3:
df_t1.join(df_t2, (df_t1.id == df_t2.id) & (df_t1.num >= df_t2.start) & (df_t1.num <= df_t2.stop), "left").filter("start is null").show()
Output 3:
+---+---+----+-----+----+
| id|num|  id|start|stop|
+---+---+----+-----+----+
|  1|  9|null| null|null|
|  2|  3|null| null|null|
+---+---+----+-----+----+
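For reference, here is a minimal plain-Python sketch of what I would expect the left join to produce for the start column and its null check. It does not use Spark at all; the lists t1 and t2 simply mirror the two DataFrames above, and the helper name left_join_start is made up for illustration.

```python
# Plain-Python model of the left join above; no Spark involved.
# t1 and t2 mirror df_t1 and df_t2; left_join_start is a hypothetical helper.
t1 = [{"id": 1, "num": 9}, {"id": 1, "num": 2}, {"id": 2, "num": 3}]
t2 = [{"id": 1, "start": 1, "stop": 3}, {"id": 3, "start": 1, "stop": 9}]


def left_join_start(left, right):
    """Return (start, start_is_null) for each joined row of a left outer join."""
    result = []
    for l in left:
        # Same condition as the join: equal id and start <= num <= stop.
        matches = [
            r for r in right
            if l["id"] == r["id"] and r["start"] <= l["num"] <= r["stop"]
        ]
        if not matches:
            # Unmatched left row: right-side columns are NULL (None here).
            result.append((None, True))
        for r in matches:
            result.append((r["start"], r["start"] is None))
    return result


rows = left_join_start(t1, t2)
for start, is_null in rows:
    print(start, is_null)
```

By this reasoning, every row where start is null should report true for isnull(start), which is what Output 2 and Output 3 show and what Output 1 does not.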
Thanks.