Rows with a NULL value are dropped when I use a where condition on a column

Date: 2019-11-20 16:57:38

Tags: pyspark

Here is my code:

valuesA = [('Pirate',1),('Monkey',2),('Ninja',3),('Spaghetti',None)]
TableA = spark.createDataFrame(valuesA,['name','id'])

TableA.show()
+---------+----+
|     name|  id|
+---------+----+
|   Pirate|   1|
|   Monkey|   2|
|    Ninja|   3|
|Spaghetti|null|
+---------+----+

TableA.where(TableA.id != 2).show()
+------+---+
|  name| id|
+------+---+
|Pirate|  1|
| Ninja|  3|
+------+---+

Why is the row with the null id missing from the output?

I also get the following warnings:

19/11/20 16:54:22 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
19/11/20 16:54:22 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
19/11/20 16:54:23 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException

1 answer:

Answer (score: 1)

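Spark SQL follows the standard (ANSI) SQL rules for NULL. Comparing NULL with any value, as in `null != 2`, does not return true; it returns NULL ("unknown"). A where filter keeps only the rows for which the condition is true, so the row with a null id is dropped. Null rows are selected only if your condition asks for them explicitly, so change the query to: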
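To see this three-valued logic outside Spark, here is a small sketch using Python's built-in `sqlite3` module. It is only an illustration: SQLite is not Spark, but it applies the same standard NULL rules to `<>` and `WHERE`. The table name `table_a` and the helper `names()` are made up for this example. `IS NOT` is SQLite's null-safe inequality; it is not Spark syntax.

```python
import sqlite3

# In-memory database holding the same rows as TableA in the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (name TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)",
    [("Pirate", 1), ("Monkey", 2), ("Ninja", 3), ("Spaghetti", None)],
)

def names(where):
    """Hypothetical helper: names of the rows matching a WHERE clause, in insertion order."""
    sql = f"SELECT name FROM table_a WHERE {where} ORDER BY rowid"
    return [row[0] for row in conn.execute(sql)]

# NULL <> 2 evaluates to NULL (unknown), not to true
print(conn.execute("SELECT NULL <> 2").fetchone())  # (None,)

print(names("id <> 2"))                 # ['Pirate', 'Ninja']
print(names("id <> 2 OR id IS NULL"))   # ['Pirate', 'Ninja', 'Spaghetti']
print(names("id IS NOT 2"))             # SQLite null-safe form: same 3 rows
```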

TableA.where("id <> 2 or id is null").show()
+---------+----+
|     name|  id|
+---------+----+
|   Pirate|   1|
|    Ninja|   3|
|Spaghetti|null|
+---------+----+