Here is my code:
valuesA = [('Pirate',1),('Monkey',2),('Ninja',3),('Spaghetti',None)]
TableA = spark.createDataFrame(valuesA,['name','id'])
TableA.show()
+---------+----+
|     name|  id|
+---------+----+
|   Pirate|   1|
|   Monkey|   2|
|    Ninja|   3|
|Spaghetti|null|
+---------+----+
TableA.where(TableA.id != 2).show()
+------+---+
|  name| id|
+------+---+
|Pirate|  1|
| Ninja|  3|
+------+---+
Why don't I get the row with the null value in the output?
I also get the following warnings:
19/11/20 16:54:22 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
19/11/20 16:54:22 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
19/11/20 16:54:23 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Answer 0 (score: 1)
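Spark SQL follows ANSI SQL semantics for NULL: comparing a null with any value yields NULL (unknown) rather than true, and where keeps only the rows for which the condition evaluates to true. For the Spaghetti row, id != 2 therefore evaluates to NULL and the row is dropped. In other words, by default null values are selected by your query only if you mention them explicitly.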
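To make the rule concrete, here is a minimal plain-Python sketch of SQL's three-valued logic, with None standing in for NULL. It is only a model of the semantics, not how Spark implements filtering, and the helper names (sql_ne, sql_is_null, sql_or, where) are hypothetical names invented for this illustration.

```python
from typing import Optional

# True, False, or None (None stands in for SQL NULL / "unknown").
Tri = Optional[bool]


def sql_ne(a, b) -> Tri:
    """Model of `a <> b`: the result is unknown if either side is NULL."""
    if a is None or b is None:
        return None
    return a != b


def sql_is_null(a) -> bool:
    """Model of `a IS NULL`: always a definite True or False."""
    return a is None


def sql_or(x: Tri, y: Tri) -> Tri:
    """Three-valued OR: True wins, then unknown, then False."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def where(rows, predicate):
    """Keep only rows whose predicate is exactly True (unknown is dropped)."""
    return [row for row in rows if predicate(row) is True]


rows = [('Pirate', 1), ('Monkey', 2), ('Ninja', 3), ('Spaghetti', None)]

# id <> 2: the Spaghetti row evaluates to unknown and is filtered out.
without_null = where(rows, lambda r: sql_ne(r[1], 2))

# id <> 2 OR id IS NULL: the second operand is True for Spaghetti.
with_null = where(rows, lambda r: sql_or(sql_ne(r[1], 2), sql_is_null(r[1])))

print(without_null)  # [('Pirate', 1), ('Ninja', 3)]
print(with_null)     # [('Pirate', 1), ('Ninja', 3), ('Spaghetti', None)]
```

The key line is the `is True` check in where: a predicate that is unknown is treated the same as one that is false, which is why the null row disappears unless the condition mentions it.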
You need to change your query to:
TableA.where("id <> 2 or id is null").show()
+---------+----+
|     name|  id|
+---------+----+
|   Pirate|   1|
|    Ninja|   3|
|Spaghetti|null|
+---------+----+