%spark.pyspark
l = [('user1', 33, 1.0, 'chess'), ('user2', 34, 2.0, 'tenis'), ('user3', None, None, ''), ('user4', None, 4.0, ' '), ('user5', None, 5.0, 'ski')]
df = spark.createDataFrame(l, ['name', 'age', 'ratio', 'hobby'])
df.printSchema()
df.show()
root
 |-- name: string (nullable = true)
 |-- age: long (nullable = true)
 |-- ratio: double (nullable = true)
 |-- hobby: string (nullable = true)
+-----+----+-----+-----+
| name| age|ratio|hobby|
+-----+----+-----+-----+
|user1|  33|  1.0|chess|
|user2|  34|  2.0|tenis|
|user3|null| null|     |
|user4|null|  4.0|     |
|user5|null|  5.0|  ski|
+-----+----+-----+-----+
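When a field value is empty, or when len(field.strip(' \t')) == 0, I want to replace the value in a string column with null. In my case, the empty slots in the 'hobby' column should be replaced with null. Any hints?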
Answer 0: (score: 0)
You can replace empty strings with null like this:
df.withColumn("hobby", blank_as_null("hobby"))
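To check for len(field.strip(' \t')) == 0,
you can use a helper function (column expressions are enough here, so a Python UDF is not required):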
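from pyspark.sql.functions import col, lit, when

def replace(column):
    # Blank (empty or only spaces/tabs) becomes null; everything else is kept
    return when(column.rlike(r"^[ \t]*$"), lit(None)).otherwise(column)
df.withColumn("hobby", replace(col("hobby"))).show()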
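For reference, the rule from the question can also be written in plain Python, which makes it easy to check the expected result for the sample 'hobby' values before running anything on Spark. This is only a sketch of the intended behavior, and to_null_if_blank is a hypothetical name, not part of PySpark:

```python
def to_null_if_blank(value):
    """Return None for None, '' or strings made only of spaces/tabs."""
    if value is None or len(value.strip(" \t")) == 0:
        return None
    return value


# The 'hobby' values from the question's sample data.
hobbies = ["chess", "tenis", "", " ", "ski"]
expected = [to_null_if_blank(h) for h in hobbies]
print(expected)  # ['chess', 'tenis', None, None, 'ski']
```

The Spark version should produce null in the same two positions (user3 and user4).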