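I have a task to compute the length of each value in a column and add a message to an "ErrorMsg" column. I am able to filter the records by length, but I cannot append the message in a new column.
For example, with RECORDLENGTH = 4, I want to find the invalid records and put only the message in a new column "ErrorMsg".
Input DataFrame: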
+------+
| value|
+------+
|   Pra|
|Akshay|
|  Raju|
|Shakti|
|   xyz|
+------+
Expected output DataFrame:
+------+-------------------------+
| value|ErrorMsg                 |
+------+-------------------------+
|Pra   |Less Than total Length   |
|Akshay|Greater than total length|
|Shakti|Greater than total length|
|xyz   |Less than total length   |
+------+-------------------------+
"Raju" is my valid record, so it goes to the valid records without a message.
Answer 0 (score: 1)
The following gives the desired result.
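// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
import org.apache.spark.sql.functions.{length, lit, not, when}
import spark.implicits._

val df = Seq("Pra", "Akshay", "Raju", "Shakti", "xyz").toDF("value")
df
  .filter(not(length($"value") === 4)) // keep only records whose length is not 4
  .withColumn("ErrorMsg", when(length($"value") > lit(4), "Greater than total length").otherwise("Less Than total Length"))
  .show(10000, false) // truncate = false so the full message is printed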
+------+-------------------------+
|value |ErrorMsg                 |
+------+-------------------------+
|Pra   |Less Than total Length   |
|Akshay|Greater than total length|
|Shakti|Greater than total length|
|xyz   |Less Than total Length   |
+------+-------------------------+
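The question also mentions that valid records should go to a separate set without a message. A minimal sketch of that split follows, assuming the expected length is passed in as a parameter. The object name `LengthCheck`, the method `split`, and the local SparkSession setup are hypothetical names for illustration, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, length, lit, when}

object LengthCheck {
  // Hypothetical helper: returns (valid, invalid) for a DataFrame with a "value" column.
  def split(df: DataFrame, recordLength: Int): (DataFrame, DataFrame) = {
    val len = length(col("value"))
    // Valid records: exact length match, no message column added.
    val valid = df.filter(len === recordLength)
    // Invalid records: every other non-null value, tagged with a message.
    val invalid = df
      .filter(len =!= recordLength)
      .withColumn(
        "ErrorMsg",
        when(len > recordLength, lit("Greater than total length"))
          .otherwise(lit("Less Than total Length")))
    (valid, invalid)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder().appName("LengthCheck").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq("Pra", "Akshay", "Raju", "Shakti", "xyz").toDF("value")
    val (valid, invalid) = split(df, recordLength = 4) // 4 taken from RECORDLENGTH in the question

    valid.show(false)
    invalid.show(false)
    spark.stop()
  }
}
```

Note that a null `value` has a null length, so both filter predicates evaluate to null and the row lands in neither DataFrame. If nulls can occur, they would need an explicit branch.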