Using Scala

Time: 2017-07-31 07:33:42

Tags: scala apache-spark

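I have a task to compute the length of each value in a column and add a message to an "ErrorMsg" column. I am able to filter the records by length, but I cannot append the message in a new column.

For example, I want to find the invalid records, each with its message in a new column "ErrorMsg".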

RECORDLENGTH = 4

InputDataFrame
+------+
| value|
+------+
|Pra   |
|Akshay|
|  Raju|
|Shakti|
|xyz   |
+------+

OutputDataFrame

+------+-------------------------+
|value |ErrorMsg                 |
+------+-------------------------+
|Pra   |Less Than total Length   |
|Akshay|Greater than total length|
|Shakti|Greater than total length|
|xyz   |Less Than total Length   |
+------+-------------------------+

Raju is my valid record; it goes to the valid records, without a message.

1 answer:

Answer 0 (score: 1)

The following will give the desired result.

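import org.apache.spark.sql.functions.{length, lit, not, when}
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

val df = Seq("Pra", "Akshay", "Raju", "Shakti", "xyz").toDF("value")
df
 .filter(not(length($"value") === 4))  // keep only the records whose length is not 4
 .withColumn("ErrorMsg", when(length($"value") > lit(4), "Greater than total length").otherwise("Less Than total Length"))
 .show(10000, false)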

+------+-------------------------+
|value |ErrorMsg                 |
+------+-------------------------+
|Pra   |Less Than total Length   |
|Akshay|Greater than total length|
|Shakti|Greater than total length|
|xyz   |Less Than total Length   |
+------+-------------------------+
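
The question also mentions that valid records should go to a separate set without a message. A minimal sketch of one way to do that, assuming a local SparkSession: the names `RecordLengthCheck`, `tagLengths`, and `recordLength` are hypothetical and not part of the original answer. It tags every row first and then splits on whether `ErrorMsg` is null, relying on the fact that `when(...)` without `otherwise(...)` yields null for rows that match no condition.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, length, lit, when}

object RecordLengthCheck {

  // Hypothetical helper: adds an "ErrorMsg" column that is null for rows
  // whose "value" has exactly `recordLength` characters.
  def tagLengths(df: DataFrame, recordLength: Int): DataFrame =
    df.withColumn(
      "ErrorMsg",
      when(length(col("value")) > lit(recordLength), "Greater than total length")
        .when(length(col("value")) < lit(recordLength), "Less Than total Length")
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RecordLengthCheck")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("Pra", "Akshay", "Raju", "Shakti", "xyz").toDF("value")
    val tagged = tagLengths(df, 4) // RECORDLENGTH = 4, as in the question

    // Invalid records keep their message; valid ones drop the empty column.
    val invalid = tagged.filter(col("ErrorMsg").isNotNull)
    val valid = tagged.filter(col("ErrorMsg").isNull).drop("ErrorMsg")

    invalid.show(false)
    valid.show(false)

    spark.stop()
  }
}
```

Tagging once and filtering twice keeps the length rule in a single place, so changing the expected length only touches the `recordLength` argument.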