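I have a Spark DataFrame containing strings that I map to numeric scores using a Likert scale. Different question IDs map to different scores. I am trying to pattern match on a range in Scala inside an Apache Spark udf, using this question as a guide: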
How can I pattern match on a range in Scala?
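But when I use a range instead of a simple OR pattern, I get a compilation error. That is,
31 | 32 | 33 | 34
works fine, but
31 to 35
does not compile. Am I getting the syntax wrong?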
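Also, in the final case _, I would like to map to a String instead of an Int:
case _ => "None"
but this produces the error:
java.lang.UnsupportedOperationException: Schema for type Any is not supported
Presumably this is a general Spark limitation, since returning Any is perfectly possible in plain Scala?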
Here is my code:
def calculateScore = udf((questionId: Int, answerText: String) => (questionId, answerText) match {
case ((31 | 32 | 33 | 34 | 35), "Rarely /<br>Never") => 4 //this is fine
case ((31 | 32 | 33 | 34 | 35), "Occasionally") => 3
case ((31 | 32 | 33 | 34 | 35), "Often") => 2
case ((31 | 32 | 33 | 34 | 35), "Almost always /<br>Always") => 1
case ((x if 41 until 55 contains x), "None of the time") => 1 //this line won't compile
case _ => 0 //would like to map to "None"
})
I then apply the udf to the Spark DataFrame like this:
val df3 = df.withColumn("NumericScore", calculateScore(df("QuestionId"), df("AnswerText")))
Answer 0 (score: 2)
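A range such as 41 until 55 is an ordinary expression, not a pattern, so it can only be tested in a guard, and the guard must come after the whole pattern rather than inside the tuple. Note that 41 until 55 excludes 55; use 41 to 55 if the upper bound should be included: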
def calculateScore = udf((questionId: Int, answerText: String) => (questionId, answerText) match {
case ((31 | 32 | 33 | 34 | 35), "Rarely /<br>Never") => 4
case ((31 | 32 | 33 | 34 | 35), "Occasionally") => 3
case ((31 | 32 | 33 | 34 | 35), "Often") => 2
case ((31 | 32 | 33 | 34 | 35), "Almost always /<br>Always") => 1
case (x, "None of the time") if 41 until 55 contains x => 1
case _ => 0 //would like to map to "None"
})
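If you would rather keep the range inside the pattern itself, one option is a custom extractor. The sketch below uses a hypothetical helper class `Between` (not part of Scala or Spark): a value whose `unapply` returns `Boolean` can be used as a zero-argument pattern such as `Q31to35()`, so it can sit directly inside the tuple pattern. Extractors used in patterns must be stable identifiers, hence the `val`s. Both bounds here are inclusive, so 55 is matched, unlike `41 until 55`.

```scala
// Hypothetical helper (not part of Scala or Spark): matches an Int in [lo, hi], inclusive.
// Serializable in case an instance ends up captured in a Spark closure.
final class Between(lo: Int, hi: Int) extends Serializable {
  // Returning Boolean lets an instance be used as a zero-argument pattern, e.g. `Q31to35()`.
  def unapply(x: Int): Boolean = lo <= x && x <= hi
}

object ScoreWithExtractor {
  // Bound to vals so they are stable identifiers usable in patterns.
  val Q31to35 = new Between(31, 35)
  val Q41to55 = new Between(41, 55)

  def score(questionId: Int, answerText: String): Int = (questionId, answerText) match {
    case (Q31to35(), "Rarely /<br>Never")         => 4
    case (Q31to35(), "Occasionally")              => 3
    case (Q31to35(), "Often")                     => 2
    case (Q31to35(), "Almost always /<br>Always") => 1
    case (Q41to55(), "None of the time")          => 1
    case _                                        => 0
  }
}
```

Because `score` is a plain function, it can be unit-tested without Spark and then wrapped, for example with `udf(ScoreWithExtractor.score _)` from `org.apache.spark.sql.functions`.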
Answer 1 (score: 2)
If you want to map the last case, i.e. case _, to the String "None", then all cases should return a String. The following udf should work for you:
def calculateScore = udf((questionId: Int, answerText: String) => (questionId, answerText) match {
case ((31 | 32 | 33 | 34 | 35), "Rarely /<br>Never") => "4" //this is fine
case ((31 | 32 | 33 | 34 | 35), "Occasionally") => "3"
case ((31 | 32 | 33 | 34 | 35), "Often") => "2"
case ((31 | 32 | 33 | 34 | 35), "Almost always /<br>Always") => "1"
case (x, "None of the time") if (x >= 41 && x < 55) => "1"
case _ => "None"
})
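A minimal sketch of the same String-returning logic written as a plain Scala function, so it can be tested without a SparkSession. Giving the function an explicit `String` return type means any branch that accidentally returns an `Int` becomes a compile error instead of widening the result type. The names `LikertLabels` and `scoreLabel` are hypothetical.

```scala
object LikertLabels {
  private val group31to35 = 31 to 35 // inclusive, like the question's `31 to 35`

  // Explicit return type: every case must produce a String.
  def scoreLabel(questionId: Int, answerText: String): String = (questionId, answerText) match {
    case (q, "Rarely /<br>Never") if group31to35.contains(q)         => "4"
    case (q, "Occasionally") if group31to35.contains(q)              => "3"
    case (q, "Often") if group31to35.contains(q)                     => "2"
    case (q, "Almost always /<br>Always") if group31to35.contains(q) => "1"
    case (q, "None of the time") if q >= 41 && q < 55                => "1"
    case _                                                           => "None"
  }
}
```

It could then be wrapped with `udf(LikertLabels.scoreLabel _)`; the resulting column would be a string column rather than a numeric one.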
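If you want to map the last case, i.e. case _, to None, then the other cases must also return a subtype of Option (Some), since None is itself a subtype of Option. The following code should also work for you:
def calculateScore = udf((questionId: Int, answerText: String) => (questionId, answerText) match {
case ((31 | 32 | 33 | 34 | 35), "Rarely /<br>Never") => Some(4) //this is fine
case ((31 | 32 | 33 | 34 | 35), "Occasionally") => Some(3)
case ((31 | 32 | 33 | 34 | 35), "Often") => Some(2)
case ((31 | 32 | 33 | 34 | 35), "Almost always /<br>Always") => Some(1)
case (x, "None of the time") if (x >= 41 && x < 55) => Some(1)
case _ => None
})
One last point: the error message you got,
java.lang.UnsupportedOperationException: Schema for type Any is not supported
says that a udf whose return type is Any is not supported. When the cases return different types (here Int and String), Scala infers their common supertype Any, and Spark cannot derive a column schema for it. All cases in the match should return the same type.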
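A minimal sketch of the `Option` variant as a plain function, assuming Spark's usual handling of `Option` return types in Scala UDFs: `Some(n)` is stored as an integer and `None` as `null`, so the column stays numeric and nullable instead of mixing numbers with the string "None". `LikertScores` and `scoreOpt` are hypothetical names.

```scala
object LikertScores {
  private val group31to35 = 31 to 35    // inclusive
  private val group41to54 = 41 until 55 // excludes 55

  // Some(score) for known question/answer combinations, None otherwise.
  def scoreOpt(questionId: Int, answerText: String): Option[Int] = (questionId, answerText) match {
    case (q, "Rarely /<br>Never") if group31to35.contains(q)         => Some(4)
    case (q, "Occasionally") if group31to35.contains(q)              => Some(3)
    case (q, "Often") if group31to35.contains(q)                     => Some(2)
    case (q, "Almost always /<br>Always") if group31to35.contains(q) => Some(1)
    case (q, "None of the time") if group41to54.contains(q)          => Some(1)
    case _                                                           => None
  }
}
```

Wrapped as `udf(LikertScores.scoreOpt _)`, unmatched rows would show up as nulls, which can then be filtered with `isNull` or replaced with `na.fill`.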