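I don't understand what is going wrong here.
ERROR: The problem I have is that the And (&&) operator does not work and everything falls through to else. If I don't use the And (&&) operator, some of the if conditions work. Please compare the age and ageGroup columns in the output with the UDF declaration. 6- and 7-year-olds are adults, and a 20-year-old is a kiddo?
Here is my code:
All Spark imports and initialization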
import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions.udf
case class Person(name: String, address: String, state: String, age: Int, phone: Int, order: String)
val df = Seq(
  ("adnan", "migi way", "texas", 10, 333, "AX-1"),
  ("dim", "gigi way", "utah", 6, 222, "AX-2"),
  ("alvee", "sigi way", "utah", 9, 222, "AX-2"),
  ("john", "higi way", "georgia", 20, 111, "AX- 3")
).toDF("name", "address", "state", "age", "phone", "order")
val df1 = datafile.map(_.split("\\|")).map(attr => Person(attr(0).toString, attr(1).toString, attr(2).toString, attr(3).toInt, attr(4).toInt, attr(5).toString)).toDF()
The UDF code is below
def ageFilter = udf((age: Int) => {
if (age >= 2 && age <= 9) "bacha"
if (age >= 10 ) "kiddo"
else "adult"
})
Calling the UDF
val one_hh_ages = df1.withColumn("ageGroup", ageFilter($"age"))
This is where I got help from: Apache Spark, add an "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame
Answer 0 (score: 0)
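The problem is that the first condition in your UDF has no effect. In Scala a block evaluates to its last expression, so the first if produces a value that is simply discarded, and execution continues with the next if statement. Only the second if/else decides the result, which is why ages 6 and 7 come out as "adult" and 20 comes out as "kiddo".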
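To see this without Spark, here is a minimal plain-Scala sketch that uses the same body as the UDF in the question. The object name DiscardedIfDemo and the function name buggy are only illustrative, not part of the original code.

```scala
// Plain Scala, no Spark required: same body as the UDF in the question.
object DiscardedIfDemo {
  def buggy(age: Int): String = {
    if (age >= 2 && age <= 9) "bacha" // value is computed, then thrown away
    if (age >= 10) "kiddo"            // only this if/else decides the result
    else "adult"
  }

  def main(args: Array[String]): Unit = {
    // Prints: 6 -> adult, 7 -> adult, 10 -> kiddo, 20 -> kiddo (one per line)
    Seq(6, 7, 10, 20).foreach(a => println(s"$a -> ${buggy(a)}"))
  }
}
```

The compiler will typically warn that the first if is a pure expression in statement position, which is a useful hint that its value is never used.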
You can use else if:
def ageFilter = udf((age: Int) => {
if (age >= 2 && age <= 9) "bacha"
else if (age >= 10 ) "kiddo"
else "adult"
})
Or use pattern matching:
def ageFilter = udf((age: Int) => {
  age match {
    case a if a >= 2 && a <= 9 => "bacha"
    case a if a >= 10 => "kiddo"
    case _ => "adult"
  }
})
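If you want to convince yourself that the two fixes behave the same, a rough check in plain Scala could look like the sketch below. The names AgeGroupCheck, viaElseIf and viaMatch are made up for this example, and the range 0 to 120 is just an assumed plausible span of ages.

```scala
// Plain Scala check that the else-if and pattern-matching versions agree.
object AgeGroupCheck {
  def viaElseIf(age: Int): String =
    if (age >= 2 && age <= 9) "bacha"
    else if (age >= 10) "kiddo"
    else "adult"

  def viaMatch(age: Int): String = age match {
    case a if a >= 2 && a <= 9 => "bacha"
    case a if a >= 10          => "kiddo"
    case _                     => "adult"
  }

  def main(args: Array[String]): Unit = {
    // Both versions should give the same label for every age in the range.
    assert((0 to 120).forall(a => viaElseIf(a) == viaMatch(a)))
    // Prints: 1 -> adult, 6 -> bacha, 9 -> bacha, 10 -> kiddo, 20 -> kiddo
    Seq(1, 6, 9, 10, 20).foreach(a => println(s"$a -> ${viaElseIf(a)}"))
  }
}
```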
But you should double-check your logical conditions (is everyone aged 10 and over a kid? Is anyone under 2 an adult?)
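As a side note, the same labelling can probably be done without a UDF at all, using the built-in when/otherwise column functions, which is the CASE WHEN style from the thread linked in the question. The following is only a sketch: the object name AgeGroupWhen, the helper withAgeGroup and the local SparkSession setup are assumptions for the example, and the thresholds are copied unchanged from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object AgeGroupWhen {
  // Adds an ageGroup column; the first matching when branch wins.
  def withAgeGroup(df: DataFrame): DataFrame =
    df.withColumn("ageGroup",
      when(col("age").between(2, 9), "bacha") // between is inclusive on both ends
        .when(col("age") >= 10, "kiddo")
        .otherwise("adult"))

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration.
    val spark = SparkSession.builder().master("local[*]").appName("ageGroup").getOrCreate()
    import spark.implicits._

    // Names and ages taken from the sample rows in the question.
    val df = Seq(("adnan", 10), ("dim", 6), ("alvee", 9), ("john", 20)).toDF("name", "age")
    withAgeGroup(df).show()
    spark.stop()
  }
}
```

Because when/otherwise is evaluated as a chain, there is no way to accidentally discard a branch the way the bare if did.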