Using a Scala List in a Spark SQL query

Asked: 2016-08-15 11:03:07

Tags: sql scala apache-spark

I am trying to run the following query:

val IgnoreList = List(""," ","0","-","{}","()","[]","null","Null","NULL","false","False","FALSE","NA","na","Na","n/a","N/a","N/A","nil","Nil","NIL")
val df = sqlContext.sql(s"select userName from names where userName not in $IgnoreList")

But this does not work. I also tried:

val IgnoreList = List(""," ","0","-","{}","()","[]","null","Null","NULL","false","False","FALSE","NA","na","Na","n/a","N/a","N/A","nil","Nil","NIL")
sqlContext.udf.register("SqlList",(s: List[String]) => "('" + s.mkString("','") + "')")
val df = sqlContext.sql(s"select userName from names where userName not in SqlList($IgnoreList)")

But that does not work either. Any suggestions?

1 answer:

Answer 0 (score: 2)

Your first attempt fails because interpolating the list calls List's default toString, which does not produce the SQL-valid syntax you need. Your second attempt fails because building the SQL string with a UDF makes no sense: a UDF is applied to records (or columns) during query execution, not used to construct the query string itself.
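A quick illustration of the first problem (using a shortened sample list, not the full IgnoreList): interpolating the list directly yields "List(...)", while SQL's IN clause needs a parenthesized, quoted list.

```scala
// Default toString on a List is not valid SQL IN-list syntax.
val ignore = List("", "0", "null")

val bad  = s"$ignore"                            // what the first attempt interpolates
val good = "('" + ignore.mkString("','") + "')"  // what the SQL IN clause needs

println(bad)   // List(, 0, null)
println(good)  // ('','0','null')
```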

You need to combine the formatting from your second attempt with the plain string interpolation from your first:
val IgnoreList = List(""," ","0","-","{}","()","[]","null","Null","NULL","false","False","FALSE","NA","na","Na","n/a","N/a","N/A","nil","Nil","NIL")
val condition = "('" + IgnoreList.mkString("','") + "')"
val df = sqlContext.sql(s"select userName from names where userName not in $condition")
By the way, it might be clearer to format the list this way:

IgnoreList.map(s => s"'$s'").mkString(",")
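A minimal sketch (again with a shortened sample list) showing that this map-based formatting, once wrapped in parentheses, produces the same condition string as the mkString version above:

```scala
val ignore = List("", "0", "null")

// mkString version, as used in the answer
val viaMkString = "('" + ignore.mkString("','") + "')"

// map version: quote each element first, then join
val viaMap = "(" + ignore.map(s => s"'$s'").mkString(",") + ")"

println(viaMap)  // ('','0','null')
```

Note that in newer Spark versions the DataFrame API offers Column.isin, which avoids building SQL strings by hand (and the quoting pitfalls that come with it) entirely; that is untested here but worth a look.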