Using a Scala List in a Spark SQL query

Asked: 2016-08-15 11:03:07

Tags: sql scala apache-spark

I am trying to run the following query:

val IgnoreList = List(""," ","0","-","{}","()","[]","null","Null","NULL","false","False","FALSE","NA","na","Na","n/a","N/a","N/A","nil","Nil","NIL")
val df = sqlContext.sql(s"select userName from names where userName not in $IgnoreList")

But this does not work. I also tried:

val IgnoreList = List(""," ","0","-","{}","()","[]","null","Null","NULL","false","False","FALSE","NA","na","Na","n/a","N/a","N/A","nil","Nil","NIL")
sqlContext.udf.register("SqlList",(s: List[String]) => "('" + s.mkString("','") + "')")
val df = sqlContext.sql(s"select userName from names where userName not in SqlList($IgnoreList)")

But that does not work either. Any suggestions?

1 answer:

Answer 0 (score: 2)

Your first attempt fails because interpolating the list calls List's default toString, which does not produce the SQL-valid syntax you need. Your second attempt fails because building the SQL string with a UDF makes no sense: a UDF is applied to records (or columns) during query execution, not used to construct the query string itself.
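A quick illustration of the first problem (using a shortened sample list, not the full IgnoreList): interpolating the list directly yields "List(...)", while SQL's IN clause needs a parenthesized, quoted list.

```scala
// Default toString on a List is not valid SQL IN-list syntax.
val ignore = List("", "0", "null")

val bad  = s"$ignore"                            // what the first attempt interpolates
val good = "('" + ignore.mkString("','") + "')"  // what the SQL IN clause needs

println(bad)   // List(, 0, null)
println(good)  // ('','0','null')
```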

You need to combine the formatting from your second attempt with the plain string interpolation from your first:
val IgnoreList = List(""," ","0","-","{}","()","[]","null","Null","NULL","false","False","FALSE","NA","na","Na","n/a","N/a","N/A","nil","Nil","NIL")
val condition = "('" + IgnoreList.mkString("','") + "')"
val df = sqlContext.sql(s"select userName from names where userName not in $condition")
By the way, it might be clearer to format the list this way:

IgnoreList.map(s => s"'$s'").mkString(",")
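A minimal sketch (again with a shortened sample list) showing that this map-based formatting, once wrapped in parentheses, produces the same condition string as the mkString version above:

```scala
val ignore = List("", "0", "null")

// mkString version, as used in the answer
val viaMkString = "('" + ignore.mkString("','") + "')"

// map version: quote each element first, then join
val viaMap = "(" + ignore.map(s => s"'$s'").mkString(",") + ")"

println(viaMap)  // ('','0','null')
```

Note that in newer Spark versions the DataFrame API offers Column.isin, which avoids building SQL strings by hand (and the quoting pitfalls that come with it) entirely; that is untested here but worth a look.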