I want to test user input against a whitelist of the available Spark join types.
Is there a way to get the different join types from Spark itself?
For example, I want to check the input against this Seq
Seq("inner", "cross", "outer", "full", "fullouter", "left", "leftouter", "right", "rightouter", "leftsemi", "leftanti")
(all join types available in Spark) without hardcoding it the way I just did.
Answer 0 (score: 4)
I adapted the answer from the question here. You could also put the joinTypes in a JSON file and read them at runtime; you can check this answer for handling JSON objects: JsonParsing. A minimal sketch of the JSON approach follows the main example below.
Update 1: I have updated the answer to follow the way the Spark documentation defines JoinType.
import org.apache.spark.sql.SparkSession

object SparkSandbox extends App {

  case class Row(id: Int, value: String)

  private[this] implicit val spark: SparkSession =
    SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._

  spark.sparkContext.setLogLevel("ERROR")

  val r1 = Seq(Row(1, "A1"), Row(2, "A2"), Row(3, "A3"), Row(4, "A4")).toDS()
  val r2 = Seq(Row(3, "A3"), Row(4, "A4"), Row(4, "A4_1"), Row(5, "A5"), Row(6, "A6")).toDS()

  val userJoinType = "inner" // set to e.g. "nothing" to exercise the error path

  // Whitelist of join type names accepted by Dataset.join
  val joinTypes = Seq("inner", "outer", "full", "full_outer", "left",
    "left_outer", "right", "right_outer", "left_semi", "left_anti")

  userJoinType match {
    // The pattern guard keeps the error case below reachable for non-whitelisted input
    case x if joinTypes.contains(x) =>
      println("do some logic")
      joinTypes foreach { joinType =>
        println(s"${joinType.toUpperCase()} JOIN")
        r1.join(right = r2, usingColumns = Seq("id"), joinType = joinType).orderBy("id").show()
      }
    case x =>
      // Same message format Spark itself uses when it rejects a join type
      val supported = Seq(
        "inner",
        "outer", "full", "fullouter", "full_outer",
        "leftouter", "left", "left_outer",
        "rightouter", "right", "right_outer",
        "leftsemi", "left_semi",
        "leftanti", "left_anti",
        "cross")
      throw new IllegalArgumentException(s"Unsupported join type '$x'. " +
        "Supported join types include: " + supported.mkString("'", "', '", "'") + ".")
  }
}
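As mentioned above, the whitelist can also be loaded from a JSON file at runtime rather than hardcoded. A minimal sketch, assuming a hypothetical file join_types.json in Spark's default newline-delimited JSON format, with one {"joinType": "..."} object per line (the SparkSession and implicits are the same as in the example above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// join_types.json is hypothetical; each line holds one object, e.g.
//   {"joinType": "inner"}
//   {"joinType": "left_outer"}
val joinTypesFromFile: Seq[String] =
  spark.read.json("join_types.json") // newline-delimited JSON is Spark's default
    .select("joinType")
    .as[String]
    .collect()
    .toSeq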
Answer 1 (score: 2)
Sorry, this is not possible without a PR to the Spark project itself. The join types are defined inline in JoinType. There are classes that extend JoinType, but their naming convention differs from that of the strings used in the case statement. So I'm afraid you're out of luck.
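That said, if you can tolerate depending on Spark's internal Catalyst API (it is not a stable public interface and may change between releases), the inline definitions mentioned above are reachable through the JoinType companion object, whose apply method already rejects unknown names with an IllegalArgumentException listing every supported join type. A minimal sketch under that assumption:

import scala.util.{Failure, Success, Try}
// Internal Catalyst API: subject to change without notice between Spark releases
import org.apache.spark.sql.catalyst.plans.JoinType

// Delegate validation to Catalyst's own parser instead of a hand-maintained whitelist;
// JoinType("nothing") throws IllegalArgumentException naming all supported types.
def validateJoinType(userInput: String): Either[String, JoinType] =
  Try(JoinType(userInput)) match {
    case Success(joinType) => Right(joinType)
    case Failure(e)        => Left(e.getMessage)
  }

// validateJoinType("left_semi") // Right(LeftSemi)
// validateJoinType("nothing")   // Left("Unsupported join type 'nothing'. ...")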