How to find out at runtime which join types Spark supports, in Scala

Asked: 2019-01-04 08:46:26

Tags: scala apache-spark join apache-spark-sql

I want to validate user input against a whitelist of the join types available in Spark.

Is it possible to discover the different join types through Spark's built-ins?

For example, I'd like to validate the user's input against this Seq of all the join types available in Spark, without hard-coding it the way I just did:

`Seq("inner", "cross", "outer", "full", "fullouter", "left", "leftouter", "right", "rightouter", "leftsemi", "leftanti")`
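For context, a minimal sketch of the hard-coded validation described above. The underscore-stripping normalization is an assumption on my part, mirroring how Spark treats aliases such as "fullouter"/"full_outer" as equivalent:

```scala
object JoinTypeWhitelist {
  // Hard-coded whitelist from the question. Normalization lowercases the
  // input and strips underscores so that aliases like "full_outer" and
  // "fullouter" are treated the same.
  private val joinTypes =
    Seq("inner", "cross", "outer", "full", "fullouter", "left", "leftouter",
        "right", "rightouter", "leftsemi", "leftanti")

  def isValid(userInput: String): Boolean =
    joinTypes.contains(userInput.toLowerCase.replace("_", ""))
}
```

For instance, `JoinTypeWhitelist.isValid("LEFT_SEMI")` returns `true`, while `JoinTypeWhitelist.isValid("nothing")` returns `false`.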

2 answers:

Answer 0 (score: 4):

I adapted the answer to the question here. You could also put the join types in a JSON file and read it at runtime. You can check this answer on handling JSON objects: JsonParsing
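As a sketch of the read-at-runtime idea: the answer suggests JSON, but to keep the example dependency-free this version reads a plain-text file with one join-type alias per line (the file path is caller-supplied; the format is my simplification, not what the answer prescribes):

```scala
import scala.io.Source
import scala.util.Using

object JoinTypesFromFile {
  // Loads one join-type alias per line, normalized to lowercase;
  // blank lines are skipped. The file is closed even on failure.
  def load(path: String): Seq[String] =
    Using.resource(Source.fromFile(path)) { src =>
      src.getLines().map(_.trim.toLowerCase).filter(_.nonEmpty).toList
    }
}
```

A whitelist kept in a file like this can be updated without recompiling, though it still has to be maintained by hand as Spark evolves.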

Update 1: I updated the answer to follow the way Spark itself documents the aliases in JoinType.

import org.apache.spark.sql.SparkSession


object SparkSandbox extends App {

  case class Row(id: Int, value: String)

  private[this] implicit val spark: SparkSession =
    SparkSession.builder().master("local[*]").getOrCreate()

  import spark.implicits._

  spark.sparkContext.setLogLevel("ERROR")

  val r1 = Seq(Row(1, "A1"), Row(2, "A2"), Row(3, "A3"), Row(4, "A4")).toDS()
  val r2 = Seq(Row(3, "A3"), Row(4, "A4"), Row(4, "A4_1"), Row(5, "A5"), Row(6, "A6")).toDS()

  val validUserJoinType = "inner"
  val invalidUserJoinType = "nothing"

  // Join types exercised by the demo loop ("cross" is excluded because it
  // cannot be combined with usingColumns).
  val joinTypes = Seq("inner", "outer", "full", "full_outer", "left",
    "left_outer", "right", "right_outer", "left_semi", "left_anti")

  // Full alias list as reported by Spark's JoinType, used in the error message.
  val supported = Seq(
    "inner",
    "outer", "full", "fullouter", "full_outer",
    "leftouter", "left", "left_outer",
    "rightouter", "right", "right_outer",
    "leftsemi", "left_semi",
    "leftanti", "left_anti",
    "cross")

  invalidUserJoinType match {
    case x if joinTypes.contains(x) =>
      println("do some logic")
      joinTypes foreach { joinType =>
        println(s"${joinType.toUpperCase()} JOIN")
        r1.join(right = r2, usingColumns = Seq("id"), joinType = joinType).orderBy("id").show()
      }
    case x =>
      throw new IllegalArgumentException(
        s"Unsupported join type '$x'. " +
          "Supported join types include: " + supported.mkString("'", "', '", "'") + ".")
  }

}

Answer 1 (score: 2):

Sorry, but this isn't possible without a PR against the Spark project itself. The join-type strings are defined inline in JoinType's pattern match. Some classes do extend JoinType, but their naming convention differs from that of the strings used in the case statement. So I'm afraid you're out of luck.

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala
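One caveat to this answer: while the alias strings do live only inside JoinType's pattern match, `JoinType.apply` throws an `IllegalArgumentException` whose message enumerates the supported aliases, so a validator can lean on Spark itself instead of a hand-maintained list. A sketch, assuming `spark-catalyst` is on the classpath and accepting that this is an internal API that may change between Spark versions:

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.catalyst.plans.JoinType

object JoinTypeProbe {
  // Delegates validation to Spark's own parser. On failure, the exception
  // message (which lists the supported join types) is surfaced to the caller.
  def validate(s: String): Either[String, JoinType] =
    Try(JoinType(s)) match {
      case Success(jt) => Right(jt)
      case Failure(e)  => Left(e.getMessage)
    }
}
```

For example, `JoinTypeProbe.validate("nothing")` yields a `Left` carrying Spark's own "Supported join types include: …" message, which could even be parsed to recover the whitelist at runtime.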