What exactly is the difference between Repartition and RepartitionByExpression in SparkSQL?

Time: 2017-12-06 01:50:22

Tags: apache-spark apache-spark-sql

  test("distribute above repartition") {
    // Always respects the top distribute and removes useless repartition
    val query1 = testRelation
      .repartition(10)
      .distribute('a)(20)
    val query2 = testRelation
      .repartition(30)
      .distribute('a)(20)

    val optimized1 = Optimize.execute(query1.analyze)
    val optimized2 = Optimize.execute(query2.analyze)
    val correctAnswer = testRelation.distribute('a)(20).analyze

    comparePlans(optimized1, correctAnswer)
    comparePlans(optimized2, correctAnswer)
  }

Looking at the test above, distribute seems to mean repartitioning by specific columns (that is, the partition keys, as I understand it).

My question is:

  • When only repartition is called (without distribute), which key does the partitioner use?
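
One way to investigate this is to build both kinds of plan through the public Dataset API and compare them. The following is only a minimal sketch: it assumes a local SparkSession, and the object name `RepartitionPlans`, the sample rows, and the column names `a` and `b` are made up for illustration. `df.repartition(10)` corresponds to the logical `Repartition` node, while `df.repartition(20, $"a")` corresponds to `RepartitionByExpression`, which is what `distribute('a)(20)` builds in the Catalyst test DSL.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical driver object, not part of the Spark test suite quoted above.
object RepartitionPlans {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")              // assumption: run locally
      .appName("repartition-plans")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with columns a and b.
    val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")

    // Only a partition count, no partitioning expression.
    val byCount = df.repartition(10)
    // A partition count plus a partitioning expression on column a.
    val byColumn = df.repartition(20, $"a")

    // Print the logical plans to see which node each call produces.
    println(byCount.queryExecution.logical)
    println(byColumn.queryExecution.logical)

    // Print the physical plans to see which partitioning the exchange uses.
    byCount.explain()
    byColumn.explain()

    spark.stop()
  }
}
```

In the Spark versions this was checked against conceptually (2.x and later), the first physical plan should contain a round-robin exchange, meaning that rows are spread evenly and no column serves as a key, while the second should contain a hash partitioning on `a`. The exact plan text varies between versions, so it is worth reading the printed output rather than relying on this description.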

0 answers:

No answers