Spark SQL 1.6

Date: 2018-03-01 05:06:51

Tags: apache-spark apache-spark-sql spark-dataframe

I am using Spark 1.6 and trying to optimize my joins using DISTRIBUTE BY and CLUSTER BY, following these blog posts: https://docs.cloud.databricks.com/docs/latest/databricks_guide/04%20SQL,%20DataFrames%20&%20Datasets/09%20Cluster%20By.html and https://blog.deepsense.ai/optimize-spark-with-distribute-by-and-cluster-by/. Unfortunately, they do not appear to be supported.

My Spark SQL query is:

sqlContext.sql(
      """select b.*, count(*) AS CNT  from tableb b
         GROUP BY b.Key,b.KeyVal
         CLUSTER BY b.Key,b.KeyVal
      """)

The error is:

Exception in thread "main" java.lang.RuntimeException: [5.7] failure: ``union'' expected but identifier CLUSTER found

      CLUSTER BY b.Key

1 Answer:

Answer 0 (score: 0):

You should use a HiveContext to run CLUSTER BY and DISTRIBUTE BY. In Spark 1.6 these are HiveQL clauses, and the plain SQLContext parser does not recognize them, which is why the query fails with the ``union'' expected error; see the sketch below.
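
A minimal sketch of the fix, assuming a Spark 1.6 application built with the spark-hive module; the table and column names (tableb, Key, KeyVal) are taken from the question:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf().setAppName("ClusterByExample")
    val sc = new SparkContext(conf)

    // HiveContext extends SQLContext and parses queries with the HiveQL
    // parser, which accepts CLUSTER BY and DISTRIBUTE BY.
    val hiveContext = new HiveContext(sc)

    // The grouped columns are selected explicitly instead of b.*, since
    // an aggregate query cannot select ungrouped columns.
    hiveContext.sql(
      """SELECT b.Key, b.KeyVal, count(*) AS CNT
         FROM tableb b
         GROUP BY b.Key, b.KeyVal
         CLUSTER BY b.Key, b.KeyVal
      """)

Note that in the Spark 1.6 spark-shell, the pre-built sqlContext is already a HiveContext when Spark is compiled with Hive support, so the same query may work there without any code change.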