I am using Spark 1.6 and trying to optimize my joins with DISTRIBUTE BY and CLUSTER BY, following these blog posts: https://docs.cloud.databricks.com/docs/latest/databricks_guide/04%20SQL,%20DataFrames%20&%20Datasets/09%20Cluster%20By.html and https://blog.deepsense.ai/optimize-spark-with-distribute-by-and-cluster-by/, but unfortunately those clauses are not supported.
My Spark SQL query is:
sqlContext.sql(
"""select b.*, count(*) AS CNT from tableb b
GROUP BY b.Key,b.KeyVal
CLUSTER BY b.Key,b.KeyVal
""")
The error is:
Exception in thread "main" java.lang.RuntimeException: [5.7] failure: ``union'' expected but identifier CLUSTER found
CLUSTER BY b.Key
Answer 0 (score: 0)
You should use a HiveContext to run CLUSTER BY and DISTRIBUTE BY. In Spark 1.6, the plain SQLContext uses a simple SQL parser that does not recognize those clauses (hence the "``union'' expected but identifier CLUSTER found" parse error), whereas HiveContext parses queries with the HiveQL parser, which supports them.
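
A minimal sketch of that suggestion, assuming a Spark 1.6 Scala application with an existing SparkContext named sc (the name is illustrative). Note the select list is narrowed to the grouped columns, since selecting b.* while grouping only by b.Key and b.KeyVal would be rejected by the HiveQL parser as well:

import org.apache.spark.sql.hive.HiveContext

// HiveContext extends SQLContext but parses queries with the HiveQL
// parser, which understands CLUSTER BY and DISTRIBUTE BY.
val hiveContext = new HiveContext(sc) // sc: your existing SparkContext

val df = hiveContext.sql(
  """SELECT b.Key, b.KeyVal, count(*) AS CNT
    |FROM tableb b
    |GROUP BY b.Key, b.KeyVal
    |CLUSTER BY b.Key, b.KeyVal""".stripMargin)

df.show()

(For what it's worth, in Spark 2.x and later the equivalent is a SparkSession built with enableHiveSupport(); HiveContext only applies to 1.x.)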