Question

I am trying to insert some data into a table that ends up with 1500 dynamic partitions, and I get this error:

 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Number of dynamic partitions created is 1500, which is more than 1000. 
 To solve this try to set hive.exec.max.dynamic.partitions to at least 1500.

So I tried: SET hive.exec.max.dynamic.partitions=2048, but I still get the same error.

How can I change this value from Spark?

Code:

this.spark.sql("SET hive.exec.dynamic.partition=true")
this.spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
this.spark.sql("SET hive.exec.max.dynamic.partitions=2048")
this.spark.sql(
    """
      |INSERT INTO processed_data
      |PARTITION(event, date)
      |SELECT c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,event,date FROM csv_data DISTRIBUTE BY event, date
    """.stripMargin
).show()

I am using Spark 2.0.0 in standalone mode. Thanks!

Answer 1

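As of Spark 2.x, setting Hive properties from the Spark CLI may not take effect. Instead, add your Hive properties to the hive-site.xml in both your Spark and Hive conf directories.

Adding the following property to hive-site.xml should resolve the problem.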

<property>
  <name>hive.exec.max.dynamic.partitions</name>
  <value>2048</value>
  <description></description>
</property>

Note: if it still does not work, restart HiveServer2 and the Spark history server.
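If editing hive-site.xml is not convenient, another option that may work is to supply the same properties when the session is built, before the Hive client is created. The following is only a sketch under assumptions not confirmed in this thread: that the spark.hadoop. prefix forwards these keys into the Hadoop/Hive configuration on your Spark version, and that no SparkSession already exists in the JVM. The object name and app name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, for illustration only.
object DynamicPartitionInsert {
  def main(args: Array[String]): Unit = {
    // Assumption: keys prefixed with "spark.hadoop." are copied into the
    // Hadoop configuration that the Hive client reads at startup, so they
    // have to be set here rather than with a later SET statement.
    val spark = SparkSession.builder()
      .appName("dynamic-partition-insert") // hypothetical name
      .config("spark.hadoop.hive.exec.dynamic.partition", "true")
      .config("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict")
      .config("spark.hadoop.hive.exec.max.dynamic.partitions", "2048")
      .enableHiveSupport()
      .getOrCreate()

    // Same INSERT as in the question, run on the pre-configured session.
    spark.sql(
      """
        |INSERT INTO processed_data
        |PARTITION(event, date)
        |SELECT c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,event,date FROM csv_data DISTRIBUTE BY event, date
      """.stripMargin
    )

    spark.stop()
  }
}
```

The same keys can presumably be passed on the command line as --conf spark.hadoop.hive.exec.max.dynamic.partitions=2048 with spark-submit. If the error still reports a limit of 1000, the setting did not reach the Hive client and the hive-site.xml approach above is the safer route.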
