I configured the Spark execution engine in hive-site.xml with the following properties:
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>spark.master</name>
<value>yarn-cluster</value>
</property>
<property>
<name>spark.dynamicAllocation.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.executor.cores</name>
<value>4</value>
</property>
<property>
<name>spark.dynamicAllocation.initialExecutors</name>
<value>1</value>
</property>
<property>
<name>spark.dynamicAllocation.minExecutors</name>
<value>1</value>
</property>
<property>
<name>spark.dynamicAllocation.maxExecutors</name>
<value>8</value>
</property>
<property>
<name>spark.shuffle.service.enabled</name>
<value>true</value>
</property>
<property>
<name>spark.executor.memory</name>
<value>3g</value>
</property>
<property>
<name>spark.driver.memory</name>
<value>3g</value>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
<name>spark.io.compression.codec</name>
<value>lzf</value>
</property>
<property>
<name>spark.yarn.jar</name>
<value>hdfs://VCluster1/user/spark/share/lib/spark-assembly-1.3.1-hadoop2.7.1.jar</value>
</property>
<property>
<name>spark.kryo.referenceTracking</name>
<value>false</value>
</property>
<property>
<name>spark.kryo.classesToRegister</name>
<value>org.apache.hadoop.hive.ql.io.HiveKey,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch</value>
</property>
And in yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
When I run Hive on Spark jobs, dynamic allocation does not work. Spark sets spark.executor.instances to the value I configured for spark.dynamicAllocation.initialExecutors and never changes it. Can anyone help me solve this?
Thanks