Question

If I run a Spark program from the spark shell, is it possible for that program to hog the entire Hadoop cluster for hours?

There are usually settings called num-executors and executor-cores.

spark-shell --driver-memory 10G --executor-memory 15G --executor-cores 8

But what if I don't specify them and just run "spark-shell"? Will it consume the whole cluster, or are there reasonable defaults?

Answer 1

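The default values for most configuration properties can be found in the Spark Configuration documentation. For the configuration properties in your example, the defaults are: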

spark.driver.memory = 1g

spark.executor.memory = 1g

spark.executor.cores = 1 in YARN mode; in standalone mode, all available cores on the worker.

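In addition, you can override these defaults by creating the file $SPARK_HOME/conf/spark-defaults.conf with the properties you need (as described here). Once the file exists with the desired values, you no longer need to pass them as arguments to the spark-shell command.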
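
As a rough sketch, a spark-defaults.conf that mirrors the flags from your command might look like the following. The memory and core values are simply copied from the question. The spark.executor.instances line is an assumption added for illustration: it is the property behind --num-executors, and the value 4 is arbitrary, so pick whatever suits your cluster.

```properties
# $SPARK_HOME/conf/spark-defaults.conf
# Each line is a property name followed by whitespace and its value.

# Same values as --driver-memory 10G --executor-memory 15G --executor-cores 8
spark.driver.memory      10g
spark.executor.memory    15g
spark.executor.cores     8

# Illustrative only: property equivalent of --num-executors (value is arbitrary)
spark.executor.instances 4
```

With a file like this in place, a plain "spark-shell" should pick these values up, and any flag given on the command line still takes precedence over the file.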