How to increase YARN application parallelism

Asked: 2019-07-11 13:48:09

Tags: pyspark yarn amazon-emr

I am trying to run multiple YARN applications on EMR Spark, but I cannot get more than 5 applications to run at a time.

I am using the following configuration for the Spark cluster:

Master = r5.2xlarge

Workers = r5.12xlarge (384 GB memory, 48 vCores each)

Deploy mode = cluster

JSON

{
  "Classification": "spark-defaults",
  "ConfigurationProperties": {
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'",
    "spark.driver.extraJavaOptions": "-XX:+UseG1GC -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:InitiatingHeapOccupancyPercent=35 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p'",
    "spark.scheduler.mode": "FIFO",
    "spark.eventLog.enabled": "true",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.dynamicAllocation.enabled": "false",
    "spark.executor.heartbeatInterval": "60s",
    "spark.network.timeout": "800s",
    "spark.executor.cores": "5",
    "spark.driver.cores": "5",
    "spark.executor.memory": "37000M",
    "spark.driver.memory": "37000M",
    "spark.yarn.executor.memoryOverhead": "5000M",
    "spark.yarn.driver.memoryOverhead": "5000M",
    "spark.executor.instances": "17",
    "spark.default.parallelism": "170",
    "spark.yarn.scheduler.reporterThread.maxFailures": "5",
    "spark.storage.level": "MEMORY_AND_DISK_SER",
    "spark.rdd.compress": "true",
    "spark.shuffle.compress": "true",
    "spark.shuffle.spill.compress": "true"
  }
}
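
With these settings, each application asks YARN for a fixed footprint (approximate; YARN rounds container requests up to its allocation increment):

    driver:    37000M + 5000M overhead        = ~42 GB,  5 vCores
    executors: 17 x (37000M + 5000M overhead) = ~714 GB, 85 vCores
    per app:   18 containers                  = ~756 GB, 90 vCores

So five concurrent applications already claim roughly 3.7 TB of memory and 450 vCores.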

How can I increase the number of YARN applications that run in parallel on EMR Spark?

1 Answer:

Answer 0 (score: 0)

Take a look at the YARN UI running on the cluster's master node. Is all of the CPU and all of the memory in the cluster already in use? Higher concurrency generally means that each individual application can only use a small fraction of the cluster. Also, because you have disabled dynamic executor allocation and fixed the executor count at 17, every application holds 17 executors plus a driver for its entire lifetime; once those fixed-size applications fill the cluster, any further submissions simply wait in YARN's queue until resources free up.
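
One way to shrink the per-application footprint is to cap dynamic allocation instead of pinning 17 executors per job. A sketch of what that could look like in your spark-defaults classification (the numbers are illustrative, not tuned for your workload; dynamic allocation also requires the external shuffle service, which EMR enables by default):

    {
      "Classification": "spark-defaults",
      "ConfigurationProperties": {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": "1",
        "spark.dynamicAllocation.maxExecutors": "4",
        "spark.shuffle.service.enabled": "true",
        "spark.executor.cores": "5",
        "spark.executor.memory": "37000M",
        "spark.yarn.executor.memoryOverhead": "5000M"
      }
    }

With at most 4 executors plus a driver, each application tops out around 5 × 42 GB ≈ 210 GB, so several applications can share the same nodes. If you prefer to keep dynamic allocation off, lowering spark.executor.instances has the same effect.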