Error when joining a 14-billion-record table (2 TB) with a 2-billion-record table (10 GB) in Spark SQL

Asked: 2017-01-17 12:44:23

Tags: apache-spark apache-spark-sql spark-dataframe

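I am running into a problem in Spark when inner-joining a 2 TB table with a 10 GB table. The smaller one is a dimension table, and its join column is unique.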

My Spark configuration is:

  1. Executor memory: 30 GB
  2. Number of executors: 100
  3. Cores per executor: 4

I get the following error during the join:

      

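    The application exceeded its physical memory limit. Current usage: 9.0 GB of 9 GB physical memory used; 10.9 GB of 18.9 GB virtual memory used. Killing container.

It would be very helpful if someone could suggest a good strategy for performing a join this large. The full diagnostics from YARN are: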

    Application application_1484081373244_3987 failed 2 times due to AM Container for appattempt_1484081373244_3987_000002 exited with exitCode: -104
    For more detailed output, check application tracking page:https://cdts1hdpnn01d.rxcorp.com:8090/proxy/application_1484081373244_3987/Then, click on links to logs of each attempt.
    Diagnostics: Container [pid=17580,containerID=container_e75_1484081373244_3987_02_000001] is running beyond physical memory limits. Current usage: 9.5 GB of 9 GB physical memory used; 11.7 GB of 18.9 GB virtual memory used. Killing container.
    Dump of the process-tree for container_e75_1484081373244_3987_02_000001 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 17580 17575 17580 17580 (bash) 3 0 109068288 392 /bin/bash -c LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.8.3-1.cdh5.8.3.p2004.2082/lib/hadoop/../../../CDH-5.8.3-1.cdh5.8.3.p2004.2082/lib/hadoop/lib/native: /usr/java/latest/bin/java -server -Xmx8192m -Djava.io.tmpdir=/data09/yarn/nm/usercache/clsuusr/appcache/application_1484081373244_3987/container_e75_1484081373244_3987_02_000001/tmp '-Dlog4j.configuration=file:logs.properties' -Dspark.yarn.app.container.log.dir=/data07/yarn/container/application_1484081373244_3987/container_e75_1484081373244_3987_02_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'TransTableCreation' --jar file:/storage/acceptance/cls/data01/CustomTransTable-1.0-SNAPSHOT.jar --executor-memory 20480m --executor-cores 4 --properties-file /data09/yarn/nm/usercache/clsuusr/appcache/application_1484081373244_3987/container_e75_1484081373244_3987_02_000001/__spark_conf__/__spark_conf__.properties 1> /data07/yarn/container/application_1484081373244_3987/container_e75_1484081373244_3987_02_000001/stdout 2> /data07/yarn/container/application_1484081373244_3987/container_e75_1484081373244_3987_02_000001/stderr
    |- 17589 17580 17580 17580 (java) 45896 1279 12430073856 2477874 /u
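
The limits quoted in the diagnostics can be sanity-checked with a little arithmetic. The sketch below is an estimate under stated assumptions, not something read from this cluster's configuration: it assumes Spark's documented YARN default of a 10% memory overhead with a 384 MB floor, a YARN allocation increment of 1024 MB, and the default virtual-to-physical memory ratio of 2.1. The helper name `yarn_container_mb` is hypothetical. Under those assumptions, the `-Xmx8192m` heap shown on the ApplicationMaster command line would yield a 9 GB container with an 18.9 GB virtual memory limit, which is consistent with the log.

```python
import math


def yarn_container_mb(heap_mb, overhead_fraction=0.10, min_overhead_mb=384,
                      allocation_increment_mb=1024):
    """Estimate the YARN container size for a Spark JVM (hypothetical helper).

    All defaults are assumptions; a cluster may override any of them.
    """
    # Off-heap overhead: a fraction of the heap, but never below the floor.
    overhead_mb = max(min_overhead_mb, int(heap_mb * overhead_fraction))
    requested_mb = heap_mb + overhead_mb
    # YARN rounds each request up to a multiple of its allocation increment.
    return math.ceil(requested_mb / allocation_increment_mb) * allocation_increment_mb


am_heap_mb = 8192  # from -Xmx8192m in the ApplicationMaster command line
container_mb = yarn_container_mb(am_heap_mb)

# Assumed default ratio of virtual to physical memory allowed per container.
vmem_limit_gb = container_mb / 1024 * 2.1

print(container_mb / 1024)       # physical limit in GB
print(round(vmem_limit_gb, 1))   # virtual limit in GB

# The same estimate for the executor setting that appears in the log.
print(yarn_container_mb(20480) / 1024)
```

If these assumptions hold, two details in the log are worth noting: the container that was killed is the ApplicationMaster container (the log says "AM Container"), not an executor, and the command line shows `--executor-memory 20480m` rather than the 30 GB listed in the configuration above.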
    

0 answers:

No answers yet.