Spark on YARN: job keeps running and never finishes

Posted: 2017-12-15 15:03:19

Tags: apache-spark yarn

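I have a Spark job (Spark 2.2 on Hadoop 2.7), and I had to restart the sparkmaster machine. Since then, Spark jobs on YARN are submitted, accepted and shown as running, but they never finish.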

Cluster (1 + 3 nodes): the ResourceManager and NameNode run on the sparkmaster node; a NodeManager and a DataNode run on each of the 3 worker nodes.

Executor log:

/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/12/15 08:58:02 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: 130256@cassandralake1node3.localdomain     
17/12/15 08:58:02 INFO util.SignalUtils: Registered signal handler for TERM    
17/12/15 08:58:02 INFO util.SignalUtils: Registered signal handler for HUP    
17/12/15 08:58:02 INFO util.SignalUtils: Registered signal handler for INT    
17/12/15 08:58:03 WARN util.Utils: Your hostname, cassandralake1node3.localdomain resolves to a loopback address: 127.0.0.1; using 10.204.211.105 instead (on interface em1)
17/12/15 08:58:03 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/12/15 08:58:03 INFO spark.SecurityManager: Changing view acls to: root
17/12/15 08:58:03 INFO spark.SecurityManager: Changing modify acls to: root
17/12/15 08:58:03 INFO spark.SecurityManager: Changing view acls groups to: 
17/12/15 08:58:03 INFO spark.SecurityManager: Changing modify acls groups to: 
17/12/15 08:58:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
17/12/15 08:58:03 INFO client.TransportClientFactory: Successfully created connection to /10.204.211.105:40866 after 85 ms (0 ms spent in bootstraps)
17/12/15 08:58:04 INFO spark.SecurityManager: Changing view acls to: root
17/12/15 08:58:04 INFO spark.SecurityManager: Changing modify acls to: root
17/12/15 08:58:04 INFO spark.SecurityManager: Changing view acls groups to: 
17/12/15 08:58:04 INFO spark.SecurityManager: Changing modify acls groups to: 
17/12/15 08:58:04 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
17/12/15 08:58:04 INFO client.TransportClientFactory: Successfully created connection to /10.204.211.105:40866 after 1 ms (0 ms spent in bootstraps)
17/12/15 08:58:04 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1513329182871_0010/blockmgr-15ae52df-c267-427e-b8f1-ef1c84059740
17/12/15 08:58:04 INFO memory.MemoryStore: MemoryStore started with capacity 1311.0 MB
17/12/15 08:58:04 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.204.211.105:40866
17/12/15 08:58:04 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver
17/12/15 08:58:04 INFO executor.Executor: Starting executor ID 1 on host cassandranode3
17/12/15 08:58:04 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35983.
17/12/15 08:58:04 INFO netty.NettyBlockTransferService: Server created on cassandranode3:35983
17/12/15 08:58:04 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/12/15 08:58:04 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(1, cassandranode3, 35983, None)
17/12/15 08:58:04 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(1, cassandranode3, 35983, None)
17/12/15 08:58:04 INFO storage.BlockManager: external shuffle service port = 7337
17/12/15 08:58:04 INFO storage.BlockManager: Registering executor with local external shuffle service.
17/12/15 08:58:04 INFO client.TransportClientFactory: Successfully created connection to cassandranode3/10.204.211.105:7337 after 1 ms (0 ms spent in bootstraps)
17/12/15 08:58:04 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(1, cassandranode3, 35983, None)
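This executor log stops right after the BlockManager is initialized, with no sign of a task being started. One quick way to check the other executors' logs for the same pattern is to count task-start lines. This is a minimal sketch: the helper name is hypothetical, and it assumes the usual Spark 2.x executor message of the form "Running task 0.0 in stage 0.0 (TID 0)".

```python
import re

# Assumption: Spark 2.x executors log "Running task <n> in stage <m> (TID <k>)"
# whenever they start a task.
TASK_START = re.compile(r"Running task \S+ in stage \S+ \(TID \d+\)")


def executor_ran_tasks(log_lines):
    """Hypothetical helper: count the task-start lines in an executor log."""
    return sum(1 for line in log_lines if TASK_START.search(line))


# Last two lines of the executor log above: registered, but no task started.
sample = [
    "17/12/15 08:58:04 INFO executor.CoarseGrainedExecutorBackend: Successfully registered with driver",
    "17/12/15 08:58:04 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(1, cassandranode3, 35983, None)",
]
print(executor_ran_tasks(sample))
```

A result of 0 for every executor would mean the containers came up and registered, but the driver never scheduled any work on them.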

Driver log:

O util.Utils: Using initial executors = 2, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
17/12/15 09:50:06 INFO yarn.YarnAllocator: Will request 2 executor container(s), each with 1 core(s) and 3072 MB memory (including 1024 MB of overhead)
17/12/15 09:50:06 INFO yarn.YarnAllocator: Submitted 2 unlocalized container requests.
17/12/15 09:50:06 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
17/12/15 09:50:07 INFO impl.AMRMClientImpl: Received new token for : cassandranode2:38628
17/12/15 09:50:07 INFO impl.AMRMClientImpl: Received new token for : cassandranode3:39212
17/12/15 09:50:07 INFO yarn.YarnAllocator: Launching container container_1513329182871_0011_01_000002 on host cassandranode2 for executor with ID 1
17/12/15 09:50:07 INFO yarn.YarnAllocator: Launching container container_1513329182871_0011_01_000003 on host cassandranode3 for executor with ID 2
17/12/15 09:50:07 INFO yarn.YarnAllocator: Received 2 containers from YARN, launching executors on 2 of them.
17/12/15 09:50:07 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
17/12/15 09:50:07 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
17/12/15 09:50:07 INFO impl.ContainerManagementProtocolProxy: Opening proxy : cassandranode3:39212
17/12/15 09:50:07 INFO impl.ContainerManagementProtocolProxy: Opening proxy : cassandranode2:38628
17/12/15 09:50:09 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.204.211.105:47622) with ID 2
17/12/15 09:50:09 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 1)
17/12/15 09:50:09 INFO storage.BlockManagerMasterEndpoint: Registering block manager cassandranode3:33779 with 1311.0 MB RAM, BlockManagerId(2, cassandranode3, 33779, None)
17/12/15 09:50:11 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.204.211.103:43578) with ID 1
17/12/15 09:50:11 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 2)
17/12/15 09:50:11 INFO storage.BlockManagerMasterEndpoint: Registering block manager cassandranode2:37931 with 1311.0 MB RAM, BlockManagerId(1, cassandranode2, 37931, None)
17/12/15 09:50:11 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/12/15 09:50:11 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
17/12/15 09:50:11 INFO internal.SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1513329182871_0011/container_1513329182871_0011_01_000001/spark-warehouse').
17/12/15 09:50:11 INFO internal.SharedState: Warehouse path is 'file:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1513329182871_0011/container_1513329182871_0011_01_000001/spark-warehouse'.
17/12/15 09:50:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e087bd4{/SQL,null,AVAILABLE,@Spark}
17/12/15 09:50:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@c93af1f{/SQL/json,null,AVAILABLE,@Spark}
17/12/15 09:50:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53fd3a5d{/SQL/execution,null,AVAILABLE,@Spark}
17/12/15 09:50:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7dcd6778{/SQL/execution/json,null,AVAILABLE,@Spark}
17/12/15 09:50:11 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a25ecc9{/static/sql,null,AVAILABLE,@Spark}
17/12/15 09:50:12 INFO state.StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
17/12/15 09:51:09 INFO spark.ExecutorAllocationManager: Request to remove executorIds: 2
17/12/15 09:51:11 INFO spark.ExecutorAllocationManager: Request to remove executorIds: 1
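In the driver log, each executor registers and then gets a "Request to remove executorIds" exactly 60 seconds later. With dynamic allocation enabled, the ExecutorAllocationManager asks to remove an executor once it has been idle for spark.dynamicAllocation.executorIdleTimeout (60s by default, and not overridden in the config or command here). That suggests neither executor was given a task. The sketch below just does that timestamp arithmetic on the values copied from the log; the dictionaries and helper are illustrative, not part of Spark.

```python
from datetime import datetime

# Log timestamp format used above: yy/MM/dd HH:mm:ss
FMT = "%y/%m/%d %H:%M:%S"

# Copied from the driver log lines "Registered executor ... with ID n"
registered = {"2": "17/12/15 09:50:09", "1": "17/12/15 09:50:11"}
# Copied from the driver log lines "Request to remove executorIds: n"
removal_requested = {"2": "17/12/15 09:51:09", "1": "17/12/15 09:51:11"}

# Assumption: default spark.dynamicAllocation.executorIdleTimeout of 60s.
IDLE_TIMEOUT_S = 60


def idle_seconds(executor_id):
    """Seconds between an executor registering and its removal request."""
    start = datetime.strptime(registered[executor_id], FMT)
    end = datetime.strptime(removal_requested[executor_id], FMT)
    return (end - start).total_seconds()


for eid in sorted(registered):
    gap = idle_seconds(eid)
    print(eid, gap, gap >= IDLE_TIMEOUT_S)
```

Note that the removal request is only logged; with minExecutors=2, Spark may still decline to actually drop below two executors.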

spark-defaults.conf:

spark.master                      yarn
spark.eventLog.enabled            true
spark.eventLog.dir                file:///home/sparkeventlogs
spark.serializer                  org.apache.spark.serializer.KryoSerializer
spark.driver.memory               5g
spark.driver.cores                1
spark.yarn.am.memory              2048m
spark.yarn.am.cores               1
spark.submit.deployMode           cluster
spark.dynamicAllocation.enabled   true
spark.shuffle.service.enabled     true
spark.driver.maxResultSize        20g
spark.jars.packages               datastax:spark-cassandra-connector:2.0.5-s_2.11
spark.cassandra.connection.host   10.204.211.101,10.204.211.103,10.204.211.105
spark.executor.extraJavaOptions   -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
spark.driver.extraJavaOptions     -Dhdp.version=2.7.4
spark.cassandra.read.timeout_ms   180000
spark.yarn.stagingDir            hdfs:///tmp
spark.network.timeout             2400
spark.yarn.driver.memoryOverhead  2048
spark.yarn.executor.memoryOverhead 1024
spark.network.timeout             2400
yarn.resourcemanager.app.timeout.minutes=-1
spark.yarn.submit.waitAppCompletion true 
spark.sql.inMemoryColumnarStorage.compressed true
spark.sql.inMemoryColumnarStorage.batchSize 10000
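Two entries in this spark-defaults.conf are worth a second look: spark.network.timeout appears twice, and yarn.resourcemanager.app.timeout.minutes is not a spark.* key. As far as I know, Spark reads this file with java.util.Properties (so "key=value" is parsed the same as "key value"), and spark-submit drops keys that do not start with "spark." with an "Ignoring non-spark config property" warning. The reader below is a simplified, hypothetical sketch for auditing the file; it does not handle escapes or line continuations.

```python
import re
from collections import Counter

# Key/value separator: optional whitespace around '=' or ':', or plain whitespace.
SEP = re.compile(r"\s*[=:]\s*|\s+")


def parse_defaults(text):
    """Simplified spark-defaults.conf reader returning (key, value) pairs."""
    pairs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        m = SEP.search(line)
        if m is None:
            pairs.append((line, ""))
        else:
            pairs.append((line[:m.start()], line[m.end():].strip()))
    return pairs


def audit(pairs):
    """Return (duplicated keys, keys spark-submit would ignore as non-spark)."""
    counts = Counter(k for k, _ in pairs)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    non_spark = sorted({k for k, _ in pairs if not k.startswith("spark.")})
    return duplicates, non_spark


# Excerpt of the spark-defaults.conf above.
conf = """spark.network.timeout             2400
spark.yarn.executor.memoryOverhead 1024
spark.network.timeout             2400
yarn.resourcemanager.app.timeout.minutes=-1
spark.yarn.submit.waitAppCompletion true
"""
print(audit(parse_defaults(conf)))
```

The duplicate is harmless here because both values are 2400, but the YARN property in this file most likely has no effect on the ResourceManager.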

spark-submit command:

spark-submit --class com.swcassandrautil.popstatsclone.popihits --master yarn --deploy-mode cluster --executor-cores 1 --executor-memory 2g --conf spark.dynamicAllocation.initialExecutors=2 --conf spark.dynamicAllocation.maxExecutors=8 --conf spark.dynamicAllocation.minExecutors=2 --conf spark.memory.fraction=0.75 --conf spark.memory.storageFraction=0.75 /scala/statscloneihits/target/scala-2.11/popstatscloneihits_2.11-1.0.jar "/mnt/data/tmp/xyz*" "\t";
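The container sizes in the driver log follow directly from these settings: each executor container is the executor heap plus spark.yarn.executor.memoryOverhead, and the initial executor count is the maximum of initialExecutors, minExecutors and spark.executor.instances, as the log message itself says. The sketch below reproduces that arithmetic; it assumes spark.executor.instances is unset (treated as 0). In cluster mode the ApplicationMaster container is sized from spark.driver.memory plus spark.yarn.driver.memoryOverhead, while spark.yarn.am.memory only applies in client mode. YARN may also round each request up to its allocation increment.

```python
# Values from the spark-submit command and spark-defaults.conf above.
executor_memory_mb = 2 * 1024     # --executor-memory 2g
executor_overhead_mb = 1024       # spark.yarn.executor.memoryOverhead 1024
driver_memory_mb = 5 * 1024       # spark.driver.memory 5g
driver_overhead_mb = 2048         # spark.yarn.driver.memoryOverhead 2048

dyn_initial = 2                   # spark.dynamicAllocation.initialExecutors=2
dyn_min = 2                       # spark.dynamicAllocation.minExecutors=2
executor_instances = 0            # assumption: spark.executor.instances not set


def container_memory_mb(heap_mb, overhead_mb):
    """Memory Spark requests from YARN for one container: heap + overhead."""
    return heap_mb + overhead_mb


def initial_executors(initial, minimum, instances):
    """Same rule as the driver log message: max of the three settings."""
    return max(initial, minimum, instances)


print(container_memory_mb(executor_memory_mb, executor_overhead_mb))
print(container_memory_mb(driver_memory_mb, driver_overhead_mb))
print(initial_executors(dyn_initial, dyn_min, executor_instances))
```

The first and last values (3072 MB and 2) match the driver log's "Will request 2 executor container(s), each with 1 core(s) and 3072 MB memory (including 1024 MB of overhead)".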

Any input would be appreciated.

Thanks
