Spark Streaming job is killed automatically after 3-4 hours, even when run in the background

Asked: 2016-08-11 10:57:31

Tags: spark-streaming

I am running a Kafka-Spark Streaming integration to fetch data in real time. Code:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

#set auto.offset.reset = smallest

sc = SparkContext(appName="PythonStreamingDirectKafka")
ssc = StreamingContext(sc, 3600)  # batch interval of 3600 seconds (1 hour)

brokers = *****
topic = ******

kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})
lines = kvs.map(lambda x: x[1])

lines.pprint()

lines.saveAsTextFiles('/tmp/')

ssc.start()
ssc.awaitTermination()
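
Note that the `#set auto.offset.reset = smallest` comment above is never actually applied in this code. If it is intended, one way to apply it with the Kafka 0.8 direct-stream API used here would be to pass it through the kafkaParams dict, roughly like this (a sketch only, not part of the original code):

    # Sketch: pass the offset-reset policy as a Kafka consumer config via kafkaParams,
    # so a fresh consumer starts from the earliest available offsets.
    kvs = KafkaUtils.createDirectStream(
        ssc,
        [topic],
        {"metadata.broker.list": brokers,
         "auto.offset.reset": "smallest"}
    )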

The job is launched in the background with this command:

/usr/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 --master yarn get_stream.py > stream.log 2>&1 &

Below is the stream.log produced by the Spark Streaming job. The job shuts itself down after 3-4 hours. This is what I see in TRACE-level logging (not showing the whole log, it is too large):

    16/08/11 09:56:09 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook
    16/08/11 09:56:09 DEBUG JobScheduler: Stopping JobScheduler
    16/08/11 09:56:09 INFO JobGenerator: Stopping JobGenerator immediately
    16/08/11 09:56:09 INFO RecurringTimer: Stopped timer for JobGenerator after time 1470906000000
    16/08/11 09:56:09 INFO JobGenerator: Stopped JobGenerator
    16/08/11 09:56:09 DEBUG JobScheduler: Stopping job executor
    16/08/11 09:56:09 DEBUG JobScheduler: Stopped job executor
    16/08/11 09:56:09 INFO JobScheduler: Stopped JobScheduler
    16/08/11 09:56:09 INFO StreamingContext: StreamingContext stopped successfully
    16/08/11 09:56:09 INFO SparkContext: Invoking stop() from shutdown hook
    16/08/11 09:56:09 DEBUG DFSClient: DFSClient writeChunk allocating new packet seqno=29, src=/var/log/spark/apps/application_1470897979038_0002.inprogress, packetSize=65016, chunksPerPacket=126, bytesCurBlock=64512
    16/08/11 09:56:09 DEBUG DFSClient: DFSClient flush(): bytesCurBlock=64944 lastFlushOffset=64878 createNewBlock=false
    16/08/11 09:56:09 DEBUG DFSClient: Queued packet 29
    16/08/11 09:56:09 DEBUG DFSClient: Waiting for ack for: 29
    16/08/11 09:56:09 TRACE Tracer: setting current span null
    16/08/11 09:56:09 DEBUG DFSClient: DataStreamer block BP-730701491-10.102.224.120-1470897963878:blk_1073741871_1047 sending packet packet seqno: 29 offsetInBlock: 64512 lastPacketInBlock: false lastByteOffsetInBlock: 64944
    16/08/11 09:56:09 DEBUG DFSClient: DFSClient seqno: 29 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
    16/08/11 09:56:09 INFO SparkUI: Stopped Spark web UI at http://10.102.224.120:4040
    16/08/11 09:56:09 DEBUG DFSClient: DFSClient writeChunk allocating new packet seqno=30, src=/var/log/spark/apps/application_1470897979038_0002.inprogress, packetSize=65016, chunksPerPacket=126, bytesCurBlock=64512
    16/08/11 09:56:09 DEBUG DFSClient: Queued packet 30
    16/08/11 09:56:09 DEBUG DFSClient: Queued packet 31
    16/08/11 09:56:09 DEBUG DFSClient: Waiting for ack for: 31
    16/08/11 09:56:09 TRACE Tracer: setting current span null
    16/08/11 09:56:09 DEBUG DFSClient: DataStreamer block BP-730701491-10.102.224.120-1470897963878:blk_1073741871_1047 sending packet packet seqno: 30 offsetInBlock: 64512 lastPacketInBlock: false lastByteOffsetInBlock: 64944
    16/08/11 09:56:09 DEBUG DFSClient: DFSClient seqno: 30 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
    16/08/11 09:56:09 TRACE Tracer: setting current span null
    16/08/11 09:56:09 DEBUG DFSClient: DataStreamer block BP-730701491-10.102.224.120-1470897963878:blk_1073741871_1047 sending packet packet seqno: 31 offsetInBlock: 64944 lastPacketInBlock: true lastByteOffsetInBlock: 64944
    16/08/11 09:56:09 DEBUG DFSClient: DFSClient seqno: 31 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
    16/08/11 09:56:09 DEBUG DFSClient: Closing old block BP-730701491-10.102.224.120-1470897963878:blk_1073741871_1047
    16/08/11 09:56:09 TRACE ProtobufRpcEngine: 46: Call -> ip-10-102-224-120.ec2.internal/10.102.224.120:8020: complete {src: "/var/log/spark/apps/application_1470897979038_0002.inprogress" clientName: "DFSClient_NONMAPREDUCE_258672080_15" last { poolId: "BP-730701491-10.102.224.120-1470897963878" blockId: 1073741871 generationStamp: 1047 numBytes: 64944 } fileId: 16590}
    16/08/11 09:56:09 DEBUG Client: The ping interval is 60000 ms.
    16/08/11 09:56:09 DEBUG Client: Connecting to ip-10-102-224-120.ec2.internal/10.102.224.120:8020
    16/08/11 09:56:09 DEBUG Client: IPC Client (461299828) connection to ip-10-102-224-120.ec2.internal/10.102.224.120:8020 from hadoop: starting, having connections 2
    16/08/11 09:56:09 DEBUG Client: IPC Client (461299828) connection to ip-10-102-224-120.ec2.internal/10.102.224.120:8020 from hadoop sending #10767
    16/08/11 09:56:09 DEBUG Client: IPC Client (461299828) connection to ip-10-102-224-120.ec2.internal/10.102.224.120:8020 from hadoop got value #10767
    16/08/11 09:56:09 DEBUG ProtobufRpcEngine: Call: complete took 3ms
    16/08/11 09:56:09 TRACE ProtobufRpcEngine: 46: Response <- ip-10-102-224-120.ec2.internal/10.102.224.120:8020: complete {result: true}
    16/08/11 09:56:09 TRACE ProtobufRpcEngine: 46: Call -> ip-10-102-224-120.ec2.internal/10.102.224.120:8020: getFileInfo {src: "/var/log/spark/apps/application_1470897979038_0002"}
    16/08/11 09:56:09 DEBUG Client: IPC Client (461299828) connection to ip-10-102-224-120.ec2.internal/10.102.224.120:8020 from hadoop sending #10768
    16/08/11 09:56:09 DEBUG Client: IPC Client (461299828) connection to ip-10-102-224-120.ec2.internal/10.102.224.120:8020 from hadoop got value #10768
    16/08/11 09:56:09 DEBUG ProtobufRpcEngine: Call: getFileInfo took 1ms
    16/08/11 09:56:09 TRACE ProtobufRpcEngine: 46: Response <- ip-10-102-224-120.ec2.internal/10.102.224.120:8020: getFileInfo {}
    16/08/11 09:56:09 TRACE ProtobufRpcEngine: 46: Call -> ip-10-102-224-120.ec2.internal/10.102.224.120:8020: rename {src: "/var/log/spark/apps/application_1470897979038_0002.inprogress" dst: "/var/log/spark/apps/application_1470897979038_0002"}
    16/08/11 09:56:09 DEBUG Client: IPC Client (461299828) connection to ip-10-102-224-120.ec2.internal/10.102.224.120:8020 from hadoop sending #10769
    16/08/11 09:56:09 DEBUG Client: IPC Client (461299828) connection to ip-10-102-224-120.ec2.internal/10.102.224.120:8020 from hadoop got value #10769
    16/08/11 09:56:09 DEBUG ProtobufRpcEngine: Call: rename took 2ms
    16/08/11 09:56:09 TRACE ProtobufRpcEngine: 46: Response <- ip-10-102-224-120.ec2.internal/10.102.224.120:8020: rename {result: true}
    16/08/11 09:56:09 INFO YarnClientSchedulerBackend: Shutting down all executors
    16/08/11 09:56:09 INFO YarnClientSchedulerBackend: Interrupting monitor thread
    16/08/11 09:56:09 INFO YarnClientSchedulerBackend: Asking each executor to shut down
    16/08/11 09:56:09 DEBUG AbstractService: Service: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state STOPPED
    16/08/11 09:56:09 DEBUG Client: stopping client from cache: org.apache.hadoop.ipc.Client@7aa30390
    16/08/11 09:56:09 INFO YarnClientSchedulerBackend: Stopped
    16/08/11 09:56:09 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    16/08/11 09:56:09 INFO MemoryStore: MemoryStore cleared
    16/08/11 09:56:09 INFO BlockManager: BlockManager stopped
    16/08/11 09:56:09 INFO BlockManagerMaster: BlockManagerMaster stopped
    16/08/11 09:56:09 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    16/08/11 09:56:09 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
    16/08/11 09:56:09 INFO SparkContext: Successfully stopped SparkContext
    16/08/11 09:56:09 INFO ShutdownHookManager: Shutdown hook called
    16/08/11 09:56:09 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-fb5326d4-f089-4aff-b394-bc126f12a983/pyspark-f6c4c7f7-f6e5-4dcf-a9cd-cf03391413d9
    16/08/11 09:56:09 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
    16/08/11 09:56:09 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-fb5326d4-f089-4aff-b394-bc126f12a983
    16/08/11 09:56:09 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-fb5326d4-f089-4aff-b394-bc126f12a983/httpd-9104e927-bd6d-4338-bd20-fd01b7d3ce7f
    16/08/11 09:56:09 DEBUG Client: stopping client from cache: org.apache.hadoop.ipc.Client@7aa30390

1 Answer:

Answer 0 (score: 0)

You are probably running the program in yarn-client mode, i.e. the driver runs on the host from which you submitted the job.

Looking at your log file, you will notice that the client was shut down:

Invoking stop() from shutdown hook

This was most likely triggered by the enclosing shell because your session was terminated. Sending the job to the background does not prevent this, since the process is still tied to its parent, i.e. the session.

Apart from that, you are also collecting results via console output, which you should not do, especially since you already collect the same records in HDFS:

lines.saveAsTextFiles('/tmp/')
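
A minimal sketch of that output stage with the console sink removed (it reuses the `lines` DStream from the question; the '/tmp/stream_output' prefix is only an illustrative placeholder):

    # Sketch: write each batch to HDFS only, instead of also printing to the console.
    # Output directories look like /tmp/stream_output-<batch timestamp>/
    lines.saveAsTextFiles('/tmp/stream_output')
    # lines.pprint() is intentionally dropped, so nothing useful is lost
    # when stdout is no longer captured.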

I suggest the following approaches to work around this:

a) Run in cluster mode: add --deploy-mode cluster to your arguments.

b) If you still want to collect the output, prepend nohup to your spark-submit command as before; nohup detaches the process from its parent (i.e. your session). Both options are sketched below.
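
A rough sketch of what the two options could look like, reusing the package coordinates and paths from the question (untested; adjust the flags to your environment):

    # Option (a): yarn-cluster mode. The driver runs inside the YARN application master,
    # so it no longer depends on your login session. The local spark-submit process only
    # monitors the application; killing it does not kill the job under the default
    # spark.yarn.submit.waitAppCompletion setting.
    /usr/bin/spark-submit \
      --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 \
      --master yarn --deploy-mode cluster \
      get_stream.py

    # Option (b): keep client mode, but detach from the session with nohup
    # so the driver survives after the shell session ends.
    nohup /usr/bin/spark-submit \
      --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 \
      --master yarn \
      get_stream.py > stream.log 2>&1 &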