Error integrating Spark Streaming with a Flume avro agent

Time: 2016-09-27 06:01:42

Tags: apache-spark streaming avro flume

I want to connect a Flume avro agent to Spark Streaming 1.6.0 (HDP 2.4.0.0). The Flume agent configuration file works correctly:

# Please paste flume.conf here. Example:
# Sources, channels, and sinks are defined per
# agent name, in this case 'clickstream'.

##################################
### declare the sources, channels and sinks
###################################
clickstream.sources  = source1
clickstream.channels = channel1
clickstream.sinks  = sink1

##################################
# SOURCES: source properties
###################################
clickstream.sources.source1.type = spooldir
#The directory from which to read files from.
clickstream.sources.source1.spoolDir = /tmp/flume
#clickstream.sources.source1.command = tail -F /rsiiri/syspri/bdpiba/bdp00340/
clickstream.sources.source1.batchSize = 100
clickstream.sources.source1.channels = channel1
#When to delete completed files: never or immediate
clickstream.sources.source1.deletePolicy = never
clickstream.sources.source1.consumeOrder = youngest
clickstream.sources.source1.inputCharset = UTF-8
clickstream.sources.source1.decodeErrorPolicy = REPLACE
clickstream.sources.source1.deserializer = LINE
clickstream.sources.source1.deserializer.maxLineLength = 2048
#Whether to add a header storing the absolute path filename.
clickstream.sources.source1.fileHeader = false
#Suffix to append to completely ingested files
clickstream.sources.source1.fileSuffix = .COMPLETED


clickstream.channels.channel1.type = memory
clickstream.channels.channel1.capacity = 100
clickstream.channels.channel1.transactionCapacity = 100

##################################
### AVRO SINK: forwards events to HOSTNAMEA:3333 ###
##################################
clickstream.sinks.sink1.type = avro
clickstream.sinks.sink1.channel = channel1
clickstream.sinks.sink1.hostname = HOSTNAMEA
clickstream.sinks.sink1.port = 3333
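
For context (not part of the original post): with this configuration the agent watches /tmp/flume, so a quick way to produce test events is to drop a file into that directory. A minimal sketch in Python, with a placeholder file name:

import os
import time

# Hypothetical test helper: any file written into the spoolDir is picked up
# and ingested line by line by the spooldir source.
spool_dir = "/tmp/flume"  # clickstream.sources.source1.spoolDir
path = os.path.join(spool_dir, "test-%d.log" % int(time.time()))
with open(path, "w") as f:
    for i in range(10):
        f.write("click event %d\n" % i)
# Once fully ingested, Flume renames the file with the .COMPLETED suffix
# (fileSuffix = .COMPLETED, deletePolicy = never in the config above).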

The Spark Streaming Python code is as follows:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# sc is assumed to be an existing SparkContext (e.g. from the pyspark shell)
ssc = StreamingContext(sc, 5)

# Connect to HOSTNAMEA:3333 as a TCP client and read newline-delimited text
lines = ssc.socketTextStream("HOSTNAMEA", 3333)
lines.pprint()

ssc.start()
ssc.awaitTermination()
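
As a side note, socketTextStream expects to connect out to a server that emits newline-delimited text. A throwaway test server (my own sketch, not part of the setup) that the code above could read from:

import socket

# Hypothetical test server for socketTextStream: listens on port 3333 and
# sends a few lines of text to the first client that connects.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 3333))
server.listen(1)
conn, addr = server.accept()
for i in range(5):
    conn.sendall(("test line %d\n" % i).encode("utf-8"))
conn.close()
server.close()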

The error I see from Spark Streaming is:
16/09/27 07:51:30 INFO JobScheduler: Finished job streaming job 1474955490000 ms.0 from job set of time 1474955490000 ms
16/09/27 07:51:30 INFO BlockRDD: Removing RDD 100 from persistence list
16/09/27 07:51:30 INFO JobScheduler: Total delay: 0.011 s for time 1474955490000 ms (execution: 0.010 s)
16/09/27 07:51:30 INFO BlockManager: Removing RDD 100
16/09/27 07:51:30 INFO SocketInputDStream: Removing blocks of RDD BlockRDD[100] at socketTextStream at NativeMethodAccessorImpl.java:-2 of time 1474955490000 ms
16/09/27 07:51:30 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer(1474955480000 ms)
16/09/27 07:51:30 INFO InputInfoTracker: remove old batch metadata: 1474955480000 ms
16/09/27 07:51:31 INFO ReceiverTracker: Registered receiver for stream 0 from xxxxxxxxx:40483
16/09/27 07:51:31 ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error connecting to HOSTNAMEA:3333 - java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at java.net.Socket.<init>(Socket.java:434)
        at java.net.Socket.<init>(Socket.java:211)
        at org.apache.spark.streaming.dstream.SocketReceiver.receive(SocketInputDStream.scala:73)
        at org.apache.spark.streaming.dstream.SocketReceiver$$anon$2.run(SocketInputDStream.scala:59)

16/09/27 07:51:33 INFO ReceiverTracker: Registered receiver for stream 0 from xxxxxxx:40483
16/09/27 07:51:33 ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error connecting to HOSTNAMEA:3333 - java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at java.net.Socket.<init>(Socket.java:434)
        at java.net.Socket.<init>(Socket.java:211)
        at org.apache.spark.streaming.dstream.SocketReceiver.receive(SocketInputDStream.scala:73)
        at org.apache.spark.streaming.dstream.SocketReceiver$$anon$2.run(SocketInputDStream.scala:59)
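
The "Connection refused" is consistent with the setup: socketTextStream opens a plain TCP client connection, but the Flume avro sink is itself a client, so nothing is listening on HOSTNAMEA:3333. A minimal probe (my own debugging sketch) shows whether the port has a listener at all:

import socket

# Hypothetical connectivity check: succeeds only if some server process is
# actually listening on HOSTNAMEA:3333.
try:
    s = socket.create_connection(("HOSTNAMEA", 3333), timeout=5)
    s.close()
    print("port 3333 is reachable")
except socket.error as e:
    print("connection failed: %s" % e)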

I also tried the following code, but I get a different error:

from pyspark.streaming.flume import FlumeUtils
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

ssc = StreamingContext(sc, 5)
flumeStream = FlumeUtils.createStream(ssc, "HOSTNAMEA", 3333)

flumeStream.pprint()

ssc.start()
ssc.awaitTermination()

With the second code I get a different error using this jar:
spark-streaming-flume-assembly_2.11-1.6.0.jar
16/09/27 08:06:23 WARN TaskSetManager: Lost task 0.0 in stage 47.0 (TID 160, XXXXXXXXX): org.jboss.netty.channel.ChannelException: Failed to bind to: HOSTNAMEA/HOSTNAMEA:3333
        at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:106)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:119)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:74)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:68)
        at org.apache.spark.streaming.flume.FlumeReceiver.initServer(FlumeInputDStream.scala:162)
        at org.apache.spark.streaming.flume.FlumeReceiver.onStart(FlumeInputDStream.scala:169)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
        at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
        at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.BindException: Cannot assign requested address
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:437)
        at sun.nio.ch.Net.bind(Net.java:429)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        ... 3 more
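
For what it's worth, the "Cannot assign requested address" suggests the receiver was scheduled on an executor that does not own the address HOSTNAMEA: with FlumeUtils.createStream, Spark itself runs the Avro server (the NettyServer in the trace), so the hostname must be local to the executor hosting the receiver, and the Flume avro sink must point at that same host and port. A sketch of that push-based setup, where binding on all interfaces is an assumption of mine:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils

sc = SparkContext(appName="FlumePushSketch")  # or reuse the shell's sc
ssc = StreamingContext(sc, 5)

# The receiver acts as the Avro *server*: it must bind on an address that is
# local to the executor ("0.0.0.0" binds all interfaces).
flumeStream = FlumeUtils.createStream(ssc, "0.0.0.0", 3333)

# Each element is a (headers, body) pair; keep just the event body.
flumeStream.map(lambda event: event[1]).pprint()

ssc.start()
ssc.awaitTermination()

An alternative is the pull-based connector, where Flume hosts a SparkSink and the streaming job polls it; this sidesteps the bind problem because Flume, not Spark, owns the listening port. A rough sketch, assuming the spark-streaming-flume-sink jar is on the Flume agent's classpath and the sink is reconfigured as shown in the comments:

# Assumed Flume-side change to the config above:
#   clickstream.sinks.sink1.type = org.apache.spark.streaming.flume.sink.SparkSink
#   clickstream.sinks.sink1.hostname = HOSTNAMEA
#   clickstream.sinks.sink1.port = 3333
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils

sc = SparkContext(appName="FlumePollingSketch")
ssc = StreamingContext(sc, 5)

# Spark connects out to the SparkSink and pulls batches of events.
flumeStream = FlumeUtils.createPollingStream(ssc, [("HOSTNAMEA", 3333)])
flumeStream.map(lambda event: event[1]).pprint()

ssc.start()
ssc.awaitTermination()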

0 Answers

No answers yet.