How to receive data in batches from a RabbitMQ message queue in Spark Streaming

Date: 2017-07-03 19:14:33

Tags: java apache-spark rabbitmq streaming

I am new to Spark, and I am building a simple Spark Streaming application that reads Excel data with about 300 rows and continuously pushes each row into a RabbitMQ queue (every 2 seconds) as a serialized object. On the other side, I need to pull the data from the queue, deserialize the received objects, and perform some operations on them. I also need to do this in parallel, which is why I thought of using Spark Streaming to perform the operations on batches of deserialized objects.

Main.java

    String row = "";
    String[] col;
    String parameters = "";

    QueueConsumer consumer = new QueueConsumer("queue");
    Thread consumerThread = new Thread(consumer);
    consumerThread.start();

    Producer producer = new Producer("queue");

    try {
        BufferedReader buffer = new BufferedReader(new FileReader("somedata.csv"));

        while (true) {
            row = buffer.readLine();
            if (row == null) {
                break; // Stop at the end of the file (last line)
            } else {
                // Split on commas that are outside double quotes
                col = row.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
                for (int i = 0; i < col.length; i++) {
                    parameters += col[i] + ",";
                }
                // Drop the trailing comma
                if (parameters.endsWith(",")) {
                    parameters = parameters.substring(0, parameters.length() - 1);
                }
                // Add data from the CSV row to the object
                DataBuilder d = DataBuilder.addData(parameters);
                parameters = "";
                producer.sendMessage(d);
                // Wait until the object is received ...
                Thread.sleep(666);
            }
        }
        // Close the buffer
        buffer.close();
    } catch (Exception e) {
        System.out.println("Error reading the file!");
        e.printStackTrace();
    }

Producer.java

It contains the code that serializes the objects and puts them into the queue.
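Since Producer.java is not shown in the question, here is a sketch of what its serialization half might look like, using plain Java serialization (an assumption; the question only tells us that `Producer("queue")` and `sendMessage(d)` exist):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationHelper {
    // Turn any Serializable object into the byte[] that gets published to RabbitMQ
    public static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // The inverse, used on the consumer side when the raw bytes arrive
    public static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }
}
```

`sendMessage(d)` would then presumably publish `serialize(d)` with `channel.basicPublish("", queueName, null, bytes)` on a RabbitMQ Java client `Channel`; the deserialization in Consumer.java below uses `SerializationUtils` from Apache Commons Lang, which does the same thing as the helper above.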

Consumer.java

    /**
     * Called when the consumer is registered.
     */
    public void handleConsumeOk(String consumerTag) {
        System.out.println("Consumer " + consumerTag + " registered");
    }

    /**
     * Called when a new message is available.
     */
    public void handleDelivery(String consumerTag, Envelope env,
                               BasicProperties props, byte[] body) throws IOException {
        SparkConf conf = new SparkConf()
                .setAppName("SparkStreaming")
                .setMaster("local[*]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));
        JavaReceiverInputDStream<Object> receiverStream =
                RabbitMQUtils.createStreamFromAQueue(jssc, "localhost", 5672, "queue",
                        StorageLevel.MEMORY_AND_DISK_SER_2);

        // Code added here

        // Removing data from the queue
        try {
            d = (DataBuilder) SerializationUtils.deserialize(body);
            System.out.println(d.getDate().toString());
        } catch (Exception e) {
            System.out.println(e);
        }

        // Start the computation
        try {
            jssc.start();
            jssc.awaitTermination();
        } catch (Exception e) {
            System.out.println(e);
        }
    }

My simple question is: how can I receive the data in batches using Spark Streaming and RabbitMQ? Thanks for your help :)
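One hedged sketch of a fix, not a tested answer: the structural problem in Consumer.java is that the StreamingContext is built inside `handleDelivery`, i.e. once per message. With Spark Streaming you would not run a hand-written RabbitMQ consumer callback at all; instead you build the context once in the driver, let the RabbitMQ receiver pull the messages, and each 2-second batch then arrives as one RDD that can be processed in parallel. Assuming the same `RabbitMQUtils.createStreamFromAQueue` call used in the question and that each record reaches Spark as the raw serialized `byte[]` (both assumptions; the connector version determines the actual record type):

```java
import org.apache.commons.lang3.SerializationUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
// plus the RabbitMQUtils import from whichever spark-rabbitmq connector the question uses

public class StreamingConsumer {
    public static void main(String[] args) throws Exception {
        // Build the streaming context ONCE, in the driver, not per message
        SparkConf conf = new SparkConf()
                .setAppName("SparkStreaming")
                .setMaster("local[*]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

        // The receiver pulls from RabbitMQ; every 2-second batch becomes one RDD
        JavaReceiverInputDStream<byte[]> stream =
                RabbitMQUtils.createStreamFromAQueue(jssc, "localhost", 5672, "queue",
                        StorageLevel.MEMORY_AND_DISK_SER_2);

        // Deserialize and process each batch in parallel across the workers
        stream.foreachRDD(rdd -> rdd.foreach(body -> {
            DataBuilder d = SerializationUtils.deserialize(body);
            System.out.println(d.getDate().toString());
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}
```

`DataBuilder`, the queue name, and the broker address are taken from the question; whether `createStreamFromAQueue` delivers a `byte[]` or an already-decoded object depends on the connector version, so the deserialize step may need adjusting.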

0 Answers:

There are no answers yet.