I am new to Spark and I am building a simple Spark Streaming application. It takes Excel data with roughly 300 rows and continuously publishes each row to a RabbitMQ queue (one every 2 seconds) as a serialized object. On the other side, I need to pull the data from the queue, deserialize the received objects, and perform some operations on them. I also need to do this in parallel, which is why I thought of Spark Streaming: it would perform the operations on batches of the deserialized objects.
Main.java
String Row = "";
String[] Col;
String Parameters = "";

QueueConsumer consumer = new QueueConsumer("queue");
Thread consumerThread = new Thread(consumer);
consumerThread.start();

Producer producer = new Producer("queue");
try {
    BufferedReader buffer = new BufferedReader(new FileReader("somedata.csv"));
    while (true) {
        Row = buffer.readLine();
        if (Row == null) {
            break; // Stop at the end of the file (last line)
        } else {
            // Split on commas that are outside of double quotes
            Col = Row.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
            for (int i = 0; i < Col.length; i++) {
                Parameters += Col[i] + ",";
            }
            if (Parameters.endsWith(",")) {
                Parameters = Parameters.substring(0, Parameters.length() - 1);
            }
            // Add data from the CSV row to the object
            DataBuilder d = DataBuilder.addData(Parameters);
            Parameters = "";
            producer.sendMessage(d);
            // Wait until the object is received ...
            Thread.sleep(666);
            // d = null;
        }
    }
    // Close the buffer
    buffer.close();
} catch (Exception e) {
    System.out.println("Error reading the file!");
    e.printStackTrace();
}
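The regular expression passed to `split` above is meant to break a CSV row on commas, but only on commas that sit outside double quotes (the lookahead requires an even number of quotes between the comma and the end of the line). A small self-contained check of that behavior:

```java
public class SplitCheck {
    public static void main(String[] args) {
        // A row with a quoted field that itself contains a comma
        String row = "1,\"Doe, John\",42";
        // Split on commas that are not inside double quotes
        String[] col = row.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
        // The quoted field is kept as a single column
        System.out.println(col.length); // 3
        System.out.println(col[1]);     // "Doe, John"
    }
}
```

Note that the surrounding quotes are kept in the field; the code in Main.java above does not strip them either.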
Producer.java

It contains the code that serializes the objects and puts them onto the queue.

Consumer.java
/**
 * Called when the consumer is registered.
 */
public void handleConsumeOk(String consumerTag) {
    System.out.println("Consumer " + consumerTag + " registered");
}

/**
 * Called when a new message is available.
 */
public void handleDelivery(String consumerTag, Envelope env,
        BasicProperties props, byte[] body) throws IOException {
    SparkConf conf = new SparkConf()
            .setAppName("SparkStreaming")
            .setMaster("local[*]");
    JavaStreamingContext jssc =
            new JavaStreamingContext(conf, Durations.seconds(2));
    JavaReceiverInputDStream<Object> receiverStream =
            RabbitMQUtils.createStreamFromAQueue(jssc, "localhost", 5672,
                    "queue", StorageLevel.MEMORY_AND_DISK_SER_2());
    // Code added here

    // Removing data from the queue
    try {
        DataBuilder d = (DataBuilder) SerializationUtils.deserialize(body);
        System.out.println(d.getDate().toString());
    } catch (Exception e) {
        System.out.println(e);
    }

    // Start the computation
    try {
        jssc.start();
        jssc.awaitTermination();
    } catch (Exception e) {
        System.out.println(e);
    }
}
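For context on the deserialize call above: the `body` byte[] handed to `handleDelivery` is just the Java-serialized object, and as far as I understand, Commons Lang's `SerializationUtils` is a thin wrapper around the `java.io` object streams. A self-contained round trip showing what the producer and consumer do with the payload (the `Payload` class is only a stand-in for my `DataBuilder`):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class RoundTrip {
    // Stand-in for DataBuilder: any Serializable class behaves the same way
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        final String csvRow;
        Payload(String csvRow) { this.csvRow = csvRow; }
    }

    // What the producer does before publishing: object -> byte[]
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // What SerializationUtils.deserialize(body) does on the consumer side
    static Object deserialize(byte[] body)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(body))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] body = serialize(new Payload("1,foo,42"));
        Payload back = (Payload) deserialize(body);
        System.out.println(back.csvRow); // prints 1,foo,42
    }
}
```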
My simple question is: how can I receive the data in batches with Spark Streaming and RabbitMQ? Thanks for your help :)
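To be clear about what I mean by "batches": my understanding is that Spark Streaming turns the continuous stream from the queue into micro-batches (here, one every 2 seconds) and then processes each batch in parallel. Just to illustrate the batching idea itself, here is a plain-Java sketch with no Spark or RabbitMQ involved (the class and variable names are mine, not part of the real setup):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MicroBatchSketch {
    public static void main(String[] args) throws InterruptedException {
        // Stands in for the RabbitMQ queue the consumer reads from
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 5; i++) {
            queue.put("row-" + i);
        }

        // At each batch interval, drain whatever has arrived so far and
        // process the whole batch at once (Spark would parallelize this step)
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        System.out.println("batch size: " + batch.size()); // batch size: 5
    }
}
```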