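I am new to Spark Streaming and have spent the past two days trying to understand how Spark streams data from a socket. I can see that Spark is able to read the blocks passed to the socket. However, it does not perform any operation on the blocks it reads.
Here is the Spark code: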
package foo;
import java.io.File;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;
public class AppSocket {
    public static void main(String[] args)
    {
        SparkConf conf = new SparkConf().setAppName("KAFKA").setMaster("local");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new org.apache.spark.streaming.Duration(1000));
        JavaReceiverInputDStream<String> inputStream = jssc.socketTextStream("localhost", 33333);
        JavaPairDStream<String, Integer> mappedStream = inputStream.mapToPair(
            new PairFunction<String, String, Integer>() {
                public Tuple2<String, Integer> call(String i) {
                    System.out.println(i);
                    return new Tuple2<String, Integer>(i , 1);
                }
            });
        JavaPairDStream<String, Integer> reducedStream = mappedStream.reduceByKey(
            new Function2<Integer, Integer, Integer>() {
                public Integer call(Integer i1, Integer i2) {
                    return i1 + i2;
                }
            });
        reducedStream.print();
        System.out.println("Testing........"+reducedStream.count());
        jssc.start();
        jssc.awaitTermination();
    }
}
I am running netcat to create an output stream on the specified port:
nc -lk 33333
I also tried creating the output stream myself. Here is my Java code:
ServerSocket serverSocket = null;
int portNumber = 33333;
serverSocket = new ServerSocket(portNumber);
System.out.println("Server Waiting.................");
Socket clientSocket = serverSocket.accept();
System.out.println("Server Connected!!!!!!!!!!!!!!!");
// Wait for a message
int countflag = 0;
PrintWriter out = null;
out = new PrintWriter(clientSocket.getOutputStream(), true);
while(true)
{
    Message message = consumer.receive(1000);
    if (message instanceof TextMessage) {
        TextMessage textMessage = (TextMessage) message;
        String text = textMessage.getText();
        System.out.println("Received: " + text);
        list.add(text);
        System.out.println(++countflag);
        if(list.size() > 50)
        {
            for(int i = 0; i < list.size() ; i++)
            {
                System.out.print(i);
                out.write(text);
                out.write("\n");
                out.flush();
            }
            list.clear();
        }
    } else {
        count++;
    }
    if(count > 100) break;
}
out.close();
consumer.close();
session.close();
connection.close();
Spark consumes the blocks sent on the stream, but it does not perform any operation on the streamed blocks.
Spark console output:
14/11/26 15:32:14 INFO MemoryStore: ensureFreeSpace(12) called with curMem=3521, maxMem=278302556
14/11/26 15:32:14 INFO MemoryStore: Block input-0-1417015934400 stored as bytes in memory (estimated size 12.0 B, free 265.4 MB)
14/11/26 15:32:14 INFO BlockManagerInfo: Added input-0-1417015934400 in memory on ip-10-0-1-56.ec2.internal:57275 (size: 12.0 B, free: 265.4 MB)
14/11/26 15:32:14 INFO BlockManagerMaster: Updated info of block input-0-1417015934400
14/11/26 15:32:14 WARN BlockManager: Block input-0-1417015934400 already exists on this machine; not re-adding it
14/11/26 15:32:14 INFO BlockGenerator: Pushed block input-0-1417015934400
14/11/26 15:32:15 INFO ReceiverTracker: Stream 0 received 1 blocks
14/11/26 15:32:15 INFO JobScheduler: Added jobs for time 1417015935000 ms
Any help is appreciated. Thanks in advance.
Answer 0 (score: 4)
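Set the master to "local[n]" with n > 1, for example setMaster("local[2]") instead of setMaster("local") in the code above. The receiver needs one task slot to run, and with "local" there is only one task slot. So the receiver runs in that slot, leaving no task slot free to process the data.
I recommend reading the "Points to remember" in the following section of the programming guide: http://spark.apache.org/docs/latest/streaming-programming-guide.html#input-dstreams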
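The task-slot starvation described in this answer can be illustrated with a plain-Java analogy. This is not Spark code and does not use any Spark API: a fixed thread pool stands in for the task slots, and the class name SlotStarvationDemo and the method processingRuns are hypothetical names chosen only for this sketch. It assumes that a receiver behaves like a task that holds its slot for as long as the stream is open.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Analogy only: the thread pool plays the role of the task slots that
// "local" (1 slot) or "local[n]" (n slots) would provide.
class SlotStarvationDemo {

    // Returns true if a processing task gets to run within timeoutMillis
    // while a long-running "receiver" task is occupying one slot.
    static boolean processingRuns(int slots, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(slots);
        CountDownLatch stopReceiver = new CountDownLatch(1);
        CountDownLatch processed = new CountDownLatch(1);
        try {
            // The "receiver" keeps its slot until it is told to stop.
            pool.submit(() -> {
                try {
                    stopReceiver.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // The "batch job" can only start if a free slot remains.
            pool.submit(processed::countDown);
            return processed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Release the receiver and tear the pool down.
            stopReceiver.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 slot,  batch job ran: " + processingRuns(1, 500));
        System.out.println("2 slots, batch job ran: " + processingRuns(2, 500));
    }
}
```

With one slot the batch job stays queued behind the receiver for the whole timeout, which mirrors the question's log: blocks are stored and jobs are added, but nothing is ever processed. With two slots the batch job runs right away.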