Printing Spark stream not working

Asked: 2017-10-04 15:45:08

Tags: apache-spark spark-streaming

I wrote a simple Spark Streaming receiver, but I am having trouble processing the stream... the data is received, yet it is never processed by Spark Streaming.

public class JavaCustomReceiver extends Receiver<String> {
    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {
//    if (args.length < 2) {
//      System.err.println("Usage: JavaCustomReceiver <hostname> <port>");
//      System.exit(1);
//    }

//    StreamingExamples.setStreamingLogLevels();
    LogManager.getRootLogger().setLevel(Level.WARN);
    Log log = LogFactory.getLog("EXECUTOR-LOG:");
    // Create the context with a 1 second batch size
    SparkConf sparkConf = new SparkConf().setAppName("JavaCustomReceiver").setMaster("local[4]");
    JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(10000));

    // Create an input stream with the custom receiver on target ip:port and count the
    // words in input stream of \n delimited text (eg. generated by 'nc')
    JavaReceiverInputDStream<String> lines = ssc.receiverStream(
      new JavaCustomReceiver("localhost", 9999));    

//    JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(""))).iterator();
    JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(" ")).iterator());
    words.foreachRDD(x -> {
        x.collect().stream().forEach(n -> System.out.println("item of list: " + n));
    });
    words.foreachRDD(rdd -> {
        if (!rdd.isEmpty()) System.out.println("RDD is not empty");
    });

    JavaPairDStream<String, Integer> wordCounts = words.mapToPair(s -> new Tuple2<>(s, 1))
        .reduceByKey((i1, i2) -> i1 + i2);
    wordCounts.count();
    System.out.println("WordCounts == " + wordCounts);
    wordCounts.print();
    log.warn("This is a test message");
    log.warn(wordCounts.count());

    ssc.start();
    ssc.awaitTermination();
  }

    // ============= Receiver code that receives data over a socket
    // ==============

    String host = null;
    int port = -1;

    public JavaCustomReceiver(String host_, int port_) {
        super(StorageLevel.MEMORY_AND_DISK_2());
        host = host_;
        port = port_;
    }

    @Override
    public void onStart() {
        // Start the thread that receives data over a connection
        new Thread(this::receive).start();
    }

    @Override
    public void onStop() {
        // There is nothing much to do as the thread calling receive()
        // is designed to stop by itself when isStopped() returns true
    }

    /** Create a socket connection and receive data until receiver is stopped */
    private void receive() {
        try {
            Socket socket = null;
            BufferedReader reader = null;
            try {
                // connect to the server
                socket = new Socket(host, port);
                reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
                // Until stopped or connection broken continue reading
                String userInput;
                while (!isStopped() && (userInput = reader.readLine()) != null) {
                    System.out.println("Received data '" + userInput + "'");
                    store(userInput);
                }
            } finally {
                Closeables.close(reader, /* swallowIOException = */ true);
                Closeables.close(socket, /* swallowIOException = */ true);
            }
            // Restart in an attempt to connect again when server is active
            // again
            restart("Trying to connect again");
        } catch (ConnectException ce) {
            // restart if could not connect to server
            restart("Could not connect", ce);
        } catch (Throwable t) {
            restart("Error receiving data", t);
        }
    }
}

Here is my log. You can see the test data arrive, but after that I never see any of the results...

I have set the master to local[2] / local[4], but nothing works.

Received data 'testdata'
17/10/04 11:43:14 INFO MemoryStore: Block input-0-1507131793800 stored as values in memory (estimated size 80.0 B, free 912.1 MB)
17/10/04 11:43:14 INFO BlockManagerInfo: Added input-0-1507131793800 in memory on 10.99.1.116:50088 (size: 80.0 B, free: 912.2 MB)
17/10/04 11:43:14 WARN BlockManager: Block input-0-1507131793800 replicated to only 0 peer(s) instead of 1 peers
17/10/04 11:43:14 INFO BlockGenerator: Pushed block input-0-1507131793800
17/10/04 11:43:20 INFO JobScheduler: Added jobs for time 1507131800000 ms
17/10/04 11:43:20 INFO JobScheduler: Starting job streaming job 1507131800000 ms.0 from job set of time 1507131800000 ms
17/10/04 11:43:20 INFO SparkContext: Starting job: collect at JavaCustomReceiver.java:61
17/10/04 11:43:20 INFO DAGScheduler: Got job 44 (collect at JavaCustomReceiver.java:61) with 1 output partitions
17/10/04 11:43:20 INFO DAGScheduler: Final stage: ResultStage 59 (collect at JavaCustomReceiver.java:61)
17/10/04 11:43:20 INFO DAGScheduler: Parents of final stage: List()
17/10/04 11:43:20 INFO DAGScheduler: Missing parents: List()
17/10/04 11:43:20 INFO DAGScheduler: Submitting ResultStage 59 (MapPartitionsRDD[58] at flatMap at JavaCustomReceiver.java:59), which has no missing parents
17/10/04 11:43:20 INFO MemoryStore: Block broadcast_32 stored as values in memory (estimated size 2.7 KB, free 912.1 MB)
17/10/04 11:43:20 INFO MemoryStore: Block broadcast_32_piece0 stored as bytes in memory (estimated size 1735.0 B, free 912.1 MB)
17/10/04 11:43:20 INFO BlockManagerInfo: Added broadcast_32_piece0 in memory on 10.99.1.116:50088 (size: 1735.0 B, free: 912.2 MB)
17/10/04 11:43:20 INFO SparkContext: Created broadcast 32 from broadcast at DAGScheduler.scala:1012
17/10/04 11:43:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 59 (MapPartitionsRDD[58] at flatMap at JavaCustomReceiver.java:59)
17/10/04 11:43:20 INFO TaskSchedulerImpl: Adding task set 59.0 with 1 tasks
17/10/04 11:43:20 INFO TaskSetManager: Starting task 0.0 in stage 59.0 (TID 60, localhost, partition 0, ANY, 5681 bytes)
17/10/04 11:43:20 INFO Executor: Running task 0.0 in stage 59.0 (TID 60)

1 Answer:

Answer 0: (score: 0)

Found the answer.

Why doesn't the Flink SocketTextStreamWordCount work?

I changed the program to save the output to a text file instead, and it saves the streamed data perfectly.
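A minimal sketch of that change, assuming the Spark 2.x Java API: the `saveAsTextFiles` output operation is defined on the underlying Scala `DStream`, which `JavaPairDStream` exposes via `dstream()`.

```java
// Sketch only: replace wordCounts.print() in the question's main() with a
// file-based output operation. Each batch interval produces a directory
// named wordcounts-<batch-timestamp>.txt containing that batch's output.
wordCounts.dstream().saveAsTextFiles("wordcounts", "txt");
```

Any output operation (`print`, `saveAsTextFiles`, `foreachRDD`) is enough to trigger execution of the DStream lineage; transformations alone, such as the bare `wordCounts.count()` in the question, do nothing by themselves.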

Thanks, Adi
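Separately from the accepted fix, one detail in the question's code is worth flagging: the `flatMap` lambda calls `SPACE.split(" ")`, splitting the literal one-space string instead of the input line `x`. Since Java's `Pattern.split` drops trailing empty tokens, that call returns an empty array, so `words` (and everything derived from it) is always empty, which would also explain why `print()` showed nothing. A standalone check, no Spark required:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitCheck {
    public static void main(String[] args) {
        Pattern SPACE = Pattern.compile(" ");

        // Bug from the question: the pattern is applied to the literal
        // string " ", not to the input line. Both resulting tokens are
        // empty, and Pattern.split removes trailing empty tokens, so the
        // result is an empty array.
        String[] buggy = SPACE.split(" ");
        System.out.println("buggy length = " + buggy.length); // prints 0

        // Intended behaviour: split the actual input line.
        String line = "hello streaming world";
        String[] words = SPACE.split(line);
        System.out.println(Arrays.toString(words)); // [hello, streaming, world]
    }
}
```

If that is indeed the cause, changing the lambda to `x -> Arrays.asList(SPACE.split(x)).iterator()` restores the intended word stream.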