我编写了一个简单的火花流接收器,但我在处理流时遇到了麻烦......收到的数据却没有通过火花流处理。
public class JavaCustomReceiver extends Receiver<String> {
private static final Pattern SPACE = Pattern.compile(" ");
public static void main(String[] args) throws Exception {
// if (args.length < 2) {
// System.err.println("Usage: JavaCustomReceiver <hostname> <port>");
// System.exit(1);
// }
// StreamingExamples.setStreamingLogLevels();
LogManager.getRootLogger().setLevel(Level.WARN);
Log log = LogFactory.getLog("EXECUTOR-LOG:");
// Create the context with a 1 second batch size
SparkConf sparkConf = new SparkConf().setAppName("JavaCustomReceiver").setMaster("local[4]");
JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(10000));
// Create an input stream with the custom receiver on target ip:port and count the
// words in input stream of \n delimited text (eg. generated by 'nc')
JavaReceiverInputDStream<String> lines = ssc.receiverStream(
new JavaCustomReceiver("localhost", 9999));
// JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(""))).iterator();
JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(SPACE.split(" ")).iterator());
words.foreachRDD( x-> {
x.collect().stream().forEach(n-> System.out.println("item of list: "+n));
});
words.foreachRDD( rdd -> {
if (!rdd.isEmpty()) System.out.println("its empty"); });
JavaPairDStream<String, Integer> wordCounts = words.mapToPair(s -> new Tuple2<>(s, 1))
.reduceByKey((i1, i2) -> i1 + i2);
wordCounts.count();
System.out.println("WordCounts == " + wordCounts);
wordCounts.print();
log.warn("This is a test message");
log.warn(wordCounts.count());
ssc.start();
ssc.awaitTermination();
}
// ============= Receiver code that receives data over a socket
// ==============
String host = null;
int port = -1;
public JavaCustomReceiver(String host_, int port_) {
super(StorageLevel.MEMORY_AND_DISK_2());
host = host_;
port = port_;
}
@Override
public void onStart() {
// Start the thread that receives data over a connection
new Thread(this::receive).start();
}
@Override
public void onStop() {
// There is nothing much to do as the thread calling receive()
// is designed to stop by itself isStopped() returns false
}
/** Create a socket connection and receive data until receiver is stopped */
private void receive() {
try {
Socket socket = null;
BufferedReader reader = null;
try {
// connect to the server
socket = new Socket(host, port);
reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
// Until stopped or connection broken continue reading
String userInput;
while (!isStopped() && (userInput = reader.readLine()) != null) {
System.out.println("Received data '" + userInput + "'");
store(userInput);
}
} finally {
Closeables.close(reader, /* swallowIOException = */ true);
Closeables.close(socket, /* swallowIOException = */ true);
}
// Restart in an attempt to connect again when server is active
// again
restart("Trying to connect again");
} catch (ConnectException ce) {
// restart if could not connect to server
restart("Could not connect", ce);
} catch (Throwable t) {
restart("Error receiving data", t);
}
}
}
这是我的日志 - 您可以看到显示的测试数据但在此之后我根本看不到结果的内容...
我已将主人设置为本地[2] /本地[4],但没有任何作用。
Received data 'testdata'
17/10/04 11:43:14 INFO MemoryStore: Block input-0-1507131793800 stored as values in memory (estimated size 80.0 B, free 912.1 MB)
17/10/04 11:43:14 INFO BlockManagerInfo: Added input-0-1507131793800 in memory on 10.99.1.116:50088 (size: 80.0 B, free: 912.2 MB)
17/10/04 11:43:14 WARN BlockManager: Block input-0-1507131793800 replicated to only 0 peer(s) instead of 1 peers
17/10/04 11:43:14 INFO BlockGenerator: Pushed block input-0-1507131793800
17/10/04 11:43:20 INFO JobScheduler: Added jobs for time 1507131800000 ms
17/10/04 11:43:20 INFO JobScheduler: Starting job streaming job 1507131800000 ms.0 from job set of time 1507131800000 ms
17/10/04 11:43:20 INFO SparkContext: Starting job: collect at JavaCustomReceiver.java:61
17/10/04 11:43:20 INFO DAGScheduler: Got job 44 (collect at JavaCustomReceiver.java:61) with 1 output partitions
17/10/04 11:43:20 INFO DAGScheduler: Final stage: ResultStage 59 (collect at JavaCustomReceiver.java:61)
17/10/04 11:43:20 INFO DAGScheduler: Parents of final stage: List()
17/10/04 11:43:20 INFO DAGScheduler: Missing parents: List()
17/10/04 11:43:20 INFO DAGScheduler: Submitting ResultStage 59 (MapPartitionsRDD[58] at flatMap at JavaCustomReceiver.java:59), which has no missing parents
17/10/04 11:43:20 INFO MemoryStore: Block broadcast_32 stored as values in memory (estimated size 2.7 KB, free 912.1 MB)
17/10/04 11:43:20 INFO MemoryStore: Block broadcast_32_piece0 stored as bytes in memory (estimated size 1735.0 B, free 912.1 MB)
17/10/04 11:43:20 INFO BlockManagerInfo: Added broadcast_32_piece0 in memory on 10.99.1.116:50088 (size: 1735.0 B, free: 912.2 MB)
17/10/04 11:43:20 INFO SparkContext: Created broadcast 32 from broadcast at DAGScheduler.scala:1012
17/10/04 11:43:20 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 59 (MapPartitionsRDD[58] at flatMap at JavaCustomReceiver.java:59)
17/10/04 11:43:20 INFO TaskSchedulerImpl: Adding task set 59.0 with 1 tasks
17/10/04 11:43:20 INFO TaskSetManager: Starting task 0.0 in stage 59.0 (TID 60, localhost, partition 0, ANY, 5681 bytes)
17/10/04 11:43:20 INFO Executor: Running task 0.0 in stage 59.0 (TID 60)
答案 0 :(得分:0)