I am new to Spark programming. I have a Spark Streaming program that needs to store the received DStream into a database. I want to iterate over my DStream and store each record in the database. Something like this:

JavaStreamingContext streamingContext = getSparkStreamingContext();
JavaReceiverInputDStream<String> socketTextStream = streamingContext
        .socketTextStream("localhost", 8080);
DStream<String> dstream = socketTextStream.dstream();
// Iterate each record from the DStream and push it to DB
Method 2:
Is this the correct approach? Does this approach bring any performance gains/issues?
socketTextStream.foreachRDD(new Function<JavaRDD<String>, Void>() {
    @Override
    public Void call(JavaRDD<String> rdd) throws Exception {
        List<String> collect = rdd.collect();
        for (String string : collect) {
            System.out.println(string);
        }
        return null;
    }
});
Answer 0 (score: 2)
You can use JavaDStream.foreachRDD together with JavaRDD.foreach:
JavaStreamingContext streamingContext = getSparkStreamingContext();
JavaReceiverInputDStream<String> socketTextStream = streamingContext
        .socketTextStream("localhost", 8080);
socketTextStream.foreachRDD(new VoidFunction<JavaRDD<String>>() {
    @Override
    public void call(JavaRDD<String> rdd) throws Exception {
        rdd.foreach(new VoidFunction<String>() {
            @Override
            public void call(String s) throws Exception {
                // Save data
            }
        });
    }
});
Or, with Java 8 lambda expressions:
JavaStreamingContext streamingContext = getSparkStreamingContext();
JavaReceiverInputDStream<String> socketTextStream = streamingContext
        .socketTextStream("localhost", 8080);
socketTextStream.foreachRDD(rdd -> {
    rdd.foreach(s -> {
        // Save data
    });
});
Since you are using Spark 1.2.0 (which is rather old; I recommend upgrading, the latest version being 1.6.1 as of May 22, 2016), the foreachRDD signature takes a Function returning Void instead:
socketTextStream.foreachRDD(new Function<JavaRDD<String>, Void>() {
    @Override
    public Void call(JavaRDD<String> rdd) throws Exception {
        rdd.foreach(new VoidFunction<String>() {
            @Override
            public void call(String s) throws Exception {
                // Save data
            }
        });
        return null;
    }
});
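On the performance question: rdd.collect() in your Method 2 pulls every record back to the driver, which defeats the point of distributing the work and can exhaust driver memory, whereas rdd.foreach runs on the executors. For database writes specifically, the usual refinement is rdd.foreachPartition, which lets you open one connection per partition instead of one per record. The pattern can be illustrated without a cluster: the sketch below runs the same per-partition logic against a plain Iterator, with a hypothetical DbConnection class standing in for a real driver (names here are illustrative assumptions, not a Spark API).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a real database connection, so the
// pattern can be exercised locally without Spark or a database.
class DbConnection {
    private final StringBuilder log = new StringBuilder();
    void insert(String record) { log.append(record).append(';'); }
    void close() { /* a real driver would release resources here */ }
    String inserted() { return log.toString(); }
}

public class ForeachPartitionSketch {
    // Mirrors the body you would pass to rdd.foreachPartition(...):
    // the connection is opened once per partition, reused for every
    // record in that partition, then closed.
    static DbConnection savePartition(Iterator<String> records) {
        DbConnection conn = new DbConnection(); // one connection per partition
        while (records.hasNext()) {
            conn.insert(records.next());
        }
        conn.close();
        return conn;
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("a", "b", "c");
        DbConnection conn = savePartition(partition.iterator());
        System.out.println(conn.inserted()); // prints "a;b;c;"
    }
}
```

Inside a streaming job, the same body would be passed as rdd.foreachPartition(records -> { ... }) within foreachRDD; per-record foreach as in the answer above works, but pays the connection-setup cost for every single record.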