I am trying to write a custom Spark receiver in Java, and inside the receiver I need to access a Cassandra database. I am running Spark in cluster mode with at least 2 workers.
Here is my custom Java Spark receiver:
@Service
public class MyCustomReceiver extends Receiver<MyData> {

    private static final Log logger = LogFactory.getLog(MyCustomReceiver.class);

    @Autowired
    private MyAppConfig myAppConfig;

    @Autowired
    private CassandraDataService cassandraDataService;

    public MyCustomReceiver() {
        super(StorageLevel.MEMORY_AND_DISK_2());
        logger.debug("Initiated...");
    }

    @Override
    public void onStart() {
        // Start the thread that receives data over a connection
        logger.debug("Calling the receive method...");
        receive();
        logger.debug("Done.. calling the receive method...");
    }

    private void receive() {
        logger.debug("receive method called...");
        List<String> myConfigs = myAppConfig.getMyConfig();
        logger.debug("Received myConfigs..." + myConfigs);
        for (String myConfigStr : myConfigs) {
            ObjectMapper mapper = new ObjectMapper();
            MyConfig myConfig;
            try {
                while (!isStopped()) {
                    myConfig = mapper.readValue(myConfigStr, MyConfig.class);
                    logger.debug("Parsed the myConfig..." + myConfig);
                    // Check for matching data in Cassandra
                    List<MyData> cassandraRows = cassandraDataService.getMatches(myConfig);
                    for (MyData myData : cassandraRows) {
                        System.out.println("Received data '" + myData + "'");
                    }
                    store(cassandraRows.iterator());
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    @Override
    public void onStop() {
    }
}
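(A side note on the receiver above: Spark's custom-receiver contract expects onStart() to return quickly and to run the receiving work on its own thread, which is what the JavaCustomReceiver example in the Spark documentation does, whereas the onStart() here calls receive() inline and blocks. A minimal sketch of the threaded pattern, reusing the receive() method above:

@Override
public void onStart() {
    // Sketch only: run the receive loop on its own thread so that onStart()
    // returns immediately, as the Receiver contract expects
    Thread worker = new Thread(this::receive, "my-custom-receiver");
    worker.setDaemon(true);
    worker.start();
}

Among other things, this gives onStop() a running thread it could signal for a clean shutdown.)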
The Spark application / driver:
@SpringBootApplication
public class MySpringBootSparkApp {

    private static final Log logger = LogFactory.getLog(MySpringBootSparkApp.class);

    public static void main(String[] args) {
        logger.debug("Initiated MySpringBootSparkApp...");
        SpringApplication.run(MySpringBootSparkApp.class, args);
        SparkConf sparkConf = new SparkConf().setAppName("Spark Processing Boot App");
        JavaStreamingContext jsc = new JavaStreamingContext(sparkConf, new Duration(1000));
        JavaReceiverInputDStream<MyData> myDataDStream = jsc.receiverStream(new MyCustomReceiver());
        myDataDStream.foreachRDD(myDataJavaRDD -> {
            logger.debug("myDataJavaRDD = " + myDataJavaRDD);
            myDataJavaRDD.foreach(myData -> {
                System.out.println("myData = " + myData);
            });
        });
    }
}
When I submit an uber jar containing the application above and all its dependencies to a cluster with at least 2 worker nodes, I can see one worker pick up the driver program and kick off the custom receiver processing. The logs do not show whether anything happens after that, for example whether the custom receiver is invoked, whether data is fetched from Cassandra, or whether data makes it back to the driver.
Here is the log4j.properties from the Spark conf directory:
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=DEBUG
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
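As an aside on this logging setup: with log4j.rootCategory=DEBUG, Spark's own internals log so heavily that application lines are easy to miss, and anything the receiver logs ends up in the stderr of the executor it runs on (visible through the Spark UI or the worker's work/ directory), not in the driver console. One way to make the application's own lines stand out, with com.example.myapp as a placeholder for the real root package:

# Placeholder package name; replace com.example.myapp with the app's root package
log4j.logger.org.apache.spark=WARN
log4j.logger.com.example.myapp=DEBUG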
I cannot figure out how to tell what is going on in the code above, or why the MyData records I expect the receiver to return are never printed in the Spark driver program; I cannot even tell whether they are returned at all. Any guidance on how to proceed would be greatly appreciated.
Thanks
Answer 0 (score: 0)
I think I got this one... I was not calling start() on the JavaStreamingContext:
jsc.start();
jsc.awaitTermination();
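For completeness, a sketch of the corrected main() with those two calls in place (same classes as in the question; receiverStream/foreachRDD only declare the streaming computation, and nothing executes until start() is called):

public static void main(String[] args) throws InterruptedException {
    SpringApplication.run(MySpringBootSparkApp.class, args);
    SparkConf sparkConf = new SparkConf().setAppName("Spark Processing Boot App");
    JavaStreamingContext jsc = new JavaStreamingContext(sparkConf, new Duration(1000));
    JavaReceiverInputDStream<MyData> myDataDStream = jsc.receiverStream(new MyCustomReceiver());
    myDataDStream.foreachRDD(myDataJavaRDD ->
            myDataJavaRDD.foreach(myData -> System.out.println("myData = " + myData)));
    jsc.start();            // actually launches the receiver and the batch scheduler
    jsc.awaitTermination(); // keep the driver alive; otherwise main() returns right away
}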
Once I did that, the whole Java app started giving me what I was looking for. Cheers!