JavaStreamingContext.union does not work as expected

Date: 2019-03-15 10:18:27

Tags: apache-spark apache-kafka spark-streaming

I am trying to merge two JavaPairDStreams with the JavaStreamingContext.union method, but it has no effect.

Here is the sample code:

List<JavaPairDStream<String, String>> kafkaStreams = new ArrayList<JavaPairDStream<String, String>>(topicArrLen);

for (int i = 0; i < topicArrLen; i++)
{
    String num = topicArr[i];
    String zk = pubTool.get("kafka.zk.address" + num);
    String topic = pubTool.get("kafka.original.topic" + num);
    String groupId = pubTool.get("kafka.original.groupid" + num);
    Integer partitions = Integer.valueOf(pubTool.get("kafka.original.partitions" + num));
    Preconditions.checkArgument(StringUtils.isNotBlank(zk) 
        && StringUtils.isNotBlank(groupId)
        && StringUtils.isNotBlank(topic) 
        && partitions > 0,
        "kafka params zk|groupid|topic|partitions is illegal");
    Map<String, Integer> topics = Maps.newHashMap();
    topics.put(topic, partitions);
    kafkaStreams.add(KafkaUtils.createStream(jssc, zk, groupId, topics));
}

JavaPairDStream<String, String> kafkaStream = null;
final int topicType;
if(1 == kafkaStreams.size())
{
    kafkaStream = kafkaStreams.get(0);
    topicType = Integer.valueOf(topicArr[0]);
}
else
{
    kafkaStream = jssc.union(kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size()));
    topicType = 0;
}

JavaDStream<DataEntity> entities = kafkaStream
    .filter((Function<Tuple2<String, String>, Boolean>) arg -> {
        return true;
    })
    .map((Function<Tuple2<String, String>, DataEntity>) arg -> {
        DataEntity dataEntity = new DataEntity();
        ...
        return dataEntity;
    });

CleanLogService.parse(entities);
jssc.start();
jssc.awaitTermination();
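
For comparison, here is a minimal, self-contained sketch that merges the same kind of receiver streams pairwise with JavaPairDStream.union instead of JavaStreamingContext.union. The topic names, ZooKeeper address, and group id below are placeholders, not values from the question; both approaches should produce a single merged stream.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class UnionSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("UnionSketch").setMaster("local[4]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Placeholder Kafka/ZooKeeper settings so the sketch compiles on its own.
        String zk = "localhost:2181";
        String groupId = "union-sketch-group";
        String[] topics = {"topic1", "topic2"};

        // One receiver-based input stream per topic, as in the question.
        List<JavaPairDStream<String, String>> kafkaStreams = new ArrayList<>();
        for (String topic : topics) {
            Map<String, Integer> topicMap = new HashMap<>();
            topicMap.put(topic, 1);
            kafkaStreams.add(KafkaUtils.createStream(jssc, zk, groupId, topicMap));
        }

        // Merge pairwise at the DStream level instead of via the context.
        JavaPairDStream<String, String> merged = kafkaStreams.get(0);
        for (int i = 1; i < kafkaStreams.size(); i++) {
            merged = merged.union(kafkaStreams.get(i));
        }

        // Sanity check: counts should include records from every topic.
        merged.count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}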

0 Answers:

There are no answers yet.