How to use Flink's KafkaSource in Scala?

Time: 2015-07-15 00:07:04

Tags: scala apache-kafka apache-flink

I am trying to run a simple test program with Flink's KafkaSource. I am using the following:

  • Flink 0.9
  • Scala 2.10.4
  • Kafka 0.8.2.1

I tested KafkaSource following the documentation (adding the dependencies and bundling the Kafka connector flink-connector-kafka in the plugin), as described here and here.

Here is my simple test program:

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka

object TestKafka {
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream = env
     .addSource(new KafkaSource[String]("localhost:2181", "test", new SimpleStringSchema))
     .print
  }
}

However, compilation always complains that KafkaSource cannot be found:

[ERROR] TestKafka.scala:8: error: not found: type KafkaSource
[ERROR]     .addSource(new KafkaSource[String]("localhost:2181", "test", new SimpleStringSchema))

What am I missing here?

3 Answers:

Answer 0 (score: 3)

I am an sbt user, so I used the following build.sbt:

organization := "pl.japila.kafka"
scalaVersion := "2.11.7"

libraryDependencies += "org.apache.flink" % "flink-connector-kafka" % "0.9.0" exclude("org.apache.kafka", "kafka_${scala.binary.version}")
libraryDependencies += "org.apache.kafka" %% "kafka" % "0.8.2.1"

which allowed me to run the program:

import org.apache.flink.streaming.api.environment._
import org.apache.flink.streaming.connectors.kafka
import org.apache.flink.streaming.connectors.kafka.api._
import org.apache.flink.streaming.util.serialization._

object TestKafka {
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream = env
     .addSource(new KafkaSource[String]("localhost:2181", "test", new SimpleStringSchema))
     .print
  }
}

Output:

[kafka-flink]> run
[info] Running TestKafka
log4j:WARN No appenders could be found for logger (org.apache.flink.streaming.api.graph.StreamGraph).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[success] Total time: 0 s, completed Jul 15, 2015 9:29:31 AM

Answer 1 (score: 1)

The problem seems to be that sbt and the Maven build profiles do not work well together.

The Flink POMs refer to the Scala version (2.10, 2.11, ...) as a variable, parts of which are defined in build profiles. The profiles are not evaluated properly by sbt, so the packaging does not work correctly.

There is an issue and a pending pull request to fix this: https://issues.apache.org/jira/browse/FLINK-2408
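Until that fix lands, a common workaround (a sketch only, building on the sbt setup from the accepted answer above) is to exclude the transitively pulled-in Kafka artifact, whose name contains the Maven property `${scala.binary.version}` that sbt never resolves, and to declare the Kafka dependency yourself with an explicit Scala suffix:

```scala
// Exclude the broken transitive artifact whose name still contains the
// unresolved Maven property "${scala.binary.version}".
libraryDependencies += "org.apache.flink" % "flink-connector-kafka" % "0.9.0" exclude("org.apache.kafka", "kafka_${scala.binary.version}")

// Re-add Kafka with the Scala suffix resolved explicitly by sbt's %% operator.
libraryDependencies += "org.apache.kafka" %% "kafka" % "0.8.2.1"
```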

Answer 2 (score: 0)

import java.util.Properties

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object FlinkKafkaStreaming {
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val properties = new Properties()
    properties.setProperty("bootstrap.servers", "localhost:9092")
    // only required for Kafka 0.8
    properties.setProperty("zookeeper.connect", "localhost:2181")
    properties.setProperty("group.id", "flink-kafka")
    val stream = env.addSource(
      new FlinkKafkaConsumer08[String]("your_topic_name", new SimpleStringSchema(), properties))
    stream.setParallelism(1).writeAsText("your_local_dir_path")
    env.execute("XDFlinkKafkaStreaming")
  }
}

To test it, you can do the following:

  1. First run the Flink demo;
  2. Run the Kafka_Producer;
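For step 2, a minimal producer sketch could look like the following. This is an assumption-laden illustration, not part of the original answer: it uses Kafka 0.8.2's new Java producer API, a broker on localhost:9092, and the placeholder topic name your_topic_name from the snippet above; the object name TestProducer and the messages helper are hypothetical.

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object TestProducer {
  // Pure helper that builds the payloads to send; kept separate from the I/O.
  def messages(n: Int): Seq[String] = (1 to n).map(i => s"message-$i")

  def main(args: Array[String]) {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Send a handful of test messages to the topic the Flink job reads from.
    messages(10).foreach { m =>
      producer.send(new ProducerRecord[String, String]("your_topic_name", m))
    }
    producer.close()
  }
}
```

With the Flink job from the answer above running, the consumed messages should then appear in the file it writes to.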