Question

I want to read a CSV file using Flink, Scala, addSource, and the readCsvFile function. I have not found any simple examples. I only found https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/scala/com/dataartisans/flinktraining/exercises/datastream_scala/cep/LongRides.scala, which is too complex for my purposes.

Given the definition StreamExecutionEnvironment.addSource(sourceFunction), should I simply pass readCsvFile as the sourceFunction?

After reading the file, I want to use CEP (Complex Event Processing).

Answer 1

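readCsvFile() is only available as part of Flink's DataSet (batch) API and cannot be used with the DataStream (streaming) API. Here is a pretty good example of readCsvFile(), though it is probably not relevant to what you are trying to do.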

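readTextFile() and readFile() are methods on StreamExecutionEnvironment and do not implement the SourceFunction interface. They are not meant to be passed to addSource(); they are used instead of it. Here is an example of using readTextFile() to load a CSV with the DataStream API.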
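As a rough sketch of the readTextFile() approach, the following reads a file line by line and parses each line into a record. The Event case class, its three columns, the object name, and the file path are all hypothetical placeholders, and the comma split is deliberately naive (it does not handle quoted fields), so treat this as a starting point rather than a definitive implementation.

```scala
import org.apache.flink.streaming.api.scala._
import scala.util.Try

object ReadCsvAsStream {
  // Hypothetical record type; adjust the fields to match your CSV columns.
  case class Event(time: Int, id: String, sources: String)

  // Parses one CSV line. Returns None for short lines and for lines whose
  // first column is not an integer (which also skips a textual header row).
  def parseLine(line: String): Option[Event] = {
    val cols = line.split(",", -1).map(_.trim)
    if (cols.length < 3) None
    else Try(Event(cols(0).toInt, cols(1), cols(2))).toOption
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // readTextFile is called directly on the environment, not via addSource.
    val events: DataStream[Event] = env
      .readTextFile("/path/to/input.csv") // placeholder path
      .flatMap((line: String) => parseLine(line).toList)

    events.print()
    env.execute("read-csv-as-stream")
  }
}
```

The resulting DataStream[Event] is bounded (the file is read once) and can be handed to later operators such as CEP patterns.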

Another option is to use the Table API and a CsvTableSource. Here is an example and some discussion of what it does and doesn't do. If you go this route, you will need to use StreamTableEnvironment.toAppendStream() to convert your table stream to a DataStream before using CEP.

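Keep in mind that all of these approaches read the file only once and create a bounded stream from its contents. If you want a source that reads an unbounded CSV stream and waits for new rows to be appended, you will need a different approach. You could use a custom source, socketTextStream, or something like Kafka.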
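For the unbounded case, a minimal sketch using socketTextStream might look like the following. The host, port, object name, and the fields helper are assumptions made for illustration; the stream stays open and emits a record for every new line written to the socket.

```scala
import org.apache.flink.streaming.api.scala._

object CsvOverSocket {
  // Splits one CSV line into trimmed fields (naive: no quoted-comma handling).
  def fields(line: String): Array[String] = line.split(",", -1).map(_.trim)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Placeholder host and port; each newline-terminated line becomes one record.
    val lines: DataStream[String] = env.socketTextStream("localhost", 9999)

    lines
      .map((line: String) => fields(line).mkString(" | "))
      .print()

    env.execute("csv-over-socket")
  }
}
```

One way to try this locally is to start a listener such as `nc -lk 9999` before launching the job and then type or pipe CSV lines into it.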

Answer 2

import org.apache.flink.types.Row
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.sources.CsvTableSource
import org.apache.flink.api.common.typeinfo.Types

object CepTest2 {

  def main(args: Array[String]) {

    println("Start ...")

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    //val tableEnv = StreamTableEnvironment.getTableEnvironment(env)
    val tableEnv = TableEnvironment.getTableEnvironment(env)

    val csvtable = CsvTableSource
      .builder
      .path("/home/esa/Log_EX1_gen_track_5.csv")
      .ignoreFirstLine
      .fieldDelimiter(",")
      .field("time", Types.INT)
      .field("id", Types.STRING)
      .field("sources", Types.STRING)
      .field("targets", Types.STRING)
      .field("attr", Types.STRING)
      .field("data", Types.STRING)
      .build

    tableEnv.registerTableSource("test", csvtable)

    val tableTest = tableEnv.scan("test").where("id='5'").select("id,sources,targets")

    val stream = tableEnv.toAppendStream[Row](tableTest)

    stream.print
    env.execute()
  }
}


Error:(56, 46) could not find implicit value for evidence parameter of type org.apache.flink.api.common.typeinfo.TypeInformation[org.apache.flink.types.Row]
    val stream = tableEnv.toAppendStream[Row](tableTest)

Error:(56, 46) not enough arguments for method toAppendStream: (implicit evidence$3: org.apache.flink.api.common.typeinfo.TypeInformation[org.apache.flink.types.Row])org.apache.flink.streaming.api.scala.DataStream[org.apache.flink.types.Row].
Unspecified value parameter evidence$3.
    val stream = tableEnv.toAppendStream[Row](tableTest)

I have been trying to resolve the errors above for a long time without success. Can you give me some hints on how to fix them?
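A possible cause, offered as an assumption rather than a confirmed fix: the code imports only DataStream and StreamExecutionEnvironment by name, so the implicit createTypeInformation defined in the Scala API package object is not in scope when toAppendStream[Row] asks for a TypeInformation[Row]. Under that assumption, and assuming a Flink version matching the APIs used in the code, replacing the selective imports with wildcard imports may resolve the error:

```scala
// Assumption: Flink Scala APIs of the same era as the code above.
// The wildcard import brings the implicit createTypeInformation into scope.
import org.apache.flink.streaming.api.scala._
// Scala Table API implicits and conversions.
import org.apache.flink.table.api.scala._
```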

Answer 3

If you have a CSV file with 3 fields (String, Long, Integer), do the following:

val input=benv.readCsvFile[(String,Long,Integer)]("hdfs:///path/to/your_csv_file")

PS: I am using the Flink shell, which is why I have benv.
