I'm trying to stream files from Azure Blob Storage into Cassandra, but Spark doesn't recognize the application/octet-stream type. I've tried the following code:
val dstream: DStream[String] = ssc
  .fileStream[LongWritable, Text, TextInputFormat](filePath, (path: Path) => path.getName().endsWith(pathEndsWith), true)
  .map(_._2.toString)
Without a filter:
val dstream: DStream[String] = ssc
  .fileStream[LongWritable, Text, TextInputFormat](filePath)
  .map(_._2.toString)
As a sequence file:
val dstream: DStream[String] = ssc
  .fileStream[Text, Text, SequenceFileAsTextInputFormat](filePath)
  .map(_._2.toString)
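
For reference, the snippets above leave out the imports and the StreamingContext setup. The following is only a minimal sketch of what a self-contained version of the first attempt might look like. The object name, the local master, the 10-second batch interval, the `wasb://` URL with its placeholders, and the `.txt` suffix are all assumptions for illustration and are not taken from the actual job; reading `wasb://` paths also assumes the hadoop-azure connector is on the classpath and configured.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical wrapper object; the name is not from the original post.
object BlobFileStreamSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local master and 10-second batches, for illustration only.
    val conf = new SparkConf()
      .setAppName("blob-file-stream-sketch")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical placeholders; the real values are not shown in the question.
    val filePath = "wasb://<container>@<account>.blob.core.windows.net/<dir>"
    val pathEndsWith = ".txt"

    // Same call as the first attempt: new-API TextInputFormat, a file-name
    // filter, and newFilesOnly = true (third argument).
    val dstream: DStream[String] = ssc
      .fileStream[LongWritable, Text, TextInputFormat](
        filePath,
        (path: Path) => path.getName.endsWith(pathEndsWith),
        true)
      .map(_._2.toString)

    // Print the first few elements of each batch to check that files are read.
    dstream.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that with the third argument set to `true`, `fileStream` only picks up files that appear in the directory after the stream has started; files already present are skipped.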