我想创建一个读取http流的SourceFunction。 我使用了ScalaJ,它可以实现我想要的功能(它通过\ n-s分割传入的文本)。 显然代码在Flink之外工作,但是每次我作为Flink作业启动它时都会得到一个NullPointerExcetion(有时会在发送1-2个元素后的1-2秒之后立即执行)。看起来像Http对象有一些问题。
import org.apache.flink.streaming.api.functions.source.SourceFunction
import scala.io.Source.fromInputStream
import scalaj.http._
class HttpSource(url: String) extends SourceFunction[String] {
@volatile var isRunning = true
override def cancel(): Unit = isRunning = false
override def run(ctx: SourceFunction.SourceContext[String]): Unit =
httpStream(ctx.collect)
private def httpStream(f: String => Unit) = {
val request = Http(url)
request
.execute { inputStream =>
fromInputStream(inputStream)
.getLines()
.takeWhile(_ => isRunning)
.foreach(f)
}
}
}
这是我通常得到的例外情况: (有时它有点不同,例如我试图使请求值瞬态,然后当它试图引用请求时它已经为空)
Caused by: java.lang.NullPointerException
at java.io.Reader.<init>(Reader.java:78)
at java.io.InputStreamReader.<init>(InputStreamReader.java:129)
at scala.io.BufferedSource.reader(BufferedSource.scala:24)
at scala.io.BufferedSource.bufferedReader(BufferedSource.scala:25)
at scala.io.BufferedSource.scala$io$BufferedSource$$charReader$lzycompute(BufferedSource.scala:35)
at scala.io.BufferedSource.scala$io$BufferedSource$$charReader(BufferedSource.scala:33)
at scala.io.BufferedSource.scala$io$BufferedSource$$decachedReader(BufferedSource.scala:62)
at scala.io.BufferedSource$BufferedLineIterator.<init>(BufferedSource.scala:67)
at scala.io.BufferedSource.getLines(BufferedSource.scala:86)
at flinkextension.HttpSource$$anonfun$httpStream$1.apply(HttpSource.scala:21)
at flinkextension.HttpSource$$anonfun$httpStream$1.apply(HttpSource.scala:19)
at scalaj.http.HttpRequest$$anonfun$execute$1.apply(Http.scala:323)
at scalaj.http.HttpRequest$$anonfun$execute$1.apply(Http.scala:323)
at scalaj.http.HttpRequest$$anonfun$toResponse$3.apply(Http.scala:388)
at scalaj.http.HttpRequest$$anonfun$toResponse$3.apply(Http.scala:380)
at scala.Option.getOrElse(Option.scala:121)
at scalaj.http.HttpRequest.toResponse(Http.scala:380)
at scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:360)
at scalaj.http.HttpRequest.exec(Http.scala:335)
at scalaj.http.HttpRequest.execute(Http.scala:323)
at flinkextension.HttpSource.httpStream(HttpSource.scala:19)
at flinkextension.HttpSource.run(HttpSource.scala:14)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:748)
当我不使用http请求时,其他所有内容似乎都正常工作,但是其他类似于使用相同InputStream类型的文件读取,只是带有字符串的简单while循环,甚至当我使用单个http请求时没有流媒体。
我觉得我缺少一些理论背景,也许flink在后台做了一些破坏Http对象或InputStream的东西,但我没有在文档中找到任何东西。
更新#1:
如果我对lambda进行空检查,作业通常会立即退出,有时会处理一些元素,有时会在挂起一分钟后超时。这是httpStream函数的这个版本:
private def httpStream(f: String => Unit) = {
val request = Http(url)
request
.execute { inputStream =>
if (inputStream == null) println("null inputstream")
else {
println("not null inputstream")
fromInputStream(inputStream)
.getLines()
.takeWhile(_ => isRunning)
.foreach(f)
}
}
}
更新#2:
代码实际上在分布式模式下工作,并使用StreamExecutionEnvironment.createLocalEnvironment()
如果我使用start-local.sh并将jar提交给它,我只会遇到这个问题。