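I'm using NiFi's ExecuteScript processor to call a Groovy script that extracts text from PDFs. When extraction fails, the script should raise an exception so the FlowFile is routed to REL_FAILURE. Some PDFs go through without problems, while others produce this error: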
ExecuteScript[id=9a39e0cb-ebcc-31e4-a169-575e367046e9] Failed to process session due to javax.script.ScriptException: javax.script.ScriptException: java.lang.IllegalStateException: StandardFlowFileRecord[uuid=2d6540f7-b7a2-48c7-8978-6b90bbfb0ff5,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1538596326047-12, container=default, section=12], offset=2134, length=930225],offset=0,name=1 i-9 INS rev 87 05-07-87.pdf,size=930225] already in use for an active callback or an OutputStream created by ProcessSession.write(FlowFile) has not been closed: org.apache.nifi.processor.exception.ProcessException: javax.script.ScriptException: javax.script.ScriptException: java.lang.IllegalStateException: StandardFlowFileRecord[uuid=2d6540f7-b7a2-48c7-8978-6b90bbfb0ff5,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1538596326047-12, container=default, section=12], offset=2134, length=930225],offset=0,name=1 i-9 INS rev 87 05-07-87.pdf,size=930225] already in use for an active callback or an OutputStream created by ProcessSession.write(FlowFile) has not been closed
My (simplified) code is as follows:

```groovy
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    try {
        // Load the PDF from inputStream and parse its text into a JSON string
        // If nothing can be extracted, throw an exception so the flowfile
        // can be routed to REL_FAILURE and processed further down the NiFi pipeline
        if (outputLength < 15) {
            throw new Exception('No output, send to REL_FAILURE')
        }
        // Write the string to the flowFile to be transferred
        outputStream.write(json.getBytes(StandardCharsets.UTF_8))
    } catch (Exception e) {
        System.out.println(e.getMessage())
        session.transfer(flowFile, REL_FAILURE)
    }
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)
```
It closely follows the cookbook posted in the Hortonworks community forum, whose author even mentions that closing the streams is handled automatically.
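I think the error occurs because a PDF can't be processed. That raises an exception, which should be caught by the try{}catch{} so the FlowFile can then be transferred to REL_FAILURE. Instead, the catch{} never seems to be reached, so the outputStream is never closed either. When I run the same Groovy code outside NiFi, it works as expected and the exception is caught.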
If you want to try running it on your own server
Answer 0 (score: 1)
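The try/catch should be outside the session.write() call, not inside the callback. Inside the callback, throw an IOException instead of an Exception; it should propagate out through session.write() and land in the outer catch clause. You can then transfer the FlowFile to failure there. While the FlowFile is being written, you are not allowed to transfer it, and that is exactly what the error message reports: calling session.transfer() from inside the callback, while the FlowFile is still "in use for an active callback", raises the IllegalStateException.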
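The control flow the answer describes can be checked outside NiFi with a minimal sketch. `FakeSession` below is a hypothetical stand-in, not NiFi's ProcessSession: it imitates only the rule named in the error message (a FlowFile cannot be transferred while a write callback on it is still active), a "FlowFile" is just its text content, and the "extraction" simply reads that text back. `process` follows the answer's pattern; `processBuggy` reproduces the question's pattern of routing from inside the callback.

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for NiFi's ProcessSession. It imitates only the rule
// behind the error in the question: a FlowFile cannot be transferred while a
// write callback on it is still active. A "FlowFile" here is just its content.
class FakeSession {
    Map<String, String> routed = [:]
    private Set<String> inUse = [] as Set

    String write(String flowFile, Closure callback) {
        inUse << flowFile
        def out = new ByteArrayOutputStream()
        try {
            callback.call(new ByteArrayInputStream(flowFile.getBytes(StandardCharsets.UTF_8)), out)
        } finally {
            inUse.remove(flowFile) // the callback is over, whether it succeeded or threw
        }
        return out.toString('UTF-8') // the "new" FlowFile content
    }

    void transfer(String flowFile, String relationship) {
        if (flowFile in inUse) {
            throw new IllegalStateException("${flowFile} already in use for an active callback")
        }
        routed[flowFile] = relationship
    }
}

// The answer's pattern: try/catch around write(), throw from inside the callback.
String process(FakeSession session, String flowFile) {
    try {
        flowFile = session.write(flowFile, { InputStream inputStream, OutputStream outputStream ->
            String text = inputStream.getText('UTF-8') // placeholder for the PDF text extraction
            if (text.length() < 15) {
                // Only signal the failure here; routing happens after write() returns.
                throw new IOException('No output, send to REL_FAILURE')
            }
            outputStream.write(text.getBytes(StandardCharsets.UTF_8))
        })
        session.transfer(flowFile, 'success')
    } catch (Exception e) {
        // write() has finished, so the original FlowFile can now be routed to failure.
        session.transfer(flowFile, 'failure')
    }
    return flowFile
}

// The question's pattern: routing from inside the callback trips the in-use check.
void processBuggy(FakeSession session, String flowFile) {
    session.write(flowFile, { InputStream inputStream, OutputStream outputStream ->
        try {
            throw new IOException('No output, send to REL_FAILURE')
        } catch (Exception e) {
            session.transfer(flowFile, 'failure') // throws IllegalStateException: still in use
        }
    })
}

def demo = new FakeSession()
process(demo, 'text that extracts cleanly')
process(demo, 'blank scan')
assert demo.routed == ['text that extracts cleanly': 'success', 'blank scan': 'failure']
```

In ExecuteScript itself, the same structure would use the bound `session`, `REL_SUCCESS` and `REL_FAILURE` instead of the stand-ins, with the closure coerced `as StreamCallback` as in the question. The real session may rethrow the callback's exception wrapped in another exception type, which is why the outer catch here takes Exception rather than only IOException.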