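I want to build a custom NiFi processor, and there are a few topics I am interested in:
1. I want to pick up an XML file in the processor, then parse it, extract the text values, and set them as attributes of a newly created flow file. I also want to update the file (that is, set a new value for one of these tags) and roll it back to the folder. How can I roll back this flow file?
If I want this file to be used by several processors, should I use a file lock, or set Keep Source File to false when getting the flow file? Which one is best practice? Right now I want onTrigger code like this: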
    final List<File> files = new ArrayList<>(batchSize);
    queueLock.lock();
    try {
        fileQueue.drainTo(files, batchSize);
        if (files.isEmpty()) {
            return;
        } else {
            inProcess.addAll(files);
        }
    } finally {
        queueLock.unlock();
    }

    // Parse the XML. Only the first file of the batch is parsed.
    final Document doc;
    try {
        final DocumentBuilder dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        doc = dBuilder.parse(files.get(0));
    } catch (ParserConfigurationException | SAXException | IOException e) {
        // Stop here instead of continuing with a null document.
        getLogger().error("Failed to parse " + files.get(0), e);
        return;
    }

    // Values read from <localAttributes>; they become attributes of the new flow file.
    String start = null, startDate = null, endDate = null, patch = null, runAs = null;
    final NodeList nList = doc.getElementsByTagName("localAttributes");
    for (int temp = 0; temp < nList.getLength(); temp++) {
        final Node nNode = nList.item(temp);
        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            final Element eElement = (Element) nNode;
            start = eElement.getElementsByTagName("start").item(0).getTextContent();
            startDate = eElement.getElementsByTagName("startDate").item(0).getTextContent();
            endDate = eElement.getElementsByTagName("endDate").item(0).getTextContent();
            patch = eElement.getElementsByTagName("patch").item(0).getTextContent();
            runAs = eElement.getElementsByTagName("runAs").item(0).getTextContent();
        }
    }

    final ListIterator<File> itr = files.listIterator();
    FlowFile flowFile = null;
    try {
        final Path directoryPath = directory.toPath();
        while (itr.hasNext()) {
            final File file = itr.next();
            final Path filePath = file.toPath();
            final Path relativePath = directoryPath.relativize(filePath.getParent());
            // Append the separator only after the empty check; otherwise the check can never match.
            String relativePathString = relativePath.toString();
            relativePathString = relativePathString.isEmpty() ? "./" : relativePathString + "/";
            final Path absPath = filePath.toAbsolutePath();
            final String absPathString = absPath.getParent().toString() + "/";

            // Flow file holding the imported file content plus the core attributes.
            flowFile = session.create();
            final long importStart = System.nanoTime();
            flowFile = session.importFrom(filePath, keepingSourceFile, flowFile);
            final long importNanos = System.nanoTime() - importStart;
            final long importMillis = TimeUnit.MILLISECONDS.convert(importNanos, TimeUnit.NANOSECONDS);
            flowFile = session.putAttribute(flowFile, CoreAttributes.FILENAME.key(), file.getName());
            flowFile = session.putAttribute(flowFile, CoreAttributes.PATH.key(), relativePathString);
            flowFile = session.putAttribute(flowFile, CoreAttributes.ABSOLUTE_PATH.key(), absPathString);
            Map<String, String> attributes = getAttributesFromFile(filePath);
            if (attributes.size() > 0) {
                flowFile = session.putAllAttributes(flowFile, attributes);
            }

            // New flow file that carries the values extracted from the XML as attributes.
            FlowFile flowFile1 = session.create();
            flowFile1 = session.putAttribute(flowFile1, CoreAttributes.FILENAME.key(), file.getName());
            flowFile1 = session.putAttribute(flowFile1, CoreAttributes.PATH.key(), relativePathString);
            flowFile1 = session.putAttribute(flowFile1, CoreAttributes.ABSOLUTE_PATH.key(), absPathString);
            flowFile1 = session.putAttribute(flowFile1, "start", start);
            flowFile1 = session.putAttribute(flowFile1, "startDate", startDate);
            flowFile1 = session.putAttribute(flowFile1, "endDate", endDate);
            flowFile1 = session.putAttribute(flowFile1, "runAs", runAs);
            flowFile1 = session.putAttribute(flowFile1, "patch", patch);
            session.getProvenanceReporter().receive(flowFile, file.toURI().toString(), importMillis);
            session.transfer(flowFile1, REL_SUCCESS);

            // Flow file intended to be sent back ("rolled back") with runAs set to false.
            FlowFile flowFile3 = session.create();
            flowFile3 = session.importFrom(filePath, keepingSourceFile, flowFile3);
            final NodeList run = doc.getElementsByTagName("runAs");
            // setNodeValue() has no effect on an element node; setTextContent() changes its text.
            // This only updates the in-memory DOM; the content of flowFile3 is still the original file.
            run.item(0).setTextContent("false");
            session.transfer(flowFile3, REL_ROLLBACK);
            session.remove(flowFile);
Answer 0 (score: 3)
Very similar questions have been posted in the last few days, and I responded to them in "Nifi:Writing new Processors" and "Nifi: how to write Custom processor".
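I fully support learning custom processor development in Apache NiFi, but this use case does not make sense to me. Retrieving a file from a file system (HDFS or otherwise) is an atomic unit of work and should not be combined with XML parsing. Combine the GetFile processor (or a ListFile / FetchFile pair) with an EvaluateXPath processor to perform this logic. The source file will remain in its original file system location, and you will gain more control over and visibility into the flow, not to mention better performance and maintainability. If you need many flows to use this, you can export this segment as a template, or feed input from other processors to determine which files to fetch and send the output to a RouteOnAttribute processor, which directs the results to the various consumers by filename or other such attributes.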
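To make the EvaluateXPath suggestion more concrete, here is a minimal plain-Java sketch of the lookups such a processor would perform when its destination is set to flow file attributes. It is only an illustration, not NiFi code: the class name, the sample document, and the XPath expressions are assumptions derived from the tag names in the question.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathAttributeSketch {

    // Hypothetical sample document, shaped after the tags read in the question.
    public static final String SAMPLE =
            "<config><localAttributes>"
            + "<start>true</start><startDate>2017-01-01</startDate>"
            + "<endDate>2017-12-31</endDate><patch>/tmp/in</patch><runAs>true</runAs>"
            + "</localAttributes></config>";

    // Tag names taken from the question; each one becomes an attribute name.
    private static final String[] NAMES = {"start", "startDate", "endDate", "patch", "runAs"};

    /** Evaluates one XPath expression per attribute and returns name/value pairs. */
    public static Map<String, String> extract(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> attributes = new LinkedHashMap<>();
        try {
            for (String name : NAMES) {
                // A fresh InputSource per evaluation: a StringReader can be consumed only once.
                InputSource source = new InputSource(new StringReader(xml));
                attributes.put(name, xpath.evaluate("//localAttributes/" + name, source));
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate XPath expression", e);
        }
        return attributes;
    }

    public static void main(String[] args) {
        extract(SAMPLE).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

In EvaluateXPath itself, each of these pairs would be a user-defined property whose name is the attribute name and whose value is the XPath expression, so no custom code is needed for the extraction step.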
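If you are interested in custom processor development, the Developer Guide and Contributor Guide both provide excellent reference information, and Bryan Bende's blog offers a good walkthrough.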
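If you do end up writing a custom processor for the update step, the XML change itself needs only the standard JAXP classes. The following is a rough sketch under stated assumptions: the class and method names are hypothetical, and the input is assumed to contain at least one runAs element as in the question. It changes the element text and serializes the document, which is the part the code in the question leaves out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class RunAsUpdateSketch {

    /** Returns the given XML with the text of the first runAs element replaced. */
    public static String setRunAs(String xml, String newValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Node runAs = doc.getElementsByTagName("runAs").item(0);
            if (runAs == null) {
                throw new IllegalArgumentException("No runAs element found");
            }
            // setTextContent() replaces the element's text; setNodeValue() would do nothing here.
            runAs.setTextContent(newValue);

            // Changing the DOM is not enough: it has to be serialized to produce new content.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalStateException("Could not update the XML document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<localAttributes><runAs>true</runAs></localAttributes>";
        System.out.println(setRunAs(xml, "false"));
    }
}
```

Inside a processor, the same transformation would typically run in a callback passed to session.write(...), so that the updated XML becomes the flow file content, and a downstream PutFile processor could then write it back to the folder.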