NiFi: how to write custom NiFi processors

Asked: 2017-10-03 18:25:33

Tags: apache-nifi

I want to build a custom NiFi processor, and there are a few topics I'm interested in:

1. I want the processor to pick up an XML file, parse it to extract text values, and put those values as attributes on a newly created flow file. But I also want to update the file itself (I mean, set a new value for one of its tag values) and roll it back to the folder. How can I roll this flow file back?

2. If I want this file to be consumed by multiple processors, should I use a file lock, or keep the keep-source-file flag set to false when importing the flow file? Which one is best practice? Right now my onTrigger code looks like this:

        final List<File> files = new ArrayList<>(batchSize);
        queueLock.lock();
        try {
            fileQueue.drainTo(files, batchSize);
            if (files.isEmpty()) {
                return;
            } else {
                inProcess.addAll(files);
            }
        } finally {
            queueLock.unlock();
        }
    
        // build the XML parser (a ParserConfigurationException here should probably fail the processor)
        DocumentBuilder dBuilder = null;
        try {
            dBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        }
        Document doc = null;
        try {
            File f = files.get(0);
            doc = dBuilder.parse(f);
        } catch (SAXException | IOException e) {
            e.printStackTrace();
        }
        // pull the values of interest out of the <localAttributes> element
        String start = null, startDate = null, endDate = null, patch = null, runAs = null;
        NodeList nList = doc.getElementsByTagName("localAttributes");
        for (int temp = 0; temp < nList.getLength(); temp++) {
            Node nNode = nList.item(temp);
            if (nNode.getNodeType() == Node.ELEMENT_NODE) {
                Element eElement = (Element) nNode;
                start = eElement.getElementsByTagName("start").item(0).getTextContent();
                startDate = eElement.getElementsByTagName("startDate").item(0).getTextContent();
                endDate = eElement.getElementsByTagName("endDate").item(0).getTextContent();
                patch = eElement.getElementsByTagName("patch").item(0).getTextContent();
                runAs = eElement.getElementsByTagName("runAs").item(0).getTextContent();
            }
        }
    
        final ListIterator<File> itr = files.listIterator();
    
        FlowFile flowFile = null;
        try {
            final Path directoryPath = directory.toPath();
            while (itr.hasNext()) {
                final File file = itr.next();
                final Path filePath = file.toPath();
                final Path relativePath = directoryPath.relativize(filePath.getParent());
                String relativePathString = relativePath.toString() + "/";
                if (relativePathString.isEmpty()) {
                    relativePathString = "./";
                }
                final Path absPath = filePath.toAbsolutePath();
                final String absPathString = absPath.getParent().toString() + "/";
    
                flowFile = session.create();
                final long importStart = System.nanoTime();
                flowFile = session.importFrom(filePath, keepingSourceFile, flowFile);
                final long importNanos = System.nanoTime() - importStart;
                final long importMillis = TimeUnit.MILLISECONDS.convert(importNanos, TimeUnit.NANOSECONDS);
    
                flowFile = session.putAttribute(flowFile, CoreAttributes.FILENAME.key(), file.getName());
                flowFile = session.putAttribute(flowFile, CoreAttributes.PATH.key(), relativePathString);
                flowFile = session.putAttribute(flowFile, CoreAttributes.ABSOLUTE_PATH.key(), absPathString);
    
                Map<String, String> attributes = getAttributesFromFile(filePath);
                if (attributes.size() > 0) {
                    flowFile = session.putAllAttributes(flowFile, attributes);
                }
    
                FlowFile flowFile1 = session.create();
                // put the extracted values on the new flow file (flowFile1, not flowFile)
                flowFile1 = session.putAttribute(flowFile1, CoreAttributes.FILENAME.key(), file.getName());
                flowFile1 = session.putAttribute(flowFile1, CoreAttributes.PATH.key(), relativePathString);
                flowFile1 = session.putAttribute(flowFile1, CoreAttributes.ABSOLUTE_PATH.key(), absPathString);
                flowFile1 = session.putAttribute(flowFile1, "start", start);
                flowFile1 = session.putAttribute(flowFile1, "startDate", startDate);
                flowFile1 = session.putAttribute(flowFile1, "endDate", endDate);
                flowFile1 = session.putAttribute(flowFile1, "runAs", runAs);
                flowFile1 = session.putAttribute(flowFile1, "patch", patch);
    
                session.getProvenanceReporter().receive(flowFile1, file.toURI().toString(), importMillis);
                session.transfer(flowFile1, REL_SUCCESS);
    
                FlowFile flowFile3 = session.create();
                // import into flowFile3 (the original code passed flowFile here by mistake)
                flowFile3 = session.importFrom(filePath, keepingSourceFile, flowFile3);
    
                // mutating the in-memory DOM does not change flowFile3's content;
                // the document would have to be re-serialized and written via session.write()
                NodeList run = doc.getElementsByTagName("runAs");
                run.item(0).setTextContent("false");
                session.transfer(flowFile3, REL_ROLLBACK);
                session.remove(flowFile);
            }
        } catch (final Exception e) {
            getLogger().error("Failed to process {}", new Object[]{files}, e);
            session.rollback();
        }

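Stripped of the NiFi session plumbing, the update-and-write-back step the question asks about is a plain DOM round trip: parse, change the element text, re-serialize. Below is a minimal, JDK-only sketch (the class and method names are made up for illustration); note that `setTextContent`, not `setNodeValue`, changes an element's text:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

public class XmlUpdateSketch {

    // Parse the XML, overwrite the text of the first <runAs> element,
    // and serialize the document back to a string.
    public static String updateRunAs(String xml, String newValue) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        NodeList run = doc.getElementsByTagName("runAs");
        // setTextContent replaces the element's text; setNodeValue on an Element does nothing
        run.item(0).setTextContent(newValue);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Inside a processor, the serialized string would then be written to the flow file's content via `session.write(...)` (or back to disk) rather than returned.
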
1 Answer:

Answer 0 (score: 3)

I answered very similar questions posted over the last few days in "Nifi: Writing new Processors" and "Nifi: how to write Custom processor".

I fully support learning how to do custom processor development in Apache NiFi, but this use case doesn't make sense to me. Retrieving a file from a file system (HDFS or otherwise) is an atomic unit of work and should not be combined with XML parsing. Combine a GetFile processor (or a ListFile/FetchFile pair) with an EvaluateXPath processor to perform this logic. The source file will remain in its original file system location, and you will gain much more control over and visibility into the flow, not to mention better performance and maintainability. If you need many flows to consume it, you can export this segment as a template, or feed it input from other processors to determine which files to fetch, and send the output to a RouteOnAttribute processor to direct the results to various consumers by filename or another such attribute.
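For reference, EvaluateXPath can do the extraction above without any custom code: with its Destination property set to flowfile-attribute, each dynamic property's name becomes an attribute name and its value an XPath expression. A sketch of that configuration, assuming the tag names from the question's XML:

```
EvaluateXPath
    Destination : flowfile-attribute
    # dynamic properties (attribute name = XPath expression)
    start       = string(//localAttributes/start)
    startDate   = string(//localAttributes/startDate)
    endDate     = string(//localAttributes/endDate)
    patch       = string(//localAttributes/patch)
    runAs       = string(//localAttributes/runAs)
```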

If you are interested in custom processor development, the Developer Guide and Contributor Guide both provide excellent reference information, and Bryan Bende's blog offers a great walkthrough.