How can I add an attribute to XML nodes faster when there are more than 10,000 records? (Java)

Time: 2018-01-03 12:05:57

Tags: java xml dom sax stax

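I need to add an attribute to an XML node, and there are more than 10k records. What is the fastest way to transform the XML document?

I tried the StAX parser, and adding the attribute takes almost 4 minutes; with the SAX parser it takes 5 minutes.

Is there another library that can do this better, or another way to do it? Please share your suggestions.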

Sample code (using the StAX parser):

// factory (an XMLInputFactory), inputfile, outputfile, node1 and node2 are declared elsewhere.
try (FileInputStream in = new FileInputStream(inputfile);
        FileOutputStream out = new FileOutputStream(outputfile)) {
    XMLStreamReader r = factory.createXMLStreamReader(in);
    /* Start Writing document */
    XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();
    XMLEventWriter xmlEventWriter = xmlOutputFactory.createXMLEventWriter(out, "UTF-8");
    /* End Writing document */
    int event = r.getEventType();
    long startTime = System.currentTimeMillis();
    System.out.println("Started reading nodes from the XML document...");
    int node1Cnt = 0, node2Cnt = 0;
    while (true) {
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        switch (event) {
            case XMLStreamConstants.START_DOCUMENT:
                StartDocument startDocument = eventFactory.createStartDocument();
                xmlEventWriter.add(startDocument);
                break;
            case XMLStreamConstants.START_ELEMENT:
                // Create the start tag (attributes already present on the element are not copied)
                if (r.getLocalName().equalsIgnoreCase(node1)) {
                    node1Cnt++;
                    node2Cnt = 0;
                    Attribute attribute = eventFactory.createAttribute("id", "5522" + node1Cnt);
                    List attributeList = Arrays.asList(attribute);
                    List nsList = Arrays.asList();
                    StartElement sElement = eventFactory.createStartElement("", "", r.getLocalName(),
                            attributeList.iterator(), nsList.iterator());
                    xmlEventWriter.add(sElement);
                } else if (r.getLocalName().equalsIgnoreCase(node2)) {
                    node2Cnt++;
                    Attribute attribute = eventFactory.createAttribute("id", "5522" + node1Cnt + node2Cnt);
                    List attributeList = Arrays.asList(attribute);
                    List nsList = Arrays.asList();
                    StartElement sElement = eventFactory.createStartElement("", "", r.getLocalName(),
                            attributeList.iterator(), nsList.iterator());
                    xmlEventWriter.add(sElement);
                } else {
                    StartElement sElement = eventFactory.createStartElement("", "", r.getLocalName());
                    xmlEventWriter.add(sElement);
                }
                // Each start tag is written exactly once, in one of the branches above
                break;
            case XMLStreamConstants.CHARACTERS:
                if (r.isWhiteSpace())
                    break; // skip whitespace-only text
                Characters characters = eventFactory.createCharacters(r.getText());
                xmlEventWriter.add(characters);
                break;
            case XMLStreamConstants.END_ELEMENT:
                EndElement endElement = eventFactory.createEndElement("", "", r.getLocalName());
                xmlEventWriter.add(endElement);
                break;
            case XMLStreamConstants.END_DOCUMENT:
                xmlEventWriter.add(eventFactory.createEndDocument());
                break;
        }
        if (!r.hasNext())
            break;

        event = r.next();
    }
    r.close();
    // Flush and close the writer; otherwise buffered output may never reach the file
    xmlEventWriter.flush();
    xmlEventWriter.close();
    System.out.println("Finished reading nodes from the XML document in "
            + TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis() - startTime) + " s");
} catch (XMLStreamException | IOException ex) {
    ex.printStackTrace();
} finally {
    System.out.println("finish!!");
}

1 answer:

Answer 0 (score: 0)

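I suspect that XMLEventFactory.newInstance() is very expensive, because it involves searching the classpath. There is absolutely no need to create a new factory inside the event loop: create one factory at the start and reuse it.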
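As a minimal sketch of that suggestion (not code from the question or the answer), the class below creates each factory once, as a static field, and reuses it for every event. It also reads with XMLEventReader, so events that don't need changing are passed straight to the writer, and it keeps any attributes and namespace declarations already on the element. The class name `EventIdAdder` and method `addIds` are hypothetical; the `"5522"` id scheme and the `node1`/`node2` counters are copied from the question's code, with exact (case-sensitive) name matching assumed.

```java
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class EventIdAdder {
    // Created once and reused: newInstance() performs a provider lookup.
    private static final XMLInputFactory INPUT = XMLInputFactory.newInstance();
    private static final XMLOutputFactory OUTPUT = XMLOutputFactory.newInstance();
    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    /** Copies the document, adding an "id" attribute to node1/node2 start tags. */
    public static void addIds(Reader in, Writer out, String node1, String node2)
            throws XMLStreamException {
        XMLEventReader reader = INPUT.createXMLEventReader(in);
        XMLEventWriter writer = OUTPUT.createXMLEventWriter(out);
        int node1Cnt = 0, node2Cnt = 0;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                String name = start.getName().getLocalPart();
                String id = null;
                if (name.equals(node1)) {
                    node1Cnt++;
                    node2Cnt = 0; // node2 numbering restarts after each node1
                    id = "5522" + node1Cnt;
                } else if (name.equals(node2)) {
                    node2Cnt++;
                    id = "5522" + node1Cnt + node2Cnt;
                }
                if (id != null) {
                    // Keep existing attributes (assumed not to include "id") and namespaces.
                    List<Attribute> attrs = new ArrayList<>();
                    Iterator<?> it = start.getAttributes();
                    while (it.hasNext()) {
                        attrs.add((Attribute) it.next());
                    }
                    attrs.add(EVENTS.createAttribute("id", id));
                    event = EVENTS.createStartElement(start.getName(), attrs.iterator(),
                            start.getNamespaces());
                }
            }
            writer.add(event); // all other events are passed through unchanged
        }
        writer.flush();
        writer.close();
        reader.close();
    }
}
```

For real files, the `Reader` and `Writer` could come from `Files.newBufferedReader(path)` and `Files.newBufferedWriter(path)`, which use UTF-8 by default.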

Beyond that, I suspect that using XMLStreamWriter would be easier, and faster, than using XMLEventWriter.
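For comparison, here is a hedged sketch of the same copy using the cursor API on both sides (XMLStreamReader in, XMLStreamWriter out), which avoids allocating an event object per node. It assumes a document without namespaces, like the question's code, which also works only with local names; `StreamIdAdder` and `addIds` are hypothetical names, and processing instructions and DTDs are simply dropped. Whether it is actually faster than the event-based version is something to measure rather than assume.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class StreamIdAdder {
    // Factories are created once and reused across calls.
    private static final XMLInputFactory INPUT = XMLInputFactory.newInstance();
    private static final XMLOutputFactory OUTPUT = XMLOutputFactory.newInstance();

    /** Cursor-API copy that adds "id" to node1/node2 start tags; assumes no namespaces. */
    public static void addIds(Reader in, Writer out, String node1, String node2)
            throws XMLStreamException {
        XMLStreamReader r = INPUT.createXMLStreamReader(in);
        XMLStreamWriter w = OUTPUT.createXMLStreamWriter(out);
        int node1Cnt = 0, node2Cnt = 0;
        w.writeStartDocument();
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    String name = r.getLocalName();
                    w.writeStartElement(name);
                    // Copy the element's existing attributes (local names only).
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        w.writeAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
                    }
                    if (name.equals(node1)) {
                        node1Cnt++;
                        node2Cnt = 0;
                        w.writeAttribute("id", "5522" + node1Cnt);
                    } else if (name.equals(node2)) {
                        node2Cnt++;
                        w.writeAttribute("id", "5522" + node1Cnt + node2Cnt);
                    }
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    w.writeEndElement();
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.SPACE:
                    // Write text straight from the reader's buffer, without building a String.
                    w.writeCharacters(r.getTextCharacters(), r.getTextStart(), r.getTextLength());
                    break;
                case XMLStreamConstants.CDATA:
                    w.writeCData(r.getText());
                    break;
                case XMLStreamConstants.COMMENT:
                    w.writeComment(r.getText());
                    break;
                default:
                    break; // PIs, DTD and END_DOCUMENT need no output here
            }
        }
        w.writeEndDocument();
        w.flush();
        w.close();
        r.close();
    }
}
```

Unlike the question's code, this version keeps whitespace text, so the output preserves the input's indentation.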

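(But these performance remarks are guesswork: when tuning performance, you need to take measurements to assess the impact of each code change.)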
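A rough, JDK-only sketch of such a measurement is below; `Timing` and `bestOfMillis` are hypothetical names. It runs the task a few times untimed so the JIT can compile the hot code, then reports the fastest of several timed runs using the monotonic `System.nanoTime()`. For serious microbenchmarks, a dedicated harness such as JMH is more reliable.

```java
import java.util.concurrent.TimeUnit;

public class Timing {
    /**
     * Runs the task `warmups` times untimed, then `runs` times timed,
     * and returns the fastest timed run in milliseconds.
     */
    public static long bestOfMillis(Runnable task, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // monotonic, unlike currentTimeMillis()
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return TimeUnit.NANOSECONDS.toMillis(best);
    }
}
```

Each variant (for example, factory inside versus outside the loop) could be wrapped in a `Runnable` that transforms the same 10k-record file, and the numbers compared.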

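Personally, I would write this in XSLT. You haven't described the transformation in enough detail, but in XSLT 3.0 it would look something like this.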
