Question

I am reading file paths from a collection:

Collection<String> FileList = new ArrayList<>();

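This collection can contain more than 600,000 file paths, but with my current approach it takes up to several hours to create the text file with all the information.

Each XML file contains a list of <item> elements, each of which can have a <value> tag with the attribute <value is_special="true">. In that case, the name of the <item> should be stored. The result looks like this: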

C:\bar\foo\archive\T16-0B07186E3B194D2341256D2F003FF1FE.xml
C:\bar\foo\archive\C1257FBF0040265C-1\T26-75A218AFA1FC460B41256D9C00406708.xml
C:\bar\foo\archive\C1257FBF0040265C-1\T26-75A218AFA1FC460B41256D9C99406708.xml


Itemname:CreationDate

Itemname:PublishingDate

Itemname:ValidThruDate

Itemname:ArchiveDate

Itemname:ReleaseDate

Itemname:EraseDate

Current function:

public void FullFilterAndExport() throws JAXBException, IOException {
totalFilesCount = 0;
totalFilesCountPositive = 0;
PrintWriter pWriter = new PrintWriter(new BufferedWriter(new FileWriter(DB_Path.toString() + "\\export_full.txt")));        
for(String file: FileList) {
    if (file.endsWith(".xml") && !file.contains("databaseinfo.xml")) {
        totalFilesCount = totalFilesCount +1;
        ItemList.clear();
        JAXBContext context = JAXBContext.newInstance(NotesDocumentMetaFile.class);
        Unmarshaller um = context.createUnmarshaller();
        NotesDocumentMetaFile docMetaFile = (NotesDocumentMetaFile) um.unmarshal(new FileReader(file));

        for(int i = 0; i < docMetaFile.getItems().size(); i++) {
            if(docMetaFile.getItems().get(i).getValueIsSpecial() == true) {
                ItemList.add("Itemname:" + docMetaFile.getItems().get(i).getName());
            }
        }
        if(!ItemList.isEmpty()) {
            totalFilesCountPositive = totalFilesCountPositive + 1;
            pWriter.println(file);
            pWriter.println();
            for(String item : ItemList) {
                pWriter.println(item);
            }
            pWriter.println();
        }

    }
}
pWriter.println();
pWriter.println("------------------");
pWriter.println("Anzahl der geprüften Dateien: " + totalFilesCount);
pWriter.println("Anzahl der geprüften positiven Dateien: " + totalFilesCountPositive);
if (pWriter != null){ 
    pWriter.flush(); 
    pWriter.close();
}
}

Is there any way to improve the performance?

Answer 1

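Profile the program (using jvisualvm, included in the Oracle JDK) and look at a CPU sampling snapshot.
The culprit is probably JAXB. If that is the case, try any streaming XML reader. The code will be uglier, but it should be faster. Re-test and re-profile to check where the CPU time goes.
You may want to decouple reading the XML files from writing the output text file, for instance by using a BlockingDeque that holds the results of reading the XML files. This queue would be fed by several threads reading XML in parallel and consumed by a single writer thread, so that all CPU cores are used.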
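The decoupling described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name ParallelExport, the parse callback (which would return the text block for one file, or null when the file has no special items) and the queue capacity of 1,000 are all hypothetical. It uses a LinkedBlockingQueue because only FIFO access is needed; a BlockingDeque implementation would work the same way.

```java
import java.io.PrintWriter;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class ParallelExport {
    // Unique sentinel object that tells the writer no more results will arrive.
    private static final String POISON = new String("<end>");

    /**
     * Parses all paths on a worker pool and writes the non-null results with a
     * single writer thread. Returns the number of blocks written.
     * (Hypothetical helper; "parse" stands in for the per-file XML reading.)
     */
    static int export(List<String> paths, Function<String, String> parse, PrintWriter out)
            throws InterruptedException {
        // Bounded queue: workers block when the writer falls behind.
        BlockingQueue<String> results = new LinkedBlockingQueue<>(1_000);
        int[] written = {0}; // read only after writer.join(), so no extra synchronization

        Thread writer = new Thread(() -> {
            try {
                // Compare by identity so a real result can never be mistaken for the sentinel.
                for (String block = results.take(); block != POISON; block = results.take()) {
                    out.print(block);
                    written[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (String path : paths) {
            pool.execute(() -> {
                String block = parse.apply(path);
                if (block == null) {
                    return; // nothing to report for this file
                }
                try {
                    results.put(block);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown();                                        // no new tasks
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS); // wait for all parsers
        results.put(POISON);                                    // then stop the writer
        writer.join();
        return written[0];
    }
}
```

Note that the blocks reach the output file in completion order, not in the order of the input collection. If the original order matters, the results would have to carry an index and be sorted or buffered before writing.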

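Edit: as a quick win, I think this code:

JAXBContext context = JAXBContext.newInstance(NotesDocumentMetaFile.class);
Unmarshaller um = context.createUnmarshaller();

can be moved outside the for loop. It should give you a nice boost. The context is thread-safe; the unmarshaller is not, but it can be reused across several files.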
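If profiling still points at JAXB after that change, the streaming reader suggested earlier might look like the sketch below, which uses the JDK's built-in StAX API (javax.xml.stream). The XML layout is an assumption, since the question does not show a sample file: it assumes each <item> carries its name in a name attribute and contains a <value> child that may have is_special="true". The class and method names are hypothetical.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class SpecialItemScanner {
    // Creating the factory involves a provider lookup, so build it once.
    // Factories are not guaranteed to be thread-safe; with several parser
    // threads, give each thread its own instance.
    private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

    /** Returns the names of all items that have a value marked is_special="true". */
    static List<String> specialItemNames(Reader xml) throws XMLStreamException {
        List<String> names = new ArrayList<>();
        String currentItem = null; // name of the <item> we are inside, if any
        XMLStreamReader reader = FACTORY.createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String tag = reader.getLocalName();
                    if ("item".equals(tag)) {
                        // Assumed layout: <item name="...">
                        currentItem = reader.getAttributeValue(null, "name");
                    } else if ("value".equals(tag) && currentItem != null
                            && "true".equals(reader.getAttributeValue(null, "is_special"))) {
                        names.add(currentItem);
                        currentItem = null; // record each item at most once
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    currentItem = null;
                }
            }
        } finally {
            reader.close(); // closes the parser, not the underlying Reader
        }
        return names;
    }
}
```

Inside the existing loop, a call such as specialItemNames(new FileReader(file)) would take the place of the unmarshalling step, with the returned names written out in the same way as ItemList. The caller would remain responsible for closing the reader it passes in.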

Java: Fast way to read XML files and store the information in a text file
