How do I iterate over all XML files in a directory and its subdirectories to read a specific element using Java?

Time: 2016-05-04 10:30:47

Tags: java xml parsing dom

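  • I have a directory that contains many subdirectories.
  • It holds multiple XML files; some of them share the same file name but sit in different directories.
  • I want to read every XML file, extract one XML element from each, and store the values in an ArrayList.
  • While parsing the XML files, however, an error is thrown: java.io.FileNotFoundException: \\BDOPS-4\ORDERS\CreateCLELE\APRIL-2016\28-04-2016\8449066_1\ItemFile\1461809102571_4\ftp\content-providers\ewh-e\data\incoming\OBI00000000001818A\OBI00000000001818\00012092\v103i5\si540.dtd (The system cannot find the file specified)
  • The file it is looking for (si540.dtd) does not exist in that directory. Can anyone help me solve this? My code and the stack trace are below.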

Thanks in advance.

package Read_XML;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.TrueFileFilter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import DB_INFO.DOI;
import DB_INFO.Insert_Missing_DOI;
public class Read_DOI {
public static void read_XML_for_DOI(String root_path){

    System.out.println("Received Incoming path : "+root_path);

    File f = null;
    try {
        String root = root_path;
        f = new File(root);
        //shall accept all files in directories and subdirectories
        List<File> files = (List<File>) FileUtils.listFiles(f,
                TrueFileFilter.INSTANCE, TrueFileFilter.INSTANCE);
        ArrayList<String> issn_valueLst = new ArrayList<>();
        for (File fXmlFile : files) {
            // process only files whose name ends with .xml (case-insensitive)
            if(accept(fXmlFile.getName(), ".xml")){
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            System.out.println("XML Name::"+fXmlFile.getName());
            Document doc = dBuilder.parse(fXmlFile);
            doc.getDocumentElement().normalize();
            System.out.println("Traversing File : "+fXmlFile.getName());
            System.out.println("Traversing path : "+fXmlFile.getAbsolutePath());
            NodeList nList2=doc.getElementsByTagName("ce:doi");
            //NodeList nList2=doc.getElementsByTagName("DOI");

            if(nList2.getLength()>=1)
            {
                 for (int temp2 = 0; temp2 < nList2.getLength(); temp2++) {
                     Node nNode4 = nList2.item(temp2);

                     if (nNode4.getNodeType() == Node.ELEMENT_NODE) 
                     {
                        Element eElement1 = (Element) nNode4;
                        issn_valueLst.add(eElement1.getTextContent());
                        //issn_valueLst.add(System.lineSeparator());
                     }
                 }
            }
            }
        }

        System.out.println("DOI IN DB : "+DOI.DOI_values.toString());

        System.out.println("The DOI Values in INPUT XML : "+issn_valueLst.toString());
        System.out.println("Total number of DOI in Input XML : "+issn_valueLst.size());

        //secondList.removeAll(firstList);

        issn_valueLst.removeAll(DOI.DOI_values);  


        if(issn_valueLst.size()>0)
        {
            System.out.println("\nThe Missing new DOI in the Input XML : "+issn_valueLst.toString());
            Insert_Missing_DOI.insert_DOI(issn_valueLst.toString());
        }
        else
        {
            System.out.println("ALL DOI are available in input xml");
        }
        System.out.println();



       // }
    } 
    catch(FileNotFoundException fe)
    {

        System.out.println("File not found");
        fe.printStackTrace();
    }

    catch (Exception e) {
        // if any error occurs
        e.printStackTrace();
    }


}



  public static boolean accept( String name, String str) {
    return name.toLowerCase().endsWith(str.toLowerCase());
  }
 }

Stack trace:

java.io.FileNotFoundException: \\BDOPS-4\ORDERS\CreateCLELE\APRIL-2016\28-04-2016\8449066_1\ItemFile\1461809102571_4\ftp\content-providers\ewh-e\data\incoming\OBI00000000001818A\OBI00000000001818\00012092\v103i5\si540.dtd (The system cannot find the file specified)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at Read_XML.Read_DOI.read_XML_for_DOI(Read_DOI.java:49)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:29)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.find(find_incoming_dir.java:15)
    at Extraction.ZipExtraction.extract(ZipExtraction.java:41)
    at execution_Point.Exact_orderpath.find_exact_path(Exact_orderpath.java:24)
    at execution_Point.Get_orderpath.getorderpath_from_orderinfo(Get_orderpath.java:53)
    at execution_Point.Get_order_from_marker.starter_pub(Get_order_from_marker.java:270)
    at execution_Point.Cl_Execute.main(Cl_Execute.java:47)

2 answers:

Answer 0 (score: 0)

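Most likely, the exception means you do not have permission to read the file. From the FileNotFoundException documentation:

"This exception ... will also be thrown ... if the file does exist but for some reason is inaccessible ..."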

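The exception's name doesn't make much sense for that kind of error, does it? java.io.File is a very old class, dating from Java 1.0. If you want more useful feedback, use the Path class, the modern replacement for File: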

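import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// "root" is the directory path string from the question's code.
Path f = Paths.get(root);
// Note: newDirectoryStream lists only the direct children of f; it does not descend into subdirectories.
try (DirectoryStream<Path> dir = Files.newDirectoryStream(f, "*.xml")) {
    for (Path fXmlFile : dir) {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        System.out.println("XML Name::" + fXmlFile.getFileName());
        Document doc;
        try (InputStream stream = Files.newInputStream(fXmlFile)) {
            doc = dBuilder.parse(stream);
        }
        // etc.
    }
}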
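
Since the question asks about subdirectories as well, here is a minimal sketch of a recursive variant using Files.walk. The class name XmlFileWalker and the method findXmlFiles are made up for illustration and appear neither in the question nor in this answer; the extension check mirrors the question's case-insensitive accept() helper.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, named here only for illustration.
final class XmlFileWalker {

    /** Returns every regular file under root, at any depth, whose name ends in ".xml". */
    static List<Path> findXmlFiles(Path root) throws IOException {
        // Files.walk keeps directory handles open, so the stream is closed via try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    // Case-insensitive match, like accept() in the question.
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Uses the first argument as the root directory, or the current directory if none is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path xml : findXmlFiles(root)) {
            System.out.println(xml.toAbsolutePath());
        }
    }
}
```

Each returned Path can then be parsed as in the loop above. Files with the same name in different directories stay distinct because the full path is kept.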

Answer 1 (score: 0)

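This is caused by an entity resolution problem: the XML file's DOCTYPE declaration refers to si540.dtd, and the parser tries to load that file even though it is not validating.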

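The problem can be solved by skipping entity resolution, for example by setting an EntityResolver on the DocumentBuilder that returns an empty InputSource, so the parser never tries to open si540.dtd.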
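
A minimal sketch of that idea, assuming the default JDK parser and the standard DocumentBuilder.setEntityResolver call. The class name DtdIgnoringParser, the sample document, and the DOI value are made up for illustration and are not taken from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
final class DtdIgnoringParser {

    /** Creates a builder that answers every external entity request with empty content. */
    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Returning an empty InputSource means the parser never opens the referenced DTD file.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: the DOCTYPE points at a DTD that does not exist on disk.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE article SYSTEM \"si540.dtd\">\n"
                + "<article><ce:doi>10.1000/example</ce:doi></article>";
        Document doc = newBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getElementsByTagName("ce:doi").item(0).getTextContent());
    }
}
```

In the question's loop, the builder could be created once before the loop and reused for every file. One caveat: if the XML files use entities that are declared only in the DTD, an empty DTD no longer defines them, so parsing such files may fail.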