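Getting the XML text between tags in Java

Time: 2016-11-04 17:15:31

Tags: java xml

I have the following XML entry. I want to extract everything from the point where the d:index tag closes up to the end of the entry.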

<d:entry id="some_id" d:title="some_title">
        <d:index d:value="some_value"/>
        <h1>headlines</h1>

        <p>paragraphs</p>
        <div>
           <ul>
              <li>lists</li>

           </ul>
        </div>
        text like that
</d:entry>

I tried using

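DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(file);
doc.getDocumentElement().normalize();
NodeList eList = doc.getElementsByTagName("d:entry");
String[] textList = new String[eList.getLength()];
for (int i = 0; i < eList.getLength(); i++) {
    Node nNode = eList.item(i);
    // getTextContent() returns only the concatenated text, not the markup
    textList[i] = nNode.getTextContent();
}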

However, .getTextContent() only gives me 'text like that' instead of

<h1>headlines</h1>

<p>paragraphs</p>
   <div>
     <ul>
      <li>lists</li>

     </ul>
   </div>
text like that

1 answer:

Answer 0: (score: 0)

Depending on exactly what you want to do, you could do something like the following:

import java.io.File;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class Arbeiter {

public void arbeiten(File datei)
{
    Document doc = getDoc(datei);
    Element element = doc.getDocumentElement();
    print(element);
}

private Document getDoc(File datei)
{
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    Document doc = null;
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        doc = db.parse(datei);
    } catch (ParserConfigurationException | SAXException | IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return doc;
}

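private void print(Node node)
{
    // Recurse into every child node in document order.
    for (int i=0; i<node.getChildNodes().getLength(); i++)
    {
        print(node.getChildNodes().item(i));
    }
    // Print only non-blank text nodes, so each piece of text appears once.
    if(node.getNodeType()==Node.TEXT_NODE && !node.getTextContent().trim().isEmpty())
    {
        System.out.println(node.getTextContent().trim());
    }
}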

}

The output (surrounding whitespace omitted) is:

headlines
paragraphs
lists
text like that
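
The code above prints only the text. If the markup itself is wanted, as in the question, one possible approach is to serialize each sibling that follows d:index back to a string. The following is a minimal sketch, not a definitive implementation: the class name EntryMarkup and the helpers markupAfterIndex and firstEntryMarkup are hypothetical names introduced here, and it assumes the default, non-namespace-aware parser, so that "d:index" can be matched by its qualified name.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class EntryMarkup {

    // Hypothetical helper: serializes every child of the entry that comes
    // after its first <d:index> child, keeping the tags.
    static String markupAfterIndex(Element entry) throws TransformerException {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.METHOD, "xml");
        // Without this, every serialized child would start with <?xml ...?>.
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        boolean afterIndex = false;
        for (Node child = entry.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (afterIndex) {
                serializer.transform(new DOMSource(child), new StreamResult(out));
            } else if ("d:index".equals(child.getNodeName())) {
                afterIndex = true;
            }
        }
        return out.toString().trim();
    }

    // Hypothetical convenience wrapper: parses the XML string and processes
    // the first <d:entry> element (assumes at least one exists).
    static String firstEntryMarkup(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element entry = (Element) doc.getElementsByTagName("d:entry").item(0);
            return markupAfterIndex(entry);
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract entry markup", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<d:entry id=\"some_id\" d:title=\"some_title\">"
                + "<d:index d:value=\"some_value\"/>"
                + "<h1>headlines</h1><p>paragraphs</p>"
                + "<div><ul><li>lists</li></ul></div>"
                + " text like that </d:entry>";
        System.out.println(firstEntryMarkup(xml));
    }
}
```

To process every entry rather than only the first, markupAfterIndex can be called inside the same loop over getElementsByTagName("d:entry") that the question uses, storing each result in textList[i].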