Parsing DBLP XML in Java

Time: 2014-05-20 14:47:06

Tags: java xml parsing xml-parsing

This is the XML file.
How can I parse the author tags in this example, given that we do not know how many authors each inproceedings has?

<?xml version="1.0" encoding="ISO-8859-1"?>    
<dblp>

<inproceedings mdate="2014-01-18" key="series/sci/AzzagL13">
<author>Hanane Azzag</author>
<author>Mustapha Lebbah</author>
<title>A New Way for Hierarchical and Topological Clustering.</title>
<pages>85-97</pages>
<year>2011</year>
<booktitle>EGC (best of volume)</booktitle>
<ee>http://dx.doi.org/10.1007/978-3-642-35855-5_5</ee>
<crossref>series/sci/2013-471</crossref>
<url>db/series/sci/sci471.html#AzzagL13</url>
</inproceedings>

<inproceedings mdate="2014-01-18" key="series/sci/RabatelBP13">
<author>Julien Rabatel</author>
<author>Sandra Bringay</author>
<author>Pascal Poncelet</author>
<title>Mining Sequential Patterns: A Context-Aware Approach.</title>
<pages>23-41</pages>
<year>2011</year>
<booktitle>EGC (best of volume)</booktitle>
<ee>http://dx.doi.org/10.1007/978-3-642-35855-5_2</ee>
<crossref>series/sci/2013-471</crossref>
<url>db/series/sci/sci471.html#RabatelBP13</url>
</inproceedings>
</dblp>

3 answers:

Answer 0: (score: 1)

Use XPath, which is fast and powerful. For your example, the following lines return 5 rows.

Code:

// Parse the whole file into a DOM tree, then select every <author> element.
final Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("input.xml"));

final XPath xPath = XPathFactory.newInstance().newXPath();
final NodeList nodeList = (NodeList) xPath.compile("//author").evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
    // getTextContent() also copes with an empty <author/> element
    System.out.println(nodeList.item(i).getTextContent());
}

Output:

Hanane Azzag
Mustapha Lebbah
Julien Rabatel
Sandra Bringay
Pascal Poncelet
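
The `//author` query above returns every author in one flat list, so it no longer shows which paper each author belongs to. Since the question is about an unknown number of authors per inproceedings, here is a minimal sketch that evaluates a relative XPath expression against each `inproceedings` node instead. The class name `AuthorsPerPaperXPath`, the method `authorsByKey`, and the inline sample string are assumptions made for illustration; they are not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class AuthorsPerPaperXPath {

    /** Maps each inproceedings "key" attribute to its authors, in document order. */
    static Map<String, List<String>> authorsByKey(InputStream xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        XPath xPath = XPathFactory.newInstance().newXPath();

        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList papers = (NodeList) xPath.evaluate("/dblp/inproceedings", document, XPathConstants.NODESET);
        for (int i = 0; i < papers.getLength(); i++) {
            Node paper = papers.item(i);
            String key = xPath.evaluate("@key", paper);
            // Relative expression: only the <author> children of this paper,
            // however many there happen to be.
            NodeList authors = (NodeList) xPath.evaluate("author", paper, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int j = 0; j < authors.getLength(); j++) {
                names.add(authors.item(j).getTextContent());
            }
            result.put(key, names);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed-down copy of the sample from the question (illustrative only).
        String sample =
              "<dblp>"
            + "<inproceedings key=\"series/sci/AzzagL13\">"
            + "<author>Hanane Azzag</author><author>Mustapha Lebbah</author>"
            + "</inproceedings>"
            + "<inproceedings key=\"series/sci/RabatelBP13\">"
            + "<author>Julien Rabatel</author><author>Sandra Bringay</author>"
            + "<author>Pascal Poncelet</author>"
            + "</inproceedings>"
            + "</dblp>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        authorsByKey(in).forEach((key, names) -> System.out.println(key + " -> " + names));
    }
}
```

The inner loop simply runs as many times as there are `author` children, so the author count never needs to be known in advance.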

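Answer 1: (score: 1)

Use Apache Commons Digester for the parsing; it is commonly used for XML parsing in real projects. Nice work from the Apache community.

// Update the code to suit your needs.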

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.commons.digester.Digester;
import org.apache.commons.digester.Rule;

@SuppressWarnings("unchecked")
public class Parsing {

    public static void main(String[] args) throws Exception {
        Digester digester = new Digester();
        // Push a HashMap (title -> authors) onto the stack when <dblp> starts
        digester.addObjectCreate("dblp", HashMap.class);
        // Push a fresh ArrayList of authors for each <inproceedings>
        digester.addObjectCreate("dblp/inproceedings", ArrayList.class);
        // One rule instance collects the authors and the title, then stores
        // the pair when </inproceedings> is reached
        AuthorRule rule = new AuthorRule();
        digester.addRule("dblp/inproceedings/author", rule);
        digester.addRule("dblp/inproceedings/title", rule);
        digester.addRule("dblp/inproceedings", rule);

        try (InputStream data = new FileInputStream("E:\\workspace\\trunk\\Parsing\\src\\data.xml")) {
            Map<String, List<String>> parsedData = (Map<String, List<String>>) digester.parse(data);
            for (Map.Entry<String, List<String>> entry : parsedData.entrySet()) {
                System.out.println("Title : " + entry.getKey() + ", Authors" + entry.getValue());
            }
        }
    }

    private static class AuthorRule extends Rule {
        private String currentTitle = "";

        @Override
        public void body(String namespace, String name, String text) throws Exception {
            if (name.equals("title")) {
                currentTitle = text;
            } else if (name.equals("author")) {
                // The author list is on top of the stack while inside <inproceedings>
                List<String> authors = (List<String>) digester.peek(0);
                authors.add(text);
            }
        }

        @Override
        public void end(String namespace, String name) throws Exception {
            if (name.equals("inproceedings")) {
                // Stack at this point: [HashMap, ArrayList]; the list is popped afterwards
                Map<String, List<String>> papers = (Map<String, List<String>>) digester.peek(1);
                List<String> authors = (List<String>) digester.peek(0);
                papers.put(currentTitle, authors);
            }
        }
    }
}

Output:
Title : A New Way for Hierarchical and Topological Clustering., Authors[Hanane Azzag, Mustapha Lebbah]
Title : Mining Sequential Patterns: A Context-Aware Approach., Authors[Julien Rabatel, Sandra Bringay, Pascal Poncelet]
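
Digester drives its rules from SAX events, so the same title-to-authors grouping can also be written against the JDK's own SAX API if adding a dependency is not an option. The following is only a sketch under assumptions: the class name `AuthorsPerPaperSax` and its `parse` helper are made up for illustration, and it assumes `title` and `author` contain plain text with no nested elements, as in the sample from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class AuthorsPerPaperSax extends DefaultHandler {
    private final Map<String, List<String>> authorsByTitle = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private List<String> authors; // non-null only while inside <inproceedings>
    private String title = "";

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start collecting text for the new element
        if (qName.equals("inproceedings")) {
            authors = new ArrayList<>();
            title = "";
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (authors == null) {
            return; // element outside any <inproceedings>
        }
        switch (qName) {
            case "author":
                authors.add(text.toString().trim());
                break;
            case "title":
                title = text.toString().trim();
                break;
            case "inproceedings":
                authorsByTitle.put(title, authors);
                authors = null;
                break;
            default:
                break;
        }
    }

    /** Parses the stream and returns title -> authors in document order. */
    static Map<String, List<String>> parse(InputStream xml) throws Exception {
        AuthorsPerPaperSax handler = new AuthorsPerPaperSax();
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return handler.authorsByTitle;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<dblp><inproceedings key=\"series/sci/AzzagL13\">"
            + "<author>Hanane Azzag</author><author>Mustapha Lebbah</author>"
            + "<title>A New Way for Hierarchical and Topological Clustering.</title>"
            + "</inproceedings></dblp>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        parse(in).forEach((t, a) -> System.out.println("Title : " + t + ", Authors" + a));
    }
}
```

Like the Digester version, this keys the map by title, so two papers with identical titles would overwrite each other; keying by the `key` attribute would avoid that.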

Answer 2: (score: 0)

There are many ways to do this, for example via DOM:

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;

public class XmlAuthorReader {
    public static void main(String[] argv) {
        try {
            File fXmlFile = new File(<filePath>);
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(fXmlFile);

            NodeList nList = doc.getElementsByTagName("author");

            System.out.println(nList.getLength()+ " author(s) found");
            for (int temp = 0; temp < nList.getLength(); temp++) {
                Node nNode = nList.item(temp);
                System.out.println("Author: " + nNode.getTextContent());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

You can find more variants here: http://www.mkyong.com/tutorials/java-xml-tutorials/
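
One more variant, as a sketch only: DOM loads the whole document into memory, which may become a problem if the input is much larger than the two-record sample. A pull parser (StAX, included in the JDK) can hand over one `inproceedings` at a time instead. The class name `AuthorsPerPaperStax`, the method `forEachPaper`, and the callback parameter are hypothetical names chosen for this example, and it has not been tried against a full DBLP dump.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class AuthorsPerPaperStax {

    /** Calls sink(key, authors) once for every <inproceedings> element. */
    static void forEachPaper(InputStream xml, BiConsumer<String, List<String>> sink)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            String key = null;
            List<String> authors = null; // non-null only while inside <inproceedings>
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("inproceedings")) {
                        key = reader.getAttributeValue(null, "key");
                        authors = new ArrayList<>();
                    } else if (name.equals("author") && authors != null) {
                        // Reads the element's text and advances to </author>.
                        authors.add(reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("inproceedings")) {
                    sink.accept(key, authors);
                    authors = null; // authors of other record types are ignored
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String sample = "<dblp><inproceedings key=\"series/sci/RabatelBP13\">"
            + "<author>Julien Rabatel</author><author>Sandra Bringay</author>"
            + "<author>Pascal Poncelet</author>"
            + "</inproceedings></dblp>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        forEachPaper(in, (key, authors) ->
            System.out.println(key + ": " + authors.size() + " author(s) " + authors));
    }
}
```

Because each record is passed to the callback as soon as its closing tag is read, only one paper's authors are held in memory at a time.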