How can I parse the author tags when we don't know how many authors each inproceedings element has?
For example, here is the XML file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<dblp>
<inproceedings mdate="2014-01-18" key="series/sci/AzzagL13">
<author>Hanane Azzag</author>
<author>Mustapha Lebbah</author>
<title>A New Way for Hierarchical and Topological Clustering.</title>
<pages>85-97</pages>
<year>2011</year>
<booktitle>EGC (best of volume)</booktitle>
<ee>http://dx.doi.org/10.1007/978-3-642-35855-5_5</ee>
<crossref>series/sci/2013-471</crossref>
<url>db/series/sci/sci471.html#AzzagL13</url>
</inproceedings>
<inproceedings mdate="2014-01-18" key="series/sci/RabatelBP13">
<author>Julien Rabatel</author>
<author>Sandra Bringay</author>
<author>Pascal Poncelet</author>
<title>Mining Sequential Patterns: A Context-Aware Approach.</title>
<pages>23-41</pages>
<year>2011</year>
<booktitle>EGC (best of volume)</booktitle>
<ee>http://dx.doi.org/10.1007/978-3-642-35855-5_2</ee>
<crossref>series/sci/2013-471</crossref>
<url>db/series/sci/sci471.html#RabatelBP13</url>
</inproceedings>
</dblp>
Answer 0 (score: 1)
Use XPath, which is fast and powerful. The lines below print five lines for your example.
Code:
final Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new FileInputStream("input.xml"));
final XPath xPath = XPathFactory.newInstance().newXPath();
final NodeList nodeList = (NodeList) xPath.compile("//author").evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getFirstChild().getNodeValue());
}
Output:
Hanane Azzag
Mustapha Lebbah
Julien Rabatel
Sandra Bringay
Pascal Poncelet
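The `//author` query returns one flat list, so it no longer shows which paper each author belongs to. If that grouping matters, one option is to select each `inproceedings` node first and then evaluate relative XPath expressions against it. The following is only a sketch under assumptions: the class name `AuthorsPerPaperXPath`, the method `authorsByTitle`, and the inline sample are made up for illustration, and titles are assumed to be unique.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: groups authors by paper title using relative XPath.
class AuthorsPerPaperXPath {

    // Returns title -> authors in document order.
    // Assumption: titles are unique; a repeated title would overwrite the earlier entry.
    static Map<String, List<String>> authorsByTitle(InputStream xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList papers = (NodeList) xPath.evaluate(
                    "/dblp/inproceedings", document, XPathConstants.NODESET);
            Map<String, List<String>> result = new LinkedHashMap<>();
            for (int i = 0; i < papers.getLength(); i++) {
                Node paper = papers.item(i);
                // Relative expressions are evaluated against the current paper only.
                String title = xPath.evaluate("title", paper);
                NodeList authors = (NodeList) xPath.evaluate(
                        "author", paper, XPathConstants.NODESET);
                List<String> names = new ArrayList<>();
                for (int j = 0; j < authors.getLength(); j++) {
                    names.add(authors.item(j).getTextContent());
                }
                result.put(title, names);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML input", e);
        }
    }

    public static void main(String[] args) {
        // Small inline sample with the same shape as the file in the question.
        String sample = "<dblp>"
                + "<inproceedings><author>Hanane Azzag</author><author>Mustapha Lebbah</author>"
                + "<title>A New Way for Hierarchical and Topological Clustering.</title></inproceedings>"
                + "</dblp>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        authorsByTitle(in).forEach((title, authors) ->
                System.out.println(title + " -> " + authors));
    }
}
```

The inner loop runs as many times as there are `author` children, so the number of authors per paper never has to be known in advance.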
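Answer 1 (score: 1)
Use Apache Commons Digester to do the parsing; it is commonly used in real projects. Thanks to the good folks in the Apache community.
// Update the code to suit your needs.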
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map.Entry;
import org.apache.commons.digester.Digester;
import org.apache.commons.digester.Rule;
public class Parsing {
public static void main(String[] args) throws Exception{
InputStream data = new FileInputStream("E:\\workspace\\trunk\\Parsing\\src\\data.xml");
Digester digester = new Digester();
// Create a HashMap for the dblp root and an ArrayList for each inproceedings element
digester.addObjectCreate("dblp", HashMap.class);
digester.addObjectCreate("dblp/inproceedings", ArrayList.class);
// Register the same rule for author, title, and inproceedings: authors are collected in body(), and the list is stored under its title in end()
AuthorRule rule = new AuthorRule();
digester.addRule("dblp/inproceedings/author", rule);
digester.addRule("dblp/inproceedings/title", rule);
digester.addRule("dblp/inproceedings", rule);
HashMap parsedData = (HashMap) digester.parse(data);
Iterator<Entry<String, ArrayList>> dataItr = parsedData.entrySet().iterator();
while(dataItr.hasNext()){
Entry<String, ArrayList> entry = dataItr.next();
System.out.println("Title : " + entry.getKey() + ", Authors" + entry.getValue().toString());
}
}
private static class AuthorRule extends Rule{
String currentTitle = "";
@Override
public void body(String namespace, String name, String text)
throws Exception {
HashMap object = (HashMap) digester.peek(1);
ArrayList authors = (ArrayList) digester.peek(0);
if(name.equals("title")){
currentTitle = text;
}
else if(name.equals("author")){
authors.add(text);
}
}
@Override
public void end(String namespace, String name) throws Exception {
HashMap object = (HashMap) digester.peek(1);
ArrayList authors = (ArrayList) digester.peek(0);
if(name.equals("inproceedings")){
object.put(currentTitle, authors);
}
}
}
}
Output:
Title : A New Way for Hierarchical and Topological Clustering., Authors[Hanane Azzag, Mustapha Lebbah]
Title : Mining Sequential Patterns: A Context-Aware Approach., Authors[Julien Rabatel, Sandra Bringay, Pascal Poncelet]
Answer 2 (score: 0)
There are many ways to do this, for example with DOM:
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
public class XmlAuthorReader {
public static void main(String argv[]) {
try {
File fXmlFile = new File("<filePath>"); // replace <filePath> with the path to your XML file
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
NodeList nList = doc.getElementsByTagName("author");
System.out.println(nList.getLength()+ " author(s) found");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
System.out.println("Author: " + nNode.getTextContent());
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
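Calling `getElementsByTagName("author")` on the whole document mixes the authors of all papers together. A possible variation, shown here only as a sketch, calls it on each `inproceedings` element instead and groups the names by the `key` attribute from the sample file. The class name `AuthorsPerPaperDom`, the method `authorsByKey`, and the shortened inline sample are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: groups authors per inproceedings element using plain DOM.
class AuthorsPerPaperDom {

    // Returns key attribute -> authors in document order.
    static Map<String, List<String>> authorsByKey(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            NodeList papers = doc.getElementsByTagName("inproceedings");
            Map<String, List<String>> result = new LinkedHashMap<>();
            for (int i = 0; i < papers.getLength(); i++) {
                // getElementsByTagName only returns Element nodes, so the cast is safe.
                Element paper = (Element) papers.item(i);
                // Searching from the paper element limits the result to its own authors.
                NodeList authors = paper.getElementsByTagName("author");
                List<String> names = new ArrayList<>();
                for (int j = 0; j < authors.getLength(); j++) {
                    names.add(authors.item(j).getTextContent());
                }
                result.put(paper.getAttribute("key"), names);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML input", e);
        }
    }

    public static void main(String[] args) {
        // Shortened inline sample based on the file in the question.
        String sample = "<dblp>"
                + "<inproceedings key=\"series/sci/AzzagL13\">"
                + "<author>Hanane Azzag</author><author>Mustapha Lebbah</author>"
                + "</inproceedings>"
                + "</dblp>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        authorsByKey(in).forEach((key, authors) ->
                System.out.println(key + ": " + authors.size() + " author(s) " + authors));
    }
}
```

Using the `key` attribute as the map key avoids collisions between papers that share a title.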
You can find more variants here: http://www.mkyong.com/tutorials/java-xml-tutorials/