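I need to read the output of the 'search' tag from the following URL using Java.
First, I need to read the XML from the following URL into a String: http://en.wikipedia.org/w/api.php?format=xml&action=query&list=search&srlimit=1&srsearch=big+brother
I should end up with this: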
<api>
<query-continue>
<search sroffset="1"/>
</query-continue>
<query>
<searchinfo totalhits="55180"/>
<search>
<p ns="0" title="Big Brothers Big Sisters of America" snippet="<span class='searchmatch'>Big</span> <span class='searchmatch'>Brothers</span> <span class='searchmatch'>Big</span> Sisters of America is a 501(c)(3) non-profit organization whose goal is to help all children reach their potential through <b>...</b> " size="13008" wordcount="1906" timestamp="2014-04-15T06:46:01Z"/>
</search>
</query>
</api>
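Here is a minimal sketch of that first step, downloading the response into a String. It assumes Java 9+ (for InputStream.readAllBytes()) and a UTF-8 response body. It also assumes the https form of the URL, since Wikipedia now serves over HTTPS. The class name FetchXml, its method names, and the User-Agent value are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class FetchXml {

    // Reads the whole stream into a String, assuming the bytes are UTF-8.
    static String readToString(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Downloads the given URL and returns the response body as a String.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(address).toURL().openConnection();
        // Hypothetical client identifier; Wikimedia asks API clients to send a descriptive User-Agent.
        conn.setRequestProperty("User-Agent", "SearchTagExample/0.1 (hypothetical)");
        try (InputStream in = conn.getInputStream()) {
            return readToString(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the https version of the URL from the question.
        String xml = fetch("https://en.wikipedia.org/w/api.php?format=xml&action=query"
                + "&list=search&srlimit=1&srsearch=big+brother");
        System.out.println(xml);
    }
}
```

To hand that string to the DOM parser, wrap it as builder.parse(new InputSource(new StringReader(xml))), using org.xml.sax.InputSource and java.io.StringReader. If you don't need the raw text, DocumentBuilder.parse(String uri) can also read straight from a URL, as the code further down does.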
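Then, once I have the XML, I need to get the contents of the search tag. The 'search' tag in the output looks like this, and I need to get two parts out of the code in the middle: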
<search>
<p ns="0" title="Big Brothers Big Sisters of America" snippet="<span class='searchmatch'>Big</span> <span class='searchmatch'>Brothers</span> <span class='searchmatch'>Big</span> Sisters of America is a 501(c)(3) non-profit organization whose goal is to help all children reach their potential through <b>...</b> " size="13008" wordcount="1906" timestamp="2014-04-15T06:46:01Z"/>
</search>
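Finally, what I need is two strings equal to:
String title = "Big Brothers Big Sisters of America";
String snippet = "<span class='searchmatch'>Big...";
Can someone help me fix this code? I'm not sure what I'm doing wrong. I don't think it even retrieves the XML from the URL, let alone the tags inside it.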
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("http://en.wikipedia.org/w/api.php?format=xml&action=query&list=search&srlimit=1&srsearch=big+brother");
doc.getDocumentElement().normalize();
XPathFactory xFactory = XPathFactory.newInstance();
XPath xpath = xFactory.newXPath();
XPathExpression expr = xpath.compile("//query/search/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println(nodes.item(i).getNodeValue());
}
Sorry, I'm new to this and couldn't find an answer anywhere.
Answer 0 (score: 2)
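The main problem here is that you are asking for the text nodes that are children of <search>, but the <p ..> you want is not a text node: it is an element. (In fact, the <search> element has no text-node children at all, as you can see by using "View Source" on the response from that URL.)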
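To see this for yourself, the sketch below parses a compact XML string shaped like the API response and counts what each expression selects. The attribute values are shortened placeholders, not real API output, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeTypeDemo {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Counts the nodes that an XPath expression selects in the document.
    static int count(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Compact (not pretty-printed) sample with placeholder attribute values.
        String xml = "<api><query><search>"
                + "<p ns=\"0\" title=\"Big Brothers Big Sisters of America\" snippet=\"...\"/>"
                + "</search></query></api>";
        Document doc = parse(xml);
        System.out.println(count(doc, "//query/search/text()")); // prints 0: no text children
        System.out.println(count(doc, "//query/search/p"));      // prints 1: one <p> element
        Node first = doc.getElementsByTagName("search").item(0).getFirstChild();
        System.out.println(first.getNodeType() == Node.ELEMENT_NODE); // prints true
    }
}
```

If the input were pretty-printed, text() would instead match whitespace-only text nodes between the tags, which still wouldn't give you the title or snippet.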
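So what you need to do is change the XPath expression to
//query/search/p
which will give you the p element nodes. Then, in your Java code, ask each of these nodes for the values of its two attributes, title and snippet: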
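// requires: import org.w3c.dom.Element;
for (int i = 0; i < nodes.getLength(); i++) {
    Element e = (Element) nodes.item(i); // //query/search/p selects element nodes, so the cast is safe
    String title = e.getAttribute("title");
    String snippet = e.getAttribute("snippet");
}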
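Putting the pieces together, here is a sketch that selects //query/search/p and reads both attributes with getAttribute. To keep it runnable offline, it parses a shortened sample string rather than the live URL; swap in builder.parse(url) for real use. The sample escapes < as &lt; because a literal < is not allowed inside an XML attribute value; the listing in the question appears to show the unescaped values. Class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SearchTagReader {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns {title, snippet} for the first <p> under //query/search, or null if there is none.
    static String[] firstTitleAndSnippet(Document doc) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//query/search/p", doc, XPathConstants.NODESET);
        if (nodes.getLength() == 0) {
            return null;
        }
        Element e = (Element) nodes.item(0);
        // getAttribute returns the unescaped value, so &lt;span&gt; comes back as <span>.
        return new String[] {e.getAttribute("title"), e.getAttribute("snippet")};
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample; the snippet is truncated compared with the real response.
        String xml = "<api><query><search>"
                + "<p ns=\"0\" title=\"Big Brothers Big Sisters of America\""
                + " snippet=\"&lt;span class='searchmatch'&gt;Big&lt;/span&gt; Brothers\"/>"
                + "</search></query></api>";
        String[] result = firstTitleAndSnippet(parse(xml));
        String title = result[0];
        String snippet = result[1];
        System.out.println(title);   // Big Brothers Big Sisters of America
        System.out.println(snippet); // <span class='searchmatch'>Big</span> Brothers
    }
}
```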
Alternatively, you can run two XPath queries, one for each attribute:
//query/search/p/@title
and
//query/search/p/@snippet
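Here is a sketch of the two-query variant, again using a shortened placeholder string. It relies on XPath.evaluate(String, Object) called without a return type, which converts the result to a string. That string is the value of the first matching attribute in document order, or an empty string if nothing matches. Names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AttributeQueries {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an XPath expression as a String: the string value of the first
    // selected node, or "" if the expression selects nothing.
    static String attribute(Document doc, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Shortened placeholder sample with a single <p> element.
        String xml = "<api><query><search>"
                + "<p ns=\"0\" title=\"Big Brothers Big Sisters of America\""
                + " snippet=\"&lt;span class='searchmatch'&gt;Big&lt;/span&gt; Brothers\"/>"
                + "</search></query></api>";
        Document doc = parse(xml);
        String title = attribute(doc, "//query/search/p/@title");
        String snippet = attribute(doc, "//query/search/p/@snippet");
        System.out.println(title);   // Big Brothers Big Sisters of America
        System.out.println(snippet); // <span class='searchmatch'>Big</span> Brothers
    }
}
```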
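The two-query approach assumes there is only one <p> element. If you are doing this over multiple <p> elements, you will probably want to keep each pair of attributes together rather than having two separate lists of results.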
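One way to keep the pairs together is to select the <p> elements once and then evaluate the relative expressions @title and @snippet against each element in turn. The sketch below uses a small hypothetical SearchHit holder and a two-result sample whose titles and snippets are made-up placeholders. With srlimit=1, as in the question's URL, a real response contains only one <p>, so you would need a larger srlimit to get more.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PairedResults {

    // Hypothetical holder that keeps one result's title and snippet together.
    static final class SearchHit {
        final String title;
        final String snippet;

        SearchHit(String title, String snippet) {
            this.title = title;
            this.snippet = snippet;
        }

        @Override
        public String toString() {
            return title + " | " + snippet;
        }
    }

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Selects every <p> once, then reads both attributes relative to each one.
    static List<SearchHit> hits(Document doc) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList ps = (NodeList) xpath.evaluate("//query/search/p", doc, XPathConstants.NODESET);
        List<SearchHit> result = new ArrayList<>();
        for (int i = 0; i < ps.getLength(); i++) {
            Node p = ps.item(i);
            // Relative paths use this <p> as the context node, so each title
            // is paired with the snippet from the same element.
            result.add(new SearchHit(xpath.evaluate("@title", p), xpath.evaluate("@snippet", p)));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up placeholder data with two <p> elements.
        String xml = "<api><query><search>"
                + "<p ns=\"0\" title=\"First title\" snippet=\"first snippet\"/>"
                + "<p ns=\"0\" title=\"Second title\" snippet=\"second snippet\"/>"
                + "</search></query></api>";
        for (SearchHit hit : hits(parse(xml))) {
            System.out.println(hit);
        }
        // prints:
        // First title | first snippet
        // Second title | second snippet
    }
}
```

Compared with two absolute queries, this avoids relying on two independent result lists staying in the same order.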