I want to store the comments section of this web page.
This is my Java code:
import java.io.*;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class UrlReadPageDemo1 {
    public static void main(String[] args) throws XPathExpressionException, IOException {
        System.out.println("helllo\n\n\n");
        Document doc = Jsoup.connect("http://timesofindia.indiatimes.com/india/Officer-who-tracked-major-scams-back-in-Enforcement-Directorate/articleshow/27933692.cms").get();
        String exp = "//div[@class='master_container']/[@id='netspidersosh']/div[@class='navlft']/div[@class='padlftrgt']/div[@class='clearFix']/div[@class='flL left_bdr']/[@id='populatecomment']/[@id='cmtMainBox']/div/[@id='cmtBox']/div/[@id='box']/[@id='cmt']/div/span";
        System.out.println(exp);
        XPathFactory factory = XPathFactory.newInstance();
        XPath xPath = factory.newXPath();
        NodeList fav = (NodeList) xPath.evaluate(exp, doc.getAllElements(), XPathConstants.NODESET);
        Element Comment = (Element) fav.item(17);
        String str = Comment.getTextContent();
        System.out.println(str);
    }
}
This error occurs:
Exception in thread "main" javax.xml.transform.TransformerException: A location step was expected following the '/' or '//' token.
at com.sun.org.apache.xpath.internal.compiler.XPathParser.error(XPathParser.java:612)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.RelativeLocationPath(XPathParser.java:1641)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.LocationPath(XPathParser.java:1599)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.PathExpr(XPathParser.java:1319)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.UnionExpr(XPathParser.java:1238)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.UnaryExpr(XPathParser.java:1144)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.MultiplicativeExpr(XPathParser.java:1065)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.AdditiveExpr(XPathParser.java:1007)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.RelationalExpr(XPathParser.java:932)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.EqualityExpr(XPathParser.java:872)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.AndExpr(XPathParser.java:836)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.OrExpr(XPathParser.java:809)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.Expr(XPathParser.java:792)
at com.sun.org.apache.xpath.internal.compiler.XPathParser.initXPath(XPathParser.java:131)
at com.sun.org.apache.xpath.internal.XPath.<init>(XPath.java:180)
at com.sun.org.apache.xpath.internal.XPath.<init>(XPath.java:268)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:188)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:270)
at UrlReadPageDemo1.main(UrlReadPageDemo1.java:29)
--------------- linked to ------------------
javax.xml.xpath.XPathExpressionException
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:284)
at UrlReadPageDemo1.main(UrlReadPageDemo1.java:29)
Please help me solve this problem.
Answer 0 (score: 2)
The error says:
A location step was expected following the '/' or '//' token.
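The problem is that a bracketed predicate on its own is not a valid location step after a slash. A step needs a node test first: an element name or @attribute, optionally with a some_axis:: prefix. A predicate in brackets only filters the node set that the step has already matched. To match any element, use *, for example: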
//div[@class='master_container']/*[@id='netspidersosh']
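To see the difference without fetching the page, here is a minimal sketch that only asks the JDK's built-in XPath engine whether each expression parses. The class name XPathStepCheck and the helper compiles are made up for this illustration; the two expressions are the broken step from the question and the corrected one above.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

public class XPathStepCheck {

    // Hypothetical helper: true if the XPath engine accepts the expression's syntax.
    static boolean compiles(String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            xPath.compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            // Syntax errors such as a missing node test after '/' end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Predicate directly after the slash, as in the question.
        System.out.println(compiles("//div[@class='master_container']/[@id='netspidersosh']"));
        // Same step with '*' as the node test, as suggested above.
        System.out.println(compiles("//div[@class='master_container']/*[@id='netspidersosh']"));
    }
}
```

Checking the syntax with compile before evaluating can make this kind of mistake easier to spot, since it fails before any document is involved.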
By the way, why such a long XPath? In HTML, id values are supposed to be unique, so this expression is probably enough:
//*[@id='cmt']/div/span
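As a rough check of that idea, the sketch below evaluates a long path and the short one against a tiny hand-written document and collects the matched text. The markup in SAMPLE is an assumption made up for this example (it is not the real page), and the class name XPathShortVsLong and the helper texts are hypothetical. It parses well-formed XML with the JDK's DocumentBuilder, so it does not cover turning real-world HTML into a W3C DOM.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathShortVsLong {

    // Assumed stand-in for the comment markup; the real page is far larger.
    static final String SAMPLE =
        "<html><body><div class='master_container'>"
      + "<div id='populatecomment'><div id='cmt'>"
      + "<div><span>first comment</span></div>"
      + "<div><span>second comment</span></div>"
      + "</div></div></div></body></html>";

    // Hypothetical helper: text content of every node the expression selects.
    static List<String> texts(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Long path that spells out each ancestor, with '*' before every id predicate.
        System.out.println(texts(
            "//div[@class='master_container']/*[@id='populatecomment']/*[@id='cmt']/div/span"));
        // Short path that relies on the id being unique.
        System.out.println(texts("//*[@id='cmt']/div/span"));
    }
}
```

The short form is also less brittle: it keeps working if the wrappers around the element with id cmt change, as long as that id stays unique.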
Update
An introductory XPath tutorial can be found at: http://zvon.org/xxl/XPathTutorial/Output/example1.html