Exception when evaluating an XPath expression in Java

Date: 2018-11-04 16:33:33

Tags: java xpath xhtml jtidy

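I am trying to learn how to use XPath expressions in Java. I am using JTidy to convert an HTML page to XHTML so that I can parse it easily with XPath expressions. I have the following code: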

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder(); // not used below: JTidy builds the DOM itself

// Convert the page to XHTML and get its DOM from JTidy
Document doc = ConvertXHTML("https://twitter.com/?lang=fr");

// Create the XPath evaluator
XPathFactory xpathfactory = XPathFactory.newInstance();
XPath Inst = xpathfactory.newXPath();

// "//p/@align" selects attribute nodes, so each result is an Attr, not an Element
NodeList nodes = (NodeList) Inst.evaluate("//p/@align", doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); ++i) {
    Node attr = nodes.item(i);
    System.out.println(attr.getNodeName() + "=" + attr.getNodeValue());
}

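public Document ConvertXHTML(String link) {
    // try-with-resources closes both streams even if parsing fails
    try (InputStream instream = new BufferedInputStream(new URL(link).openStream());
         OutputStream outstream = new FileOutputStream("out.xhtml")) {

        Tidy c = new Tidy();
        c.setShowWarnings(false);
        c.setInputEncoding("UTF-8");
        c.setOutputEncoding("UTF-8");
        c.setXHTML(true);

        // Writes the tidied XHTML to out.xhtml and returns JTidy's DOM
        return c.parseDOM(instream, outstream);
    } catch (IOException e) {
        // Assumed handler: the posted snippet ends before its catch block
        throw new UncheckedIOException(e);
    }
}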

This works fine for most URLs, but for this one:

https://twitter.com/?lang=fr

I get this exception:

javax.xml.transform.TransformerException: Index -1 out of bounds .....

Here is part of the stack trace I get:

javax.xml.transform.TransformerException: Index -1 out of bounds for length 128
at java.xml/com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:366)
at java.xml/com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:303)
at java.xml/com.sun.org.apache.xpath.internal.jaxp.XPathImplUtil.eval(XPathImplUtil.java:101)
at java.xml/com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.eval(XPathExpressionImpl.java:80)
at java.xml/com.sun.org.apache.xpath.internal.jaxp.XPathExpressionImpl.evaluate(XPathExpressionImpl.java:89)
at files.ExampleCode.GetThoselinks(ExampleCode.java:50)
at files.ExampleCode.DoSomething(ExampleCode.java:113)
at files.ExampleCode.GetThoselinks(ExampleCode.java:81)
at files.ExampleCode.DoSomething(ExampleCode.java:113)

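I am not sure whether the problem lies in the XHTML produced by the conversion or somewhere else. Can anyone tell me what is wrong with the code? Any edits would be helpful.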

1 answer:

Answer 0: (score: 0)

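I would normally say that an index-out-of-bounds exception coming from deep inside the XPath engine is a bug in the XPath engine. The only caveat is if the DOM that the XPath engine is searching is structurally flawed in some way; the XPath processor is reasonably entitled to assume that the DOM is valid, and to fail if it is not. In that case it would be a bug in Tidy, which created the DOM.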
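One way to tell the two cases apart might be to run the same kind of query against a DOM built by the JDK's own parser instead of JTidy's. The sketch below is only an illustration under stated assumptions: the class name `XPathCrossCheck`, the helpers `parse` and `alignValues`, and the in-memory sample markup are all hypothetical and not part of the question. Because the parser here is namespace-aware and XHTML elements live in the XHTML namespace, the expression uses `local-name()` rather than the question's plain `//p`. If this runs cleanly on the tidied markup while the JTidy DOM still fails, that would point at the DOM rather than the expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCrossCheck {

    // Hypothetical helper: parses well-formed XHTML text with the JDK parser
    public static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Hypothetical helper: collects the value of every align attribute on a p element
    public static List<String> alignValues(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // local-name() matches p regardless of the namespace it is in
        NodeList nodes = (NodeList) xpath.evaluate(
                "//*[local-name()='p']/@align", doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // Each item is an attribute node, so read its value directly
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample markup standing in for the tidied page
        String sample = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p align=\"center\">a</p><p>b</p><p align=\"left\">c</p>"
                + "</body></html>";
        System.out.println(alignValues(parse(sample)));
    }
}
```

To apply this to the real page, the contents of out.xhtml would have to be fed to `parse`. Note that tidied XHTML usually carries a DOCTYPE, which the JDK parser may try to resolve over the network, so that step may need extra parser configuration.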