Extracting HTML elements using Java and XPath

Time: 2015-06-29 15:52:59

Tags: java html xml xpath web-scraping

I am trying to extract the latitude and longitude of an address. Here is the code.

import java.net.HttpURLConnection;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class Main {
    public static void main(String[] args) throws Exception {
        int responseCode = 0;
        String api = "http://maps.googleapis.com/maps/api/geocode/xml?address=9%20Edinburgh%20Place,%20Centrall&sensor=false&components=country:HK&language=en";

        URL url = new URL(api);

        HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();
        httpConnection.connect();
        responseCode = httpConnection.getResponseCode();
        if (responseCode == 200) {
            // Parse the XML response body into a DOM document
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(httpConnection.getInputStream());
            XPathFactory xPathfactory = XPathFactory.newInstance();
            XPath xpath = xPathfactory.newXPath();
            XPathExpression expr = xpath.compile("/GeocodeResponse/status");
            String status = (String) expr.evaluate(document, XPathConstants.STRING);
            if (status.equals("OK")) {
                // Latitude: XPath copied from the browser's element inspector
                expr = xpath.compile("//*[@id=\"collapsible6\"]/div[1]/div[2]/div[1]/span[2]");
                Object results = expr.evaluate(document, XPathConstants.NODESET);
                NodeList nodes = (NodeList) results;
                System.out.println(nodes.getLength());

                for (int i = 0; i < nodes.getLength(); i++) {
                    System.out.println("latitude: " + nodes.item(i).getNodeValue());
                }

                // Longitude: XPath written against the XML structure of the response
                expr = xpath.compile("//geometry/location/lng");
                String lng = (String) expr.evaluate(document, XPathConstants.STRING);
                System.out.println("longitude: " + lng);
            } else {
                throw new Exception("Error from the API - response status: " + status);
            }
        }
    }
}

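I copied the XPath by inspecting the web element in the browser and tried to use it for the latitude, but nodes.getLength() always returns 0. The longitude, however, works. If I want to keep the HTML element and use it in the XPath, how should the code change?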

1 answer:

Answer 0: (score: 0)

It looks like there is a mistake in your code:

expr = xpath.compile("//*[@id=\"collapsible6\"]/div[1]/div[2]/div[1]/span[2]");

Shouldn't it be this instead?

expr = xpath.compile("//geometry/location/lat");
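
To see why the copied expression returns 0 nodes, it helps to run both XPaths against the XML itself. As far as I can tell, an id like collapsible6 belongs to the tree view the browser renders when it displays raw XML, so it never appears in the document that DocumentBuilder parses. Below is a minimal sketch of that comparison. The class name, the helper methods, and the sample XML (including its coordinates) are made up for illustration, and the sample only assumes the response has the GeocodeResponse/result/geometry/location shape that the question's working lng expression implies.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class GeocodeXPathSketch {
    // Hypothetical, trimmed-down response; the coordinate values are invented.
    static final String SAMPLE_XML =
        "<GeocodeResponse>"
      + "<status>OK</status>"
      + "<result><geometry><location>"
      + "<lat>22.2800000</lat><lng>114.1600000</lng>"
      + "</location></geometry></result>"
      + "</GeocodeResponse>";

    // Parse an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Evaluate an XPath expression and return its string value.
    static String text(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (String) xpath.evaluate(expression, doc, XPathConstants.STRING);
    }

    // Evaluate an XPath expression and return how many nodes it matched.
    static int count(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE_XML);
        // Expressions written against the XML structure find the values.
        System.out.println("latitude: " + text(doc, "//geometry/location/lat"));
        System.out.println("longitude: " + text(doc, "//geometry/location/lng"));
        // The browser-copied expression matches nothing: the XML has no id attributes.
        System.out.println("browser XPath matches: "
            + count(doc, "//*[@id=\"collapsible6\"]/div[1]/div[2]/div[1]/span[2]"));
    }
}
```

One more thing if you keep the NODESET loop from the question: getNodeValue() returns null for element nodes, so use nodes.item(i).getTextContent() to read the text inside lat.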