Cannot cast HttpConnection to Connection$Response when getting the response code in Jsoup

Date: 2015-12-15 10:21:22

Tags: java url jsoup

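I am using the Jsoup API 1.8.3 to parse all the links on a website generated with PHP. Pages such as the home page and the contact form are parsed successfully, but the login page fails with: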

  

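HTTP error fetching URL. Status=404, URL=https://.../info/en/loginMf.php?src=trading

It fails because the page requires valid credentials, so I want to skip URLs like this one. I tried to do that by checking the status code as follows: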

Connection.Response response=(Response) Jsoup.connect(path);//Added typecast
  System.out.println(response.statusCode());

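But the added cast fails at run time with a ClassCastException.

What is the correct way to get the status code of a URL before passing it to the parse() method?

Edit

I tried to adapt the answer given by @lonesome here, as follows: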

    try {
        Connection.Response response = Jsoup.connect(path).execute();
        int statusCode = response.statusCode();
        if (statusCode >= 200 && statusCode < 300) { // any 2xx status
            doc = Jsoup.connect(filename).get(); // web crawling
        }
    } catch (HttpStatusException http) {
        System.out.println("Status:" + http.getStatusCode());
        http.printStackTrace();
    }

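But the problem is that the line int statusCode=response.statusCode(); is never executed. This may be because of how jsoup works: the request has to be executed to get the response back, as @lucksch answered.

3 Answers:

Answer 0: (score: 2)

Try this: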

    // Needs java.io.IOException, java.net.HttpURLConnection, java.net.URL,
    // java.net.URLConnection, java.util.logging.Level and java.util.logging.Logger.
    try {
        URL url = new URL("adr"); // replace "adr" with the address to check
        URLConnection connection = url.openConnection();

        if (connection instanceof HttpURLConnection) {
            HttpURLConnection httpConn = (HttpURLConnection) connection;
            int statusCode = httpConn.getResponseCode(); // sends the request

            if (statusCode >= 200 && statusCode < 300) {
                // means the connection was successful
                // do crawling
            }
        }
    } catch (IOException ex) {
        // IOException covers ConnectException, SSLHandshakeException, SocketException,
        // SocketTimeoutException and UnknownHostException.
        // replace crawler with the name of your program main class
        Logger.getLogger(crawler.class.getName()).log(Level.SEVERE, null, ex);
    }

Answer 1: (score: 1)

You only get a response once you actually send a request to the site in question. So this is how you get it:

Connection.Response response= Jsoup.connect(path).execute();

The execute method returns a Connection.Response, which contains the status code.
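One caveat: by default execute() itself throws an HttpStatusException for a 4xx or 5xx reply, so the statusCode() call is never reached for the 404 page. A minimal sketch of one way around that, assuming it is acceptable to set ignoreHttpErrors(true) on the connection, is shown below. The class name StatusAwareFetch and the helpers isSuccess and fetchIfOk are made-up names for illustration, and the address in main is only a placeholder.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class StatusAwareFetch {

    /** True for any 2xx status code. */
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }

    /**
     * Hypothetical helper: returns the parsed page, or null when the
     * server answers with a non-2xx status.
     */
    static Document fetchIfOk(String url) throws IOException {
        Connection.Response response = Jsoup.connect(url)
                .ignoreHttpErrors(true) // do not throw HttpStatusException on 4xx/5xx
                .execute();             // this is the call that sends the request

        int statusCode = response.statusCode();
        if (!isSuccess(statusCode)) {
            System.out.println("Skipping " + url + ", status = " + statusCode);
            return null;
        }
        // Parse the body that was already fetched instead of requesting it again.
        return response.parse();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; substitute the page you want to crawl.
        Document doc = fetchIfOk("http://en.wikipedia.org");
        if (doc != null) {
            System.out.println(doc.title());
        }
    }
}
```

Calling response.parse() reuses the response that execute() already downloaded, so each URL is requested only once. Network failures such as timeouts still surface as IOException and need their own handling.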

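Answer 2: (score: 0)

JSoup throws an HttpStatusException when a non-OK HTTP response is returned. Here is a demo program that shows how to validate URLs properly with JSoup. I built the list of URLs by hand; you, of course, already get that list from somewhere.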

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;

public class JSoupMain
{
    public static void main(String[] args)
    {
        List<String> allUrls = new ArrayList<String>();
        allUrls.add("http://en.wikipedia.org");
        allUrls.add("http://en.wikipedia.org/blah"); //<---This will cause a 404 status code to be returned
        allUrls.add("http://mvnrepository.com/artifact/org.jsoup/jsoup/1.8.3");

        System.out.println("Checking urls");
        List<String> goodUrls = getGoodUrls(allUrls);

        System.out.println("\r\nGood urls");
        for(String url : goodUrls)
        {
            System.out.println(url);
        }
    }

    private static List<String> getGoodUrls(List<String> allUrls)
    {
        List<String> goodUrls = new ArrayList<String>();
        for(String url : allUrls)
        {
            try
            {
                Jsoup.connect(url).get();
                goodUrls.add(url);
            }
            catch(HttpStatusException e)
            {
                System.out.println("Url " + url + " resulted in " + e.getStatusCode());
            }
            catch(IOException e)
            {
                e.printStackTrace();
            }
        }
        return goodUrls;
    }
}