Question

I have a problem where I can't read the page source using JSoup. I am behind a company proxy.

I am sending authentication through the proxy, and that seems to work fine, but I don't get any of the page source back. All I get is this:

Raw HTTP request received by this server:
GET http://www.google.com/search?q=test HTTP/1.1
Accept-Encoding: gzip
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0
Referer: http://www.google.com
Host: www.google.com
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Proxy-Connection: keep-alive
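The echoed request can be checked programmatically to see how it was addressed and which headers actually reached the server. A minimal sketch, assuming the echo arrives with one header per line; `RawRequestInspector` is a hypothetical helper, not part of JSoup or the JDK. It tests whether the request line uses absolute-form (a full URL such as `GET http://www.google.com/... HTTP/1.1`, which is how HTTP/1.1 clients address a forward proxy) and collects the headers so you can look for a `Proxy-Authorization` header.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class RawRequestInspector {
    /**
     * Returns the "Name: value" headers that follow the request line.
     * Names are lower-cased because HTTP header names are case-insensitive.
     */
    public static Map<String, String> headers(String rawRequest) {
        Map<String, String> result = new LinkedHashMap<>();
        String[] lines = rawRequest.split("\r?\n");
        // lines[0] is the request line, so headers start at index 1
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                break; // a blank line ends the header section
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                result.put(name, line.substring(colon + 1).trim());
            }
        }
        return result;
    }

    /** True if the request target is a full URL (absolute-form), as sent to a forward proxy. */
    public static boolean isAbsoluteForm(String rawRequest) {
        String requestLine = rawRequest.split("\r?\n", 2)[0].trim();
        String[] parts = requestLine.split(" ");
        return parts.length == 3
                && (parts[1].startsWith("http://") || parts[1].startsWith("https://"));
    }
}
```

Applied to the request echoed above, `isAbsoluteForm` returns true and `headers(...)` has no `proxy-authorization` key. If the echo is complete, that suggests the request was sent as a proxy request but without proxy credentials.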

I get a similar message no matter which URL I try, and I can't seem to get the full page source.

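import java.io.IOException;

import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ProxyScraper {
    // Hypothetical placeholder values: substitute your proxy's host, port, and credentials
    private static final String proxyHost = "proxy.example.com";
    private static final String proxyPort = "8080";
    private static final String authUser = "user";
    private static final String authPassword = "password";

    public static void main(String[] args) {
        System.setProperty("http.proxyHost", proxyHost);
        System.setProperty("http.proxyPort", proxyPort);
        // http.proxyUser / http.proxyPassword are not standard JDK networking
        // properties, so nothing reads them unless your own Authenticator does
        System.setProperty("http.proxyUser", authUser);
        System.setProperty("http.proxyPassword", authPassword);

        String url = "http://www.google.com/search?q=test";
        Document document;
        try {
            document = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
                    .referrer("http://www.google.com")
                    .timeout(1000 * 5)
                    .get();
        } catch (HttpStatusException e) {
            // Non-2xx response: report the status code and the URL that failed
            System.err.println("HTTP " + e.getStatusCode() + " for " + e.getUrl());
            return;
        } catch (IOException e) {
            // Connection, proxy, or timeout failure; document was never assigned
            e.printStackTrace();
            return;
        }
        System.out.println(document);

        // Print the title and link of each search result
        for (Element result : document.select("h3.r a")) {
            final String title = result.text();
            final String hyper = result.attr("href");
            System.out.println(title + " -> " + hyper);
        }
    }
}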
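The JDK's documented networking properties include `http.proxyHost` and `http.proxyPort` but not `http.proxyUser` or `http.proxyPassword`, so setting those two on their own likely sends no credentials. A minimal sketch of the usual JDK mechanism, `java.net.Authenticator`, assuming the proxy uses a challenge-based scheme such as Basic; `ProxySetup` and its method names are hypothetical. It also builds a `java.net.Proxy` for code that prefers a per-connection proxy over global system properties.

```java
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.Proxy;

public class ProxySetup {
    /** Builds an HTTP proxy descriptor without resolving the host name yet. */
    public static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    /**
     * Registers JVM-wide credentials that are handed out only when a proxy
     * (not the origin server) issues an authentication challenge.
     */
    public static void installProxyAuthenticator(String user, String password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                if (getRequestorType() == RequestorType.PROXY) {
                    return new PasswordAuthentication(user, password.toCharArray());
                }
                return null; // decline origin-server (HTTP 401) challenges
            }
        });
    }
}
```

For example, calling `ProxySetup.installProxyAuthenticator(authUser, authPassword)` before `Jsoup.connect(url)` would let the JDK answer a 407 challenge from the proxy. Recent jsoup releases also accept a per-request proxy through `Connection.proxy(String host, int port)`. For HTTPS targets, since JDK 8u111 Basic authentication for CONNECT tunnels is disabled by default via the `jdk.http.auth.tunneling.disabledSchemes` property.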

*Edit: I have tried other websites as well and only get back a similar message.

Trouble web scraping with JSoup

0 answers: