Question

I am trying to fetch the HTML content of manta.com. This is the code:

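    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // The class name is a placeholder; the original snippet did not show one.
    public class MantaScraper {

        // No leading space, so the User-Agent header is sent exactly as a browser would send it.
        private static final String BROWSER = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0";
        private static final int TIMEOUT = 13_000;
        private static final String Accept_Value = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        private static final String Accept_Encoding_Value = "gzip, deflate";

        public static void main(String[] args) throws IOException {
            getRestPageUrlsInPage("http://www.manta.com/search?search=restaurants&pg=3&pt=34.0396,-118.2661&search_location=Los%20Angeles%20CA");
        }

        public static List<String> getRestPageUrlsInPage(String pageUrl) throws IOException {
            List<String> restPageUrlsInPage = new ArrayList<>();

            // First request: only used to pick up the cookies the server sets.
            Response response = Jsoup.connect(pageUrl)
                    .userAgent(BROWSER)
                    .timeout(TIMEOUT)
                    .execute();

            // Second request: same URL, browser-like headers plus the cookies from above.
            Document docOfPage = Jsoup.connect(pageUrl).ignoreContentType(true)
                    .userAgent(BROWSER).timeout(TIMEOUT)
                    .header("Accept", Accept_Value)
                    .header("Accept-Encoding", Accept_Encoding_Value)
                    .cookies(response.cookies())
                    .get();

            Elements el = docOfPage.select("a.media-heading");

            for (Element element : el) {
                System.out.println(element);
                // Collect the absolute link so the returned list is not always empty.
                restPageUrlsInPage.add(element.absUrl("href"));
            }

            return restPageUrlsInPage;
        }
    }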

When I run it, it does not get the content that the browser shows for this URL: http://www.manta.com/search?search=restaurants&pg=3&pt=34.0396,-118.2661&search_location=Los%20Angeles%20CA

I know that I probably have to send headers, but that did not work either, or I am doing something wrong. How can I solve this?
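One way to narrow this down, as a rough sketch rather than a fix: fetch the raw response without jsoup and check whether the `media-heading` marker is in the HTML at all. jsoup does not execute JavaScript, so if the marker is missing from the raw body, the links are probably added by scripts in the browser, or the server is returning a different page to this client. The class and helper names below (`FetchDiagnostics`, `buildRequest`, `containsMarker`) are made up for illustration, and the code assumes Java 11+ for `java.net.http`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical diagnostic helper; not part of jsoup or of the original code.
class FetchDiagnostics {

    static final String BROWSER =
            "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0";
    static final String ACCEPT_VALUE =
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

    // Builds a GET request with the same User-Agent and Accept headers as the jsoup code.
    // Accept-Encoding is left out on purpose: java.net.http does not decompress
    // gzip bodies itself, and the body should stay readable as plain text here.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(13))
                .header("User-Agent", BROWSER)
                .header("Accept", ACCEPT_VALUE)
                .GET()
                .build();
    }

    // True if the raw HTML text contains the given marker string.
    static boolean containsMarker(String html, String marker) {
        return html != null && html.contains(marker);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.out.println("usage: java FetchDiagnostics.java <url>");
            return;
        }
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(args[0]), HttpResponse.BodyHandlers.ofString());

        // Compare these values with what the browser's network tab shows for the same URL.
        System.out.println("status: " + response.statusCode());
        System.out.println("final URI: " + response.uri());
        System.out.println("body length: " + response.body().length());
        System.out.println("contains media-heading: "
                + containsMarker(response.body(), "media-heading"));
    }
}
```

If the status is not 200, or the final URI differs from the requested one, the server is likely redirecting or blocking the request; if the status is 200 but the marker is absent, the result list is probably rendered client-side.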

Thanks in advance.

Jsoup - Unable to get the same HTML content as the browser

0 Answers: