I'm trying to fetch the HTML content of manta.com. Here is the code:
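import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MantaScraper {
    // No leading space: the User-Agent value should start with "Mozilla/5.0"
    private static final String BROWSER = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0";
    private static final int TIMEOUT = 13_000;
    private static final String ACCEPT_VALUE = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    private static final String ACCEPT_ENCODING_VALUE = "gzip, deflate";

    public static void main(String[] args) throws IOException {
        getRestPageUrlsInPage("http://www.manta.com/search?search=restaurants&pg=3&pt=34.0396,-118.2661&search_location=Los%20Angeles%20CA");
    }

    public static List<String> getRestPageUrlsInPage(String pageUrl) throws IOException {
        List<String> restPageUrlsInPage = new ArrayList<>();
        // First request: only used to collect the session cookies
        Response response = Jsoup.connect(pageUrl).userAgent(BROWSER).timeout(TIMEOUT)
                .execute();
        // Second request: send those cookies plus browser-like headers, then parse the HTML
        Document docOfPage = Jsoup.connect(pageUrl).ignoreContentType(true)
                .userAgent(BROWSER).timeout(TIMEOUT)
                .header("Accept", ACCEPT_VALUE)
                .header("Accept-Encoding", ACCEPT_ENCODING_VALUE)
                .cookies(response.cookies())
                .get();
        Elements el = docOfPage.select("a.media-heading");
        for (Element element : el) {
            System.out.println(element);
            // Store the absolute link target so the returned list is not always empty
            restPageUrlsInPage.add(element.attr("abs:href"));
        }
        return restPageUrlsInPage;
    }
}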
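So when I run it, I don't get the content that the browser shows for this URL: http://www.manta.com/search?search=restaurants&pg=3&pt=34.0396,-118.2661&search_location=Los%20Angeles%20CA
I know I have to send headers, but that doesn't work either, or I'm doing something wrong. So how can I fix this?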
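One way to narrow this down (a diagnostic sketch, not a fix) is to look at the raw response outside Jsoup. If the status code or body differs from what the browser's "View source" shows, or the body contains no `media-heading` at all, then the selector is probably not the problem: the server may be returning different HTML to this client, or the links may be added by JavaScript, which Jsoup does not execute. The sketch assumes Java 11+ (`java.net.http`); the class name `RawResponseCheck` and the helpers `buildRequest` and `summarize` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical diagnostic class: prints what the server actually returns to a non-browser client.
public class RawResponseCheck {
    static final String BROWSER = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0";
    static final String ACCEPT_VALUE = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

    // Builds a GET request with the same browser-like headers as the Jsoup code.
    // Accept-Encoding is deliberately omitted: java.net.http does not decompress gzip bodies by itself.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofMillis(13_000))
                .header("User-Agent", BROWSER)
                .header("Accept", ACCEPT_VALUE)
                .GET()
                .build();
    }

    // Returns the status code plus at most maxChars characters from the start of the body.
    static String summarize(int status, String body, int maxChars) {
        String head = body.length() > maxChars ? body.substring(0, maxChars) : body;
        return "HTTP " + status + ": " + head;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String url = "http://www.manta.com/search?search=restaurants&pg=3&pt=34.0396,-118.2661&search_location=Los%20Angeles%20CA";
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response = client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        // Compare this output with "View source" in the browser
        System.out.println(summarize(response.statusCode(), response.body(), 500));
        System.out.println("contains media-heading: " + response.body().contains("media-heading"));
    }
}
```

The same check can be done with Jsoup itself by printing `response.statusCode()` and `response.body()` from the first `execute()` call in the question's code.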
Thanks in advance.