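I am trying to scrape a page that displays some search results. I have built a simple scraper and it works, but when I try it on this search page, the place where the search results should appear instead says "We could not find the page you requested. Please ask for help if you need it." How can I scrape a search results page, so that the scraped results change over time as the search results change?
I tried it with this google search url. And this is the one I actually want to scrape. Here is the scraping code.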
try
{
    Connection connection = Jsoup.connect(url).userAgent(USER_AGENT);
    Document htmlDocument = connection.get();
    this.htmlDocument = htmlDocument;
    // Dump the raw HTML that was actually received, for debugging
    System.out.println(htmlDocument.toString());
    if(connection.response().statusCode() == 200) // 200 is the HTTP OK status code
                                                  // indicating that everything is great.
    {
        System.out.println("\n**Visiting** Received web page at " + url);
    }
    // contentType() can return null when the header is missing
    String contentType = connection.response().contentType();
    if(contentType == null || !contentType.contains("text/html"))
    {
        System.out.println("**Failure** Retrieved something other than HTML");
        return false;
    }
    // Collect every link on the page as an absolute URL
    Elements linksOnPage = htmlDocument.select("a[href]");
    System.out.println("Found (" + linksOnPage.size() + ") links");
    for(Element link : linksOnPage)
    {
        this.links.add(link.absUrl("href"));
        System.out.println(link.absUrl("href"));
    }
    return true;
}
catch(IOException ioe)
{
    // We were not successful in our HTTP request
    return false;
}
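
One thing that may be worth ruling out is the search URL itself: if the query contains spaces or characters such as `&`, an unencoded URL can make the server treat it as a different (or missing) page. Below is a minimal sketch, assuming the search page takes its query as an ordinary URL parameter. The class `SearchUrlBuilder`, the method `buildSearchUrl`, the host `example.com` and the parameter name `q` are all hypothetical and only for illustration; the real site may use different names, or may load its results with JavaScript, in which case this alone would not help.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchUrlBuilder {

    // Builds "<baseUrl>?<paramName>=<encoded query>".
    // baseUrl and paramName are placeholders; use whatever the real search form submits.
    static String buildSearchUrl(String baseUrl, String paramName, String query) {
        try {
            // URLEncoder applies form encoding: spaces become '+', '&' becomes "%26"
            String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8.name());
            return baseUrl + "?" + paramName + "=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = buildSearchUrl("https://example.com/search", "q", "web scraping & jsoup");
        System.out.println(url); // prints https://example.com/search?q=web+scraping+%26+jsoup
    }
}
```

The resulting string could then be passed as `url` to `Jsoup.connect(url)` in the code above. Because the query is built at request time, each run would fetch whatever results the search currently returns.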