Question

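I am trying to scrape a page that displays some search results. I have built a simple scraper and it works, but when I try it on this search page, the place where the search results should appear says "We can't find the page you requested. Please ask for help if you need it". How can I scrape a search results page so that the scraped results change over time as the search results change?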

I tried this Google search URL, and this is the one I actually want to scrape. Here is the scraping code.

    try
    {
        Connection connection = Jsoup.connect(url).userAgent(USER_AGENT);
        Document htmlDocument = connection.get();
        this.htmlDocument = htmlDocument;
        String qqq=htmlDocument.toString();
        System.out.println(qqq);
        // 200 is the HTTP OK status code, indicating that the request succeeded.
        if(connection.response().statusCode() == 200)
        {
            System.out.println("\n**Visiting** Received web page at " + url);
        }
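        // contentType() can return null when the header is missing, so guard against it.
        String contentType = connection.response().contentType();
        if(contentType == null || !contentType.contains("text/html"))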
        {
            System.out.println("**Failure** Retrieved something other than HTML");
            return false;
        }

        Elements linksOnPage = htmlDocument.select("a[href]");
        System.out.println("Found (" + linksOnPage.size() + ") links");
        for(Element link : linksOnPage)
        {
            this.links.add(link.absUrl("href"));
            System.out.println(link.absUrl("href"));
        }
        return true;
    }
    catch(IOException ioe)
    {
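        // We were not successful in our HTTP request; print the reason instead of hiding it.
        System.out.println("**Failure** HTTP request failed: " + ioe.getMessage());
        return false;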
    }
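One way to narrow down where the "can't find the page" text comes from is to separate the response checks from the fetching code, so they can be tried against different status codes and content types. The following is only a minimal sketch under assumptions: `ResponseCheck` and `isHtmlResponse` are hypothetical names that are not part of Jsoup, and treating any 2xx status as success is a choice made here, not something the code above does.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Jsoup): decides whether a fetched response is worth parsing. */
final class ResponseCheck {
    private ResponseCheck() {}

    /**
     * Returns true only for a 2xx status combined with an HTML content type.
     * A missing Content-Type header (null) is treated as "not HTML".
     */
    static boolean isHtmlResponse(int statusCode, String contentType) {
        if (statusCode < 200 || statusCode >= 300) {
            return false; // error or redirect status: nothing useful to parse
        }
        if (contentType == null) {
            return false; // header missing, so the type cannot be confirmed
        }
        String normalized = contentType.toLowerCase(Locale.ROOT);
        return normalized.contains("text/html")
                || normalized.contains("application/xhtml+xml");
    }
}
```

With Jsoup, the two arguments could come from a `Connection.Response` obtained via `Jsoup.connect(url).userAgent(USER_AGENT).ignoreHttpErrors(true).execute()`, passing `response.statusCode()` and `response.contentType()`. Printing the status code for the search URL would show whether the error text arrives with an error status or inside a normal 200 page.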

Scraping a search results web page

0 answers: