How to get the "next page" of Google results using HtmlUnit

Asked: 2012-02-17 18:07:14

Tags: htmlunit next

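I am using the code below to fetch the first two pages of Google search results, but I only ever get the first page (the content printed for page 2 is identical to page 1).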

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;


/**
 * A simple Google search test using HtmlUnit.
 *
 * @author Rahul Poonekar
 * @since Apr 18, 2010
 */
public class Author_search {
    static final WebClient browser;

    static {
        browser = new WebClient();
        browser.setJavaScriptEnabled(false);
    }

    public static void main(String[] arguments) {
        searchTest();
    }

    private static void searchTest() {
        HtmlPage currentPage = null;

        try {
            currentPage = (HtmlPage) browser.getPage("http://www.google.com");
        } catch (Exception e) {
            System.out.println("Could not open browser window");
            e.printStackTrace();
        }
        System.out.println("Simulated browser opened.");

        try {
            ((HtmlTextInput) currentPage.getElementByName("q")).setValueAttribute("xxoo");
            currentPage = currentPage.getElementByName("btnG").click();
            System.out.println("contents: " + currentPage.asText());
            HtmlElement next = (HtmlElement)currentPage.getByXPath("//span[contains(text(), 'Next')]").get(0);
            currentPage = next.click();
            System.out.println("contents: " + currentPage.asText());
        } catch (Exception e) {
            System.out.println("Could not search");
            e.printStackTrace();
        }
    } 
}

Can anyone tell me how to fix this?

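By the way:

  1. How can I change the language setting in Google using HtmlUnit? Is there a convenient way?
  2. Does HtmlUnit see the HTML the way Firebug does in Firefox, or only as the text you would get from "File -> Save"? In my opinion it treats the page the way a browser would. Am I right?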

1 answer:

Answer 0 (score: 2)

I replaced:

HtmlElement next = (HtmlElement)currentPage.getByXPath("//span[contains(text(),'Next')]").get(0);
currentPage = next.click();

with:

HtmlAnchor nextAnchor = currentPage.getAnchorByText("Next");
currentPage = nextAnchor.click();
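
As an alternative to clicking the "Next" link, and as one possible way to handle the language question, the result page can be requested directly by URL. The sketch below is not part of the original answer. It assumes Google's `q`, `start` (zero-based result offset) and `hl` (interface language) query parameters and a default of 10 results per page; the class name `GoogleSearchUrl` and the method `build` are hypothetical helpers made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not from the original post): builds a Google search URL
// for a given result page instead of clicking the "Next" link.
final class GoogleSearchUrl {
    // Assumption: Google shows 10 results per page by default.
    private static final int RESULTS_PER_PAGE = 10;

    /**
     * @param query        the search terms
     * @param page         1-based page number
     * @param languageCode value for the "hl" parameter (e.g. "en"), or null to omit it
     */
    static String build(String query, int page, String languageCode) {
        if (page < 1) {
            throw new IllegalArgumentException("page must be >= 1");
        }
        try {
            StringBuilder url = new StringBuilder("http://www.google.com/search?q=");
            url.append(URLEncoder.encode(query, "UTF-8"));
            // "start" is the zero-based offset of the first result on the page.
            url.append("&start=").append((page - 1) * RESULTS_PER_PAGE);
            if (languageCode != null) {
                url.append("&hl=").append(URLEncoder.encode(languageCode, "UTF-8"));
            }
            return url.toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://www.google.com/search?q=xxoo&start=10&hl=en
        System.out.println(build("xxoo", 2, "en"));
    }
}
```

The resulting string could then be passed to `browser.getPage(...)` in the question's code in place of the click on the "Next" element. Whether Google honours these parameters for automated clients may change over time, so treat this as a starting point rather than a guaranteed fix.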