Clicking a link in Google Scholar using HtmlUnit

Time: 2015-12-06 20:47:22

Tags: java htmlunit

I'm using HtmlUnit to search Google Scholar and then retrieve the BibTeX entry by doing the following:

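1. Go to Google Scholar.

2. Type the title of the paper I want to search for.

3. Click the "Cite" link, which opens a small box.

4. In that small box, click "Import into BibTeX" and get the text.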

For example, you can open this page and try it: https://scholar.google.com/scholar?q=internet+of+things+for+smart+cities&btnG=&hl=en&as_sdt=0%2C5
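Instead of filling in the search box (steps 1 and 2), the results page can also be requested directly by building a URL like the example above. This is a minimal sketch, assuming Google Scholar keeps accepting these query parameters (`q`, `btnG`, `hl`, `as_sdt`, copied from the example link); `ScholarUrl` and `build` are hypothetical names. The resulting string could then be passed to `webClient.getPage(...)`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ScholarUrl {

    // Form-encodes a value: spaces become '+', ',' becomes "%2C", '&' becomes "%26"
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    // Builds a results URL with the same parameters as the example link.
    // Assumption: Google Scholar still honours these parameters.
    static String build(String query) {
        return "https://scholar.google.com/scholar?q=" + encode(query)
                + "&btnG=&hl=en&as_sdt=" + encode("0,5");
    }

    public static void main(String[] args) {
        System.out.println(build("internet of things for smart cities"));
    }
}
```

Running `main` prints exactly the example URL, `https://scholar.google.com/scholar?q=internet+of+things+for+smart+cities&btnG=&hl=en&as_sdt=0%2C5`.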

I can reach the search results page, but I can't get the other steps to work. Here is my code:

    WebClient webClient = new WebClient(BrowserVersion.CHROME);
    HtmlPage page = webClient.getPage("https://scholar.google.com/");

    HtmlInput searchBox = page.getElementByName("q");
    searchBox.setValueAttribute("internet of things for smart cities");

    HtmlButton googleSearchSubmitButton = page.getElementByName("btnG");
    page = googleSearchSubmitButton.click();

    HtmlAnchor anchor = page.getAnchorByName("Cite");
    page = anchor.click();

    System.out.println(page.asText());

    webClient.close();

Any help?

1 Answer:

Answer 0 (score: 1)

Here is a starting point for what you're trying to do:

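    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.TextPage;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
    import com.gargoylesoftware.htmlunit.html.HtmlButton;
    import com.gargoylesoftware.htmlunit.html.HtmlInput;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    public class ScholarBibTex {
        public static void main(String[] args) throws Exception {
            WebClient webClient = new WebClient(BrowserVersion.CHROME);
            // The "Cite" box is built by JavaScript, so JavaScript must be enabled;
            // script errors and failing status codes are ignored instead of thrown
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);

            HtmlPage page = webClient.getPage("https://scholar.google.com/");

            // Locate the search box by its id
            HtmlInput searchBox = (HtmlInput) page.getElementById("gs_hp_tsi");
            searchBox.setValueAttribute("internet of things for smart cities");

            HtmlButton googleSearchSubmitButton = page.getElementByName("btnG");
            page = googleSearchSubmitButton.click();

            // "Cite" is the link's visible text, not its name attribute
            HtmlAnchor anchor = page.getAnchorByText("Cite");
            anchor.click();

            // Give the JavaScript up to 5 seconds to add the citation box to the same page
            webClient.waitForBackgroundJavaScript(5000);

            HtmlAnchor linkBibTex = page.getAnchorByText("BibTeX");

            // The BibTeX export is served as plain text, so click() returns a TextPage
            TextPage neededPage = linkBibTex.click();

            System.out.println(neededPage.getContent());

            webClient.close();
        }
    }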
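The fixed `waitForBackgroundJavaScript(5000)` either wastes time or may be too short if the citation box loads slowly. One alternative is to poll until the BibTeX link actually exists. The helper below is a generic, hypothetical sketch that uses only the JDK; `WaitUtil` and `waitUntil` are illustrative names, not part of HtmlUnit.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitUtil {

    // Evaluates the condition every intervalMs until it is true or timeoutMs has elapsed.
    // Returns true if the condition became true in time, false on timeout or interrupt.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Stand-in condition that becomes true after about 300 ms
        boolean ok = waitUntil(() -> System.nanoTime() - start > 300_000_000L, 2000, 50);
        System.out.println(ok); // prints "true"
    }
}
```

With HtmlUnit, the condition could check for the link, for example (untested, assuming the link's text node is exactly `BibTeX`): `final HtmlPage results = page;` followed by `WaitUtil.waitUntil(() -> !results.getByXPath("//a[text()='BibTeX']").isEmpty(), 10000, 200);`. The copy into `results` is needed because `page` is reassigned earlier and a lambda may only capture effectively final variables.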