HtmlUnit returns commented-out lines of a Facebook page

Date: 2014-01-25 02:30:53

Tags: java jquery facebook xpath htmlunit

I am trying to simulate the login process for my Facebook page using HtmlUnit (I do have a good reason for doing so). Here is my Java code:

public static void main(String[] args) throws IOException {
        // Tried experimenting with the browser types as well, but with the same result.
        // Even the no-arg constructor does not help.
        WebClient webClient=new WebClient(BrowserVersion.CHROME);

        HtmlPage page1=webClient.getPage("https://www.facebook.com/bhramakarserver");
        HtmlForm loginForm=(HtmlForm)page1.getElementById("login_form");
        HtmlTextInput username=(HtmlTextInput)page1.getElementById("email");
        HtmlPasswordInput password=(HtmlPasswordInput)page1.getElementById("pass");
        username.setValueAttribute("myFbUsername");
        password.setValueAttribute("myFbPassword");
        HtmlElement button = (HtmlElement) page1.createElement("button");
        button.setAttribute("type", "submit");

        // append the button to the form
        loginForm.appendChild(button);
        page1=button.click();

        //page1.executeJavaScript("window.scrollBy(0,6000)"); does not work
        System.out.println(page1.asXml());
        HtmlSpan postContentSpan=(HtmlSpan)page1.getByXPath("//span[@class='userContent']").get(0);
        System.out.println(postContentSpan.asXml());
    }

When I run it, I get the following error:

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:604)
    at java.util.ArrayList.get(ArrayList.java:382)
    at com.rahulserver.fbhighlight.Main.main(Main.java:35)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)

Clearly, the offending line is

HtmlSpan postContentSpan=(HtmlSpan)page1.getByXPath("//span[@class='userContent']").get(0);
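The exception itself comes from calling get(0) on an empty result list. A minimal sketch of a guard, assuming Java 8 or later, is shown here; the FirstMatch class and its first helper are hypothetical names used only for illustration, and the empty list stands in for the result of page1.getByXPath(...):

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class FirstMatch {

    // Returns the first element of a result list, or an empty Optional
    // when the query matched nothing, instead of throwing.
    static <T> Optional<T> first(List<T> matches) {
        return matches.isEmpty() ? Optional.empty() : Optional.ofNullable(matches.get(0));
    }

    public static void main(String[] args) {
        // Stand-in for page1.getByXPath("//span[@class='userContent']")
        List<String> noMatches = Collections.emptyList();
        System.out.println(first(noMatches).orElse("no span with class 'userContent' found"));
    }
}
```

This does not make the span appear, but it turns the crash into a condition the program can report or retry on.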

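The XPath matches nothing (an empty list rather than null, as the Size: 0 in the stack trace shows). I posted this related question about it and was told that the markup containing the span targeted by the XPath above is commented out, which is why nothing is returned.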
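If the posts really are delivered inside HTML comments, an XPath over the parsed DOM cannot see them, because comment text is not parsed into elements. One possible workaround, sketched here with the standard library only, is to pull the comment bodies out of the serialized page (for example the string returned by page1.asXml()) and inspect or re-parse them separately. The CommentedMarkup class, the commentBodies helper, and the sample HTML string are all hypothetical and only illustrate the idea:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentedMarkup {

    // Non-greedy match of everything between <!-- and -->, across line breaks.
    private static final Pattern COMMENT = Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);

    // Returns the trimmed body of every HTML comment found in the given markup.
    static List<String> commentBodies(String html) {
        List<String> bodies = new ArrayList<>();
        Matcher m = COMMENT.matcher(html);
        while (m.find()) {
            bodies.add(m.group(1).trim());
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Made-up sample; the real input would be the serialized page.
        String html = "<code><!-- <span class=\"userContent\">Hello</span> --></code>";
        for (String body : commentBodies(html)) {
            System.out.println(body);
        }
    }
}
```

Each extracted body could then be fed to an HTML parser and queried with the same XPath. A regular expression is adequate for locating comment delimiters, but it is not a substitute for parsing the markup inside them.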

So why does this happen, and how can I make it work? Since Facebook normally loads more content as you scroll further down the page, I tried to simulate that with

page1.executeJavaScript("window.scrollBy(0,6000)");

But it does not work, and I get the same result. Here is a pastebin link to the generated HTML: http://pastebin.com/MfXsYSJQ

I am sure someone on SO can come up with an out-of-the-box answer...

2 answers:

Answer 0 (score: 0)

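The problem lies with the browser version you are emulating, so you need to add AJAX support and wait for background JavaScript. Change the browser version and add a few more lines, as follows: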

WebClient webClient=new WebClient(BrowserVersion.FIREFOX_3_6);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.waitForBackgroundJavaScript(50000);

BrowserVersion.FIREFOX_3_6 is deprecated, but the application works better with it.
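One caveat about the snippet above: waitForBackgroundJavaScript is called before any page has been loaded, so it likely has little effect at that point; waiting usually makes more sense after getPage or click. A library-independent sketch of the waiting idea is to re-run the query until it matches or a timeout expires. The WaitForMatches class and its poll helper are hypothetical names; with HtmlUnit the supplier would presumably wrap a call such as page1.getByXPath(...), which is an assumption and not something the answer shows:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

final class WaitForMatches {

    // Re-runs the query until it returns a non-empty list or the timeout
    // elapses, sleeping intervalMillis between attempts.
    static <T> List<T> poll(Supplier<List<T>> query, long timeoutMillis, long intervalMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<T> result = query.get();
        while (result.isEmpty() && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                break;
            }
            result = query.get();
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulated query that only starts matching after about 300 ms.
        long readyAt = System.currentTimeMillis() + 300;
        List<String> spans = poll(
                () -> System.currentTimeMillis() >= readyAt
                        ? Arrays.asList("<span class=\"userContent\">Hello</span>")
                        : Collections.<String>emptyList(),
                2000, 50);
        System.out.println("matches found: " + spans.size());
    }
}
```

Polling on the condition you actually care about keeps the wait short when content arrives quickly, while the timeout bounds it when the content never appears.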

If this meets your needs, feel free to accept it as the correct answer.

Answer 1 (score: 0)

The following code works on my system:

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlPasswordInput;
import com.gargoylesoftware.htmlunit.html.HtmlSpan;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import java.io.IOException;

public class App {

    public static void main(String[] args) throws IOException {

        WebClient webClient=new WebClient(BrowserVersion.FIREFOX_3_6);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.waitForBackgroundJavaScript(50000);
        HtmlPage page1=webClient.getPage("https://www.facebook.com/bhramakarserver");
        HtmlForm loginForm=(HtmlForm)page1.getElementById("login_form");
        HtmlTextInput username=(HtmlTextInput)page1.getElementById("email");
        HtmlPasswordInput password=(HtmlPasswordInput)page1.getElementById("pass");
        username.setValueAttribute("username");
        password.setValueAttribute("password");
        HtmlElement button = (HtmlElement) page1.createElement("button");
        button.setAttribute("type", "submit");

        // append the button to the form
        loginForm.appendChild(button);
        page1=button.click();

        HtmlSpan postContentSpan=(HtmlSpan)page1.getByXPath("//span[@class='userContent']").get(0);
        System.out.println("The content is "+postContentSpan.asXml());
    }
}