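Trying to navigate a website using HtmlUnit

Asked: 2014-02-26 22:04:26

Tags: java html parsing web-scraping htmlunit

I am trying to use HtmlUnit to log in to the dating site http://www.pof.com and navigate its pages. I cannot get past the first step (logging in). I am a beginner, so please take a look at the code below. Could it be that the site itself restricts bots?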

import java.io.IOException;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;

public class DataWebBot {


public static void main(String[] args) throws IOException {

   // navigate & fetch HTML        
   final WebClient webClient = new WebClient();

   //Get first page
   final HtmlPage pofLoginPage = webClient.getPage("http://www.pof.com");

   //Get the login form and find the user/password fields and submit button. 
   final HtmlForm loginForm = pofLoginPage.getFormByName("frmLogin");
   final HtmlTextInput usernameField = loginForm.getInputByName("username");
   final HtmlTextInput passwordField = loginForm.getInputByName("password");
   final HtmlSubmitInput loginButton = loginForm.getInputByName("submitbutton");

   //Set value of text field, login and get the second page
   usernameField.setValueAttribute("myUsername");
   passwordField.setValueAttribute("myPassword");

   webClient.getCookieManager().getCookies();

   System.out.println("Complete");
  }
}

I get these errors:

Feb 26, 2014 4:59:38 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
Feb 26, 2014 4:59:38 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.pof.com/javascript/versioned/pofcommon.min.1386808612.js] line=[5] lineSource=[null] lineOffset=[0]
Feb 26, 2014 4:59:38 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
Feb 26, 2014 4:59:38 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.pof.com/javascript/versioned/pofcommon.min.1386808612.js] line=[5] lineSource=[null] lineOffset=[0]
Feb 26, 2014 4:59:38 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
Feb 26, 2014 4:59:38 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.pof.com/javascript/versioned/pofcommon.min.1386808612.js] line=[5] lineSource=[null] lineOffset=[0]
Feb 26, 2014 4:59:38 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
Feb 26, 2014 4:59:38 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.pof.com/javascript/versioned/pofcommon.min.1386808612.js] line=[5] lineSource=[null] lineOffset=[0]
Feb 26, 2014 4:59:39 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:769] Error in expression. (Invalid token "}". Was expecting one of: <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <HASH>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <PERCENTAGE>, <DIMENSION>, <URI>, <FUNCTION>, "-".)
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:8694] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:8694] Ignoring the following declarations in this rule.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:8899] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:8899] Ignoring the following declarations in this rule.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:9105] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:9105] Ignoring the following declarations in this rule.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:9310] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:9310] Ignoring the following declarations in this rule.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:9544] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:9544] Ignoring the following declarations in this rule.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:12238] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <PERCENTAGE>, <DIMENSION>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:12238] Ignoring the following declarations in this rule.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:16850] Error in style rule. (Invalid token "*". Was expecting one of: <EOF>, <S>, <IDENT>, "}", ";".)
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:16850] Ignoring the following declarations in this rule.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:19023] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <PERCENTAGE>, <DIMENSION>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://www.pof.com/css/versioned/main.min.1392865551.css' [1:19023] Ignoring the following declarations in this rule.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
Feb 26, 2014 4:59:40 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Feb 26, 2014 4:59:41 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Exception in thread "main" java.lang.ClassCastException: com.gargoylesoftware.htmlunit.html.HtmlPasswordInput cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlTextInput
    at DataWebBot.main(DataWebBot.java:28)

What is the problem? Am I using cookies incorrectly? I need them to keep navigating the site's other pages, don't I? Please help. Thanks!

2 answers:

Answer 0 (score: 0)

Is a JavaScript file missing on the server?

http://www.pof.com/javascript/versioned/1386808612.js
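
For context, the WARNING and SEVERE lines in the question's log come from HtmlUnit's JavaScript and CSS handling, and the timestamps show the program carrying on past them until the ClassCastException stops it. If that noise makes the real failure hard to spot, one option is to turn the loggers down. The sketch below assumes the messages reach java.util.logging (the timestamp format in the log is consistent with that, but a different logging backend on the classpath would need its own configuration) and that the logger names match the package names printed in the log. The class name `QuietHtmlUnitLogs` is made up for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietHtmlUnitLogs {

    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an otherwise unreferenced logger could be lost.
    // Assumption: HtmlUnit's loggers are named after its packages, as the log suggests.
    private static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    // Turns off every logger below com.gargoylesoftware.htmlunit
    // that has no level of its own.
    static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        // Child loggers inherit the level from the parent configured above.
        Logger child = Logger.getLogger(
                "com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter");
        System.out.println("SEVERE still logged: " + child.isLoggable(Level.SEVERE));
    }
}
```

Calling `silence()` before creating the `WebClient` would be the natural place for it. This only hides the messages; it does not change how the page's scripts behave.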

Answer 1 (score: 0)

The error seems quite clear:

Exception in thread "main" java.lang.ClassCastException: com.gargoylesoftware.htmlunit.html.HtmlPasswordInput cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlTextInput

So change this:

final HtmlTextInput passwordField = loginForm.getInputByName("password");

to this:

final HtmlPasswordInput passwordField = loginForm.getInputByName("password");

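This will get you past the exception (you will also need to import com.gargoylesoftware.htmlunit.html.HtmlPasswordInput). As a side note, you may run into JavaScript problems, but that is a separate matter.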
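
It may help to see why the compiler accepted the original line and the failure only showed up at run time. `getInputByName` is declared with a generic return type, so the type of the variable you assign to decides what the result is cast to, and that cast is only checked when the program runs. The sketch below reproduces the mechanism with small stand-in classes; `Input`, `TextInput`, `PasswordInput` and `Form` are hypothetical substitutes and not the real HtmlUnit types.

```java
import java.util.HashMap;
import java.util.Map;

public class GenericCastDemo {

    // Hypothetical stand-ins for HtmlUnit's input classes; not the real library types.
    static class Input {}
    static class TextInput extends Input {}
    static class PasswordInput extends Input {}

    // Hypothetical stand-in for a form holding two named inputs.
    static class Form {
        private final Map<String, Input> inputs = new HashMap<>();

        Form() {
            inputs.put("username", new TextInput());
            inputs.put("password", new PasswordInput());
        }

        // The caller's variable type picks I; the cast cannot be checked here.
        @SuppressWarnings("unchecked")
        <I extends Input> I getInputByName(String name) {
            return (I) inputs.get(name);
        }
    }

    // Returns true if assigning the "password" input to a TextInput variable fails.
    static boolean wrongTypeThrows(Form form) {
        try {
            TextInput wrong = form.getInputByName("password"); // compiles fine
            return false;
        } catch (ClassCastException e) {
            return true; // the cast is checked at the assignment, at run time
        }
    }

    public static void main(String[] args) {
        Form form = new Form();
        TextInput username = form.getInputByName("username");     // matching type
        PasswordInput password = form.getInputByName("password"); // matching type
        System.out.println("wrong type throws: " + wrongTypeThrows(form));
    }
}
```

Declaring the variable with the element's actual class (or a common supertype such as HtmlUnit's `HtmlInput`) avoids the failing cast. Separately, the code in the question fills in the two fields but never calls `loginButton.click()`, so no login request would be sent even once the cast is fixed.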