HTMLUnit and AppEngine

Time: 2015-05-04 16:20:28

Tags: java google-app-engine seo htmlunit

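I am trying to render JavaScript-generated resources on the server side for SEO purposes. I am following the example provided by Google here, which uses HTMLUnit on a Java-based server.

We currently host on App Engine, but I have found that when calling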

final WebClient webClient = new WebClient();

I always get the following exception. Does anyone have any ideas?

java.lang.ArrayStoreException: com.gargoylesoftware.htmlunit.httpclient.HtmlUnitDomainHandler
    at com.gargoylesoftware.htmlunit.httpclient.HtmlUnitBrowserCompatCookieSpec.<init>(HtmlUnitBrowserCompatCookieSpec.java:101)
    at com.gargoylesoftware.htmlunit.CookieManager.<init>(CookieManager.java:56)
    at com.gargoylesoftware.htmlunit.WebClient.<init>(WebClient.java:141)
    at com.gargoylesoftware.htmlunit.WebClient.<init>(WebClient.java:202)
    at filters.CrawlServlet.doFilter(CrawlServlet.java:38)

2 answers:

Answer 0: (score: 3)

I tested HtmlUnit 2.16 with AppEngine, and it works fine here.

Using the sample project, copy the 2.16 jars to war/WEB-INF/lib, and have:

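import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

@SuppressWarnings("serial")
public class GuestbookServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        // try-with-resources closes the WebClient once the page has been read
        try (WebClient webClient = new WebClient()) {
            final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
            resp.getWriter().println(page.getTitleText());
        }
    }
}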

Answer 1: (score: 1)

This is most likely an httpclient version dependency problem. With HTMLUnit 2.16 you should use httpclient 4.4.1.
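
One way to check for a version mismatch like this is to print where the relevant classes are actually loaded from at runtime. The following is only a minimal sketch: ClasspathProbe and locate are hypothetical names introduced for illustration, not part of HtmlUnit or App Engine, and a sandboxed runtime may restrict access to the protection domain.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of HtmlUnit or App Engine).
final class ClasspathProbe {

    /** Returns the location a class was loaded from, or a short marker string. */
    static String locate(String className) {
        try {
            // Load without running static initializers; we only need the origin.
            Class<?> cls = Class.forName(className, false,
                    ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "no code source (JDK/bootstrap class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        } catch (SecurityException e) {
            // Assumption: a sandbox may forbid reading the protection domain.
            return "access denied";
        }
    }

    public static void main(String[] args) {
        // If the first line does not point at the httpclient jar you expect,
        // an older copy is probably shadowing it.
        System.out.println(locate("org.apache.http.client.HttpClient"));
        System.out.println(locate("com.gargoylesoftware.htmlunit.WebClient"));
    }
}
```

If the reported httpclient location is not the 4.4.1 jar in war/WEB-INF/lib, another copy on the classpath is the likely source of the ArrayStoreException.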