Why does HtmlUnit always show the host page, no matter what URL I enter (crawlable GWT app)?

Time: 2014-05-16 14:15:28

Tags: java gwt htmlunit gwtp

Here is the complete code:

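import java.io.IOException;
import java.io.PrintWriter;
import java.net.URLDecoder;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebClientOptions;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class CrawlServlet implements Filter{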
 public static String getFullURL(HttpServletRequest request) {
    StringBuffer requestURL = request.getRequestURL();
    String queryString = request.getQueryString();


    if (queryString == null) {
        return requestURL.toString();
    } else {
        return requestURL.append('?').append(queryString).toString();
    }
 }

 @Override
 public void destroy() {
 // TODO Auto-generated method stub

 }

 @Override
 public void doFilter(ServletRequest request, ServletResponse response,
 FilterChain chain) throws IOException, ServletException {

 HttpServletRequest httpRequest = (HttpServletRequest) request;
 String fullURLQueryString = getFullURL(httpRequest);
 System.out.println(fullURLQueryString+" what wrong");

 if ((fullURLQueryString != null) && (fullURLQueryString.contains("_escaped_fragment_"))) {
     // remember to unescape any %XX characters
     fullURLQueryString=URLDecoder.decode(fullURLQueryString,"UTF-8");
     // rewrite the URL back to the original #! version
         String url_with_hash_fragment=fullURLQueryString.replace("?_escaped_fragment_=", "#!");


         final WebClient webClient = new WebClient();

         WebClientOptions options = webClient.getOptions();
         options.setCssEnabled(false);
         options.setThrowExceptionOnScriptError(false);
         options.setThrowExceptionOnFailingStatusCode(false);
         options.setJavaScriptEnabled(false);
         HtmlPage page = webClient.getPage(url_with_hash_fragment);

         // important!  Give the headless browser enough time to execute JavaScript
         // The exact time to wait may depend on your application.

         webClient.waitForBackgroundJavaScript(20000);

         // return the snapshot
         //String originalHtml=page.getWebResponse().getContentAsString();
         //System.out.println(originalHtml+" +++++++++");
         System.out.println(page.asXml()+" +++++++++");

         PrintWriter out = response.getWriter();
         out.println(page.asXml());
         //out.println(originalHtml);
     } else {
      try {
        // not an _escaped_fragment_ URL, so move up the chain of servlet (filters)
        chain.doFilter(request, response);
      } catch (ServletException e) {
        System.err.println("Servlet exception caught: " + e);
        e.printStackTrace();
      }
    }

 }


 @Override
 public void init(FilterConfig arg0) throws ServletException {
 // TODO Auto-generated method stub

 }


}

After opening the URL "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997?_escaped_fragment_=article", it shows the host page's HTML, as follows:

<html>

<head>
<meta name="fragment" content="!">
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
<!-- -->
<!--
 Consider inlining CSS to reduce the number of requested files 
-->
<!-- -->
<link type="text/css" rel="stylesheet" href="MyProject.css"/>
<!-- -->
<!-- Any title is fine -->
<!-- -->
<title>MyProject</title>
<!-- -->
<!-- This script loads your compiled module. -->
<!-- If you add any GWT meta tags, they must -->
<!-- be added before this line. -->
<!-- -->
<script type="text/javascript" language="javascript" ></script>
<!-- -->
<!-- The body can have arbitrary html, or -->
<!-- you can leave the body empty if you want -->
<!-- to create a completely dynamic UI. -->
<!-- -->
</head>
<body>

<div id="loading">
Loading
<br/>
<img src="../images/loading.gif"/>
</div>
<!-- OPTIONAL: include this if you want history support -->
<iframe src="javascript:''" id="__gwt_historyFrame" tabindex="-1" style="position: absolute; width: 0;height: 0; border:0;"></iframe>
<!--
 RECOMMENDED if your web app will not function without JavaScript enabled 
-->
<noscript>

<div style="width: 22em; position: absolute; left: 50%; margin-left: -11em; color: red; background-color: white; border: 1px solid red; padding: 4px; font-family: sans-serif;">
Your web browser must have JavaScript enabled in order for this application to display correctly.
</div>
</noscript>
</body>
</html>

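On the other hand, "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997#!article" works fine and displays the article without any problem.

I also compiled the whole project and ran it under Tomcat 7, but I have the same problem. It always shows the home page's HTML.

Note: the article page is a nested presenter embedded in the header presenter. But I don't think that is the main cause, because it doesn't even show the header page.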

1 answer:

Answer 0 (score: 0)

First, instead of ?_escaped_fragment_=article, perhaps try &_escaped_fragment_=article. The ? is already used to introduce gwt.codesvr, so a second ? may break URL parameter parsing.
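
To illustrate why a second ? can matter, here is a minimal sketch in plain Java (no servlet container involved) that splits a query string on & and then on the first = of each pair. The names QueryParamDemo and parseQuery are hypothetical and are not part of the question's code; a real container may differ in details such as decoding.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParamDemo {
    // Hypothetical helper: splits the raw query on '&', then each pair on its first '='.
    static Map<String, String> parseQuery(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String base = "http://127.0.0.1:8888/Myproject.html?gwt.codesvr=127.0.0.1:9997";
        // Doubled '?': the second '?' is just another character in the query.
        System.out.println(parseQuery(base + "?_escaped_fragment_=article"));
        // With '&': _escaped_fragment_ becomes a parameter of its own.
        System.out.println(parseQuery(base + "&_escaped_fragment_=article"));
    }
}
```

With the doubled ?, the only parameter found is gwt.codesvr, and its value swallows the rest of the string. With &, _escaped_fragment_ is parsed as a separate parameter.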

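Second, you need to make sure the filter handles the case where the gwt.codesvr parameter is present. It looks like your filter assumes _escaped_fragment_ is the first parameter, i.e. that it starts with ?. I believe the example here works either way.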
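
One way to make the rewrite independent of the parameter's position is sketched below. This is only an illustration under stated assumptions: the class EscapedFragmentUrls and the method toHashBangUrl are hypothetical names, and the sketch assumes _escaped_fragment_ is the last parameter in the URL, so everything after it is treated as the fragment.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class EscapedFragmentUrls {
    private static final String PARAM = "_escaped_fragment_=";

    // Hypothetical helper: rewrites "...?_escaped_fragment_=x" or "...&_escaped_fragment_=x"
    // back to "...#!x". Assumes _escaped_fragment_ is the last parameter in the URL.
    static String toHashBangUrl(String url) {
        int idx = url.indexOf("?" + PARAM);
        if (idx < 0) {
            idx = url.indexOf("&" + PARAM);
        }
        if (idx < 0) {
            return url; // not an _escaped_fragment_ URL, leave it unchanged
        }
        String base = url.substring(0, idx);
        String fragment = url.substring(idx + 1 + PARAM.length());
        try {
            // Unescape %XX sequences in the fragment only, not in the rest of the URL.
            return base + "#!" + URLDecoder.decode(fragment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String base = "http://127.0.0.1:8888/Myproject.html";
        System.out.println(toHashBangUrl(base + "?_escaped_fragment_=article"));
        System.out.println(toHashBangUrl(base + "?gwt.codesvr=127.0.0.1:9997&_escaped_fragment_=article"));
    }
}
```

Unlike the filter in the question, which decodes the whole URL before replacing, this sketch decodes only the fragment, so escaped characters in the other parameters are left untouched.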