Tuckey's UrlRewriteFilter does not filter the site root (for crawlers)?

Asked: 2018-01-13 19:14:53

Tags: java web.xml servlet-filters tuckey-urlrewrite-filter

Java and AngularJS on Google App Engine.

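As for why: although I had been convinced that most crawlers can parse JavaScript sites, they did not fully parse my AngularJS site, so it was not indexed correctly. I have created a static version of the site and want to redirect to it conditionally, based on the user agent. This works for every URL except the root of my site, i.e. localhost:8080, with or without a trailing slash.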

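I think it is because the tuckey UrlRewriteFilter is mapped to /* in my web.xml, so it is not triggered when there is no trailing slash? I have tried changing that, though. I have tried everything I could think of: changing the servlet version to 3.0, using "welcome-file", replacing the url-pattern with an empty string, and so on.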
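
One way to narrow this down is to check the rule's `<from>` pattern on its own. The sketch below uses plain `java.util.regex`, not UrlRewriteFilter, so it only shows which paths the pattern accepts, not how the filter is invoked. The class name `RootPathCheck` and the sample paths are hypothetical.

```java
import java.util.regex.Pattern;

class RootPathCheck {
    // The <from> pattern copied from the rule in urlrewrite.xml.
    static final Pattern FROM = Pattern.compile("^/(.*)$");

    // Returns true when the whole path is accepted by the pattern.
    public static boolean matches(String path) {
        return FROM.matcher(path).matches();
    }

    public static void main(String[] args) {
        // "/" is the site root, "/about" an ordinary page, "" an empty path.
        for (String path : new String[] {"/", "/about", ""}) {
            System.out.println("\"" + path + "\" -> " + matches(path));
        }
    }
}
```

The pattern accepts `/` and `/about` but rejects the empty string. An HTTP client sends `/` as the request path even when the URL is typed without a trailing slash, so the pattern alone would not explain why the root is skipped unless the filter is handed an empty path for that request.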

Thanks for your help.

urlrewrite.xml:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE urlrewrite PUBLIC "-//tuckey.org//DTD UrlRewrite 4.0//EN"
        "http://www.tuckey.org/res/dtds/urlrewrite4.0.dtd">

<urlrewrite use-query-string="true">

    <rule>
        <condition name="user-agent">
            facebookexternalhit/[0-9]|facebook|Googlebot|Googlebot-Mobile|
            Mediapartners-Google|AdsBot(.*)|AdSense(.*)|(.*)AdsBot|(.*)AdSense|
            Googlebot-Image|Googlebot-Video|Googlebot(.*)|
            FacebookExternalHit/[0-9]|Mediapartners-Google|AdsBot-Google
            |facebookexternalhit/1.0|FacebookExternalHit/1.1|
            FacebookExternalHit/1.0|facebookexternalhit/1.1|Facebot|Twitter|Twitterbot|Pinterest
        </condition>
        <from>^/(.*)$</from>
        <to>/staticview.jsp</to>
    </rule> 
</urlrewrite>
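
To rule out the user-agent condition as the cause, the alternation can be tried against sample header values outside the filter. This is a minimal sketch with plain `java.util.regex`. The pattern is a shortened, single-line subset of the list above, and it assumes substring matching via `find()`, which may differ from how UrlRewriteFilter evaluates a `<condition>`. The class name `UserAgentCheck` and the sample header strings are illustrative.

```java
import java.util.regex.Pattern;

class UserAgentCheck {
    // Shortened, single-line subset of the alternation in the <condition> element.
    static final Pattern BOTS = Pattern.compile(
            "facebookexternalhit|Facebot|Googlebot|Mediapartners-Google"
            + "|AdsBot|Twitterbot|Pinterest");

    // find() looks for any alternative anywhere inside the header value.
    public static boolean isCrawler(String userAgent) {
        return userAgent != null && BOTS.matcher(userAgent).find();
    }

    public static void main(String[] args) {
        String bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
        String browser = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36";
        System.out.println("bot     -> " + isCrawler(bot));
        System.out.println("browser -> " + isCrawler(browser));
    }
}
```

Keeping the alternation on one line removes any doubt about how line breaks and indentation inside the element are treated. The same header can be sent to the running site with a tool such as `curl -A "Googlebot"` to compare the root against other URLs.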

web.xml:

<web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd">

  <filter>
      <filter-name>UrlRewriteFilter</filter-name>
      <filter-class>org.tuckey.web.filters.urlrewrite.UrlRewriteFilter</filter-class>
  </filter>
  <filter-mapping>
      <filter-name>UrlRewriteFilter</filter-name>
      <url-pattern>/*</url-pattern>
  </filter-mapping>

  <filter-mapping>
    <filter-name>appstats</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
  <filter>
    <filter-name>appstats</filter-name>
    <filter-class>com.google.appengine.tools.appstats.AppstatsFilter</filter-class>
    <init-param>
      <param-name>calculateRpcCosts</param-name>
      <param-value>true</param-value>
    </init-param>
  </filter>
  <servlet>
    <servlet-name>appstats</servlet-name>
    <servlet-class>com.google.appengine.tools.appstats.AppstatsServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>appstats</servlet-name>
    <url-pattern>/appstats/*</url-pattern>
  </servlet-mapping>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>appstats</web-resource-name>
      <url-pattern>/appstats/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>admin</role-name>
    </auth-constraint>
  </security-constraint>

  <servlet>
    <servlet-name>rss</servlet-name>
    <servlet-class>com.byron.common.controller.RSSServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>rss</servlet-name>
    <url-pattern>/rss</url-pattern>
  </servlet-mapping>
  <servlet>
    <servlet-name>rssfull</servlet-name>
    <servlet-class>com.byron.common.controller.FullRSSServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>rssfull</servlet-name>
    <url-pattern>/rssfull</url-pattern>
  </servlet-mapping>
  <servlet>
    <servlet-name>sitemap</servlet-name>
    <servlet-class>com.byron.common.controller.SitemapServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>sitemap</servlet-name>
    <url-pattern>/sitemap</url-pattern>
  </servlet-mapping>
  <servlet>
    <servlet-name>Jersey REST Service</servlet-name>
    <servlet-class>com.sun.jersey.spi.container.servlet.ServletContainer</servlet-class>
    <init-param>
      <param-name>com.sun.jersey.config.feature.DisableWADL</param-name>
      <param-value>true</param-value>
    </init-param>
    <!--
    Please try to declare your resource classes statically in your Application implementation as
    follows in order to minimize the startup time of your application.
    -->
    <init-param>
      <param-name>javax.ws.rs.Application</param-name>
      <param-value>com.byron.common.controller.Resources</param-value>
    </init-param>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>Jersey REST Service</servlet-name>
    <url-pattern>/rest/*</url-pattern>
  </servlet-mapping>
</web-app>

1 answer:

Answer 0 (score: 0)

Try creating an explicit rule mapping for the root, like this:

<rule>
    <from>^\/?.*$</from>
    <to>[your mapping goes here]</to>
</rule>

(This rule assumes you are using regular expressions, not wildcards.)

I have this in my application, and it catches requests to localhost:8080.
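
To see what this rule changes relative to the one in the question, the two `<from>` patterns can be compared side by side. The sketch below is plain `java.util.regex` under the assumption that the filter matches against the whole path. It does not run UrlRewriteFilter, and the class name `FromPatternCompare` is hypothetical.

```java
import java.util.regex.Pattern;

class FromPatternCompare {
    // Pattern from the question's rule: the leading slash is required.
    static final Pattern QUESTION = Pattern.compile("^/(.*)$");
    // Pattern from this answer: the leading slash is optional.
    static final Pattern ANSWER = Pattern.compile("^\\/?.*$");

    // Returns true when the whole path is accepted by the given pattern.
    public static boolean matches(Pattern pattern, String path) {
        return pattern.matcher(path).matches();
    }

    public static void main(String[] args) {
        for (String path : new String[] {"", "/", "/about"}) {
            System.out.println("\"" + path + "\""
                    + " question=" + matches(QUESTION, path)
                    + " answer=" + matches(ANSWER, path));
        }
    }
}
```

Both patterns accept `/` and `/about`. They differ only for paths that do not start with a slash, including the empty string, which this answer's pattern accepts. If switching patterns fixes the root, that would suggest the rule sees an empty path there, although this sketch cannot confirm what the filter actually passes in.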