Multiple slashes in a URL: how to remove them?

Time: 2013-06-18 13:32:18

Tags: .htaccess mod-rewrite

Based on the code from here: remove multiple trailing slashes mod_rewrite

I have the following htaccess:

Options +FollowSymLinks
DirectorySlash Off
RewriteEngine on
RewriteOptions inherit
RewriteBase /

#
# remove multiple slashes from url
#
RewriteCond %{HTTP_HOST} !=""
RewriteCond %{THE_REQUEST} ^[A-Z]+\s//+(.*)\sHTTP/[0-9.]+$ [OR]
RewriteCond %{THE_REQUEST} ^[A-Z]+\s(.*/)/+\sHTTP/[0-9.]+$
RewriteRule .* http://%{HTTP_HOST}/%1 [R=301,L]

#
# Remove multiple slashes anywhere in URL
#
RewriteCond %{THE_REQUEST} ^(.*)//(.*)$
RewriteRule . %1/%2 [R=301,L]

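However, I found that G-Bot crawled this URL: http://www.example.com/aaa/bbb/////////bbb-ccc/bbb-ddd.htm. (aaa, bbb, ccc and ddd are placeholder keywords and should not be taken literally; I am only showing the pattern of the URL.)

Testing the URL above against the live server, I found that the slash removal does not work.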
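
The three conditions can be checked offline against this request. The sketch below uses Python's `re` module as a stand-in for Apache's PCRE engine, and it assumes the request line exposed as %{THE_REQUEST} is `GET <path> HTTP/1.1`; both are assumptions for illustration, not output from the live server.

```python
import re

# Assumed value of %{THE_REQUEST} for the crawled URL (method and protocol are guesses).
request_line = "GET /aaa/bbb/////////bbb-ccc/bbb-ddd.htm HTTP/1.1"

# The three RewriteCond patterns from the htaccess above.
leading = re.compile(r"^[A-Z]+\s//+(.*)\sHTTP/[0-9.]+$")    # slashes at the start of the path
trailing = re.compile(r"^[A-Z]+\s(.*/)/+\sHTTP/[0-9.]+$")   # slashes at the end of the path
anywhere = re.compile(r"^(.*)//(.*)$")                      # slashes anywhere in the request line

leading_match = leading.match(request_line)
trailing_match = trailing.match(request_line)
anywhere_match = anywhere.match(request_line)

print("leading: ", leading_match)
print("trailing:", trailing_match)
print("anywhere:", anywhere_match.groups())
```

With this input the first two patterns do not match, because the repeated slashes sit in the middle of the path. The third pattern does match, but its captures include the HTTP method and the protocol version, so `%1/%2` is not a usable URL path.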

Can anyone offer hints or improvements to the existing code? Thanks.

Edit 1
@Sylwester provided the following code:

# if match set environment variable and start over
RewriteRule ^(.*?)//+(.*)$ $1/$2 [E=REDIR:1,N]

# if done at least one. redirect with 301
RewriteCond %{ENV:REDIR} 1
RewriteRule ^/(.*) /$1 [R=301,L]

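It does not work either. I still see ////// in the URL. I placed this set of rules at the top of my htaccess file, right after "RewriteBase /", so that no other rule can interfere, but... nothing. Any other suggestions?

1 Answer:

Answer 0: (score: 3)

Per-directory context and .htaccess are tricky because Apache has actually already removed the redundancy for us. The rule pattern therefore no longer matches //+, so we check %{REQUEST_URI} instead, since it holds the original URI, while the rewrite rule itself needs to match anything: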

# NB: Only works for per directory and .htaccess
# Needs "AllowOverride All" in global config for .htaccess 
RewriteEngine On
RewriteBase "/"

Options +FollowSymlinks
# Check if the REQUEST_URI has redundant slashes
# and redirect to self if it has (which apache has cleaned up already)
RewriteCond %{REQUEST_URI} //+
RewriteRule ^(.*) $1 [R=301,L]   
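
The intended effect of this rule can be sketched outside Apache. The Python function below is only a model of the outcome (redirect when the requested path contains a run of slashes, and send the collapsed path as the target); `redirect_target` is a hypothetical name, and the code does not reproduce how Apache normalizes the path internally.

```python
import re


def redirect_target(request_uri):
    """Return the collapsed path to redirect to, or None when no redirect is needed."""
    # Mirrors: RewriteCond %{REQUEST_URI} //+
    if re.search(r"//+", request_uri) is None:
        return None
    # Models the already-cleaned path that the RewriteRule captures as $1.
    return re.sub(r"/{2,}", "/", request_uri)


print(redirect_target("/aaa/bbb/////////bbb-ccc/bbb-ddd.htm"))
print(redirect_target("/aaa/bbb/"))
```

A path without repeated slashes returns `None`, which corresponds to the condition failing and the request being served without a redirect.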

If you can add to the global config, I would prefer this in the virtual host:

RewriteEngine On
# if match set environment variable and start over
RewriteRule ^(.*?)//+(.*)$ $1/$2 [E=REDIR:1,N]

# if done at least one. redirect with 301
RewriteCond %{ENV:REDIR} 1
RewriteRule ^/(.*) /$1 [R=301,L]
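
The restart logic of the virtual-host rules can be traced with a small simulation. This is a sketch under assumptions: Python's `re` stands in for PCRE, the `while` loop stands in for the N flag, the boolean stands in for the REDIR environment variable, and `collapse_with_restart` is a hypothetical name.

```python
import re

# Same pattern as the first RewriteRule: lazy prefix, a run of slashes, the rest.
RULE = re.compile(r"^(.*?)//+(.*)$")


def collapse_with_restart(path):
    """Apply the rule repeatedly until it stops matching, like the N flag."""
    redirected = False  # plays the role of E=REDIR:1
    while True:
        match = RULE.match(path)
        if match is None:
            break
        # Substitution $1/$2 replaces the first run of slashes with a single slash.
        path = match.group(1) + "/" + match.group(2)
        redirected = True
    return path, redirected


print(collapse_with_restart("/aaa/bbb/////////bbb-ccc//bbb-ddd.htm"))
print(collapse_with_restart("/aaa/bbb/bbb-ccc/bbb-ddd.htm"))
```

Each pass removes one run of slashes, so a path with several separate runs needs several passes before the second rule issues the single 301.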