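.htaccess rules with multiple if/or conditions, including user agent, cookie, URI and file name

Time: 2013-01-02 13:34:47

Tags: apache .htaccess mod-rewrite

I have the following .htaccess rules. I need to add some rules to this block, and I don't want to lose the old ones.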

<FilesMatch "\.(htaccess|htpasswd|ini|phps|fla|psd|log|sh)$">
Order allow,Deny
Deny from all
</FilesMatch>

<IfModule mod_rewrite.c>
    RewriteEngine On

    RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
    RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*)$ index.php [QSA,L]
</IfModule>

The rules I want are like this:

- if HTTP_USER_AGENT includes BotOne
- or HTTP_USER_AGENT includes OtherBot
- or HTTP_COOKIE user_id != 1

    - if REQUEST_URI is "/" main directory
    - or REQUEST_FILENAME includes "utm_source"
    - or REQUEST_FILENAME includes "utm_medium"
    - or REQUEST_FILENAME includes "utm_campaign" and "utm_content"

        - if REQUEST_FILENAME doesn't include "/blog/"
        - or REQUEST_FILENAME doesn't include "gif"
        - or REQUEST_FILENAME doesn't include "jpg"

            - then RewriteRule all files to index.html

I tried this, but it didn't help. How should I write these rules?

<IfModule mod_rewrite.c>
    RewriteEngine On

    RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
    RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*)$ index.php [QSA,L]

    RewriteCond %{HTTP_USER_AGENT} "BotOne|OtherBot" [NC,OR]
    RewriteCond %{HTTP_COOKIE} !^.*user_id=1   [NC]
    #
    RewriteCond %{REQUEST_URI} \/  [NC,OR]
    RewriteCond %{REQUEST_FILENAME} ^utm_source.*  [NC,OR]
    RewriteCond %{REQUEST_FILENAME} ^utm_medium.*  [NC,OR]
    RewriteCond %{REQUEST_FILENAME} ^utm_campaign.*  [NC,OR]
    RewriteCond %{REQUEST_FILENAME} ^utm_content.*  [NC]
    #
    RewriteCond %{REQUEST_FILENAME} !\/blog\/.*  [NC,OR]
    RewriteCond %{REQUEST_FILENAME} !gif.*  [NC,OR]
    RewriteCond %{REQUEST_FILENAME} !jpg.*  [NC]
    RewriteRule ^.*? index.html [R=301,L]
</IfModule>

The main URLs I want to redirect are as follows:
* http://example.com => http://example.com/index.html
* http://example.com/ => http://example.com/index.html
* http://example.com/?utm_source=michael => http://example.com/index.html
* http://example.com/?utm_medium=twitter => http://example.com/index.html
* http://example.com/?utm_campaign=camp2&utm_content=somewhere => http://example.com/index.html
* http://example.com/blog/* => no redirect
* http://example.com/myfile.jpg => no redirect
* http://example.com/myfile.gif => no redirect

The redirect should be triggered if (the user agent is "BotOne") or (the user agent is "OtherBot") or (the visitor's cookie user_id is not 1).

Any query parameters should be removed.

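1 answer:

Answer 0 (score: 0)

Because of the way rules are processed in .htaccess, there is simply no way to express this with some kind of bracketing or parenthesization, as you would in a programming language. I ran into a similar problem in the past and had a lot of trouble getting a complete answer. When I finally worked it out, I wrote it down for myself so I could find it again later. This is what I wrote: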

## After quite a bit of puzzlement and seemingly maddeningly
##  vague documentation, I finally figured out exactly how mod_rewrite's
##  [OR] flag really works: In mod_rewrite there's not really any
##  "precedence"; RewriteCond's are simply processed sequentially.
##  Without any modification, the default is to AND _everything_.
##  Including the [OR] modifier on some RewriteCond's creates a
##  two-level expression with only ANDs at the outer/upper level and
##  only ORs at the inner/lower level. Thus
##  RewriteCond a [OR]
##  RewriteCond b
##  RewriteCond c [OR]
##  RewriteCond d
##  RewriteCond e [OR]
##  RewriteCond f [OR]
##  RewriteCond g
##  is equivalent to the boolean expression
##  ((a OR b) AND (c OR d) AND (e OR f OR g))
## There's _no_ way to have ANDs at the _lower/inner_ level and ORs
##  at the _upper/outer_ level; such constructs can only be implemented with
##  either multiple rulesets (and unavoidable duplication), or the
##  introduction of intermediate environment variables.
## Thus the only advantages of [OR] over a | in an RE are increased
##  clarity/maintainability, and the possibility of checking against
##  unrelated variables. REs with lots of |, on the other hand, are
##  assumed to be much faster.
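
The "intermediate environment variables" workaround mentioned in the note can be sketched as follows. This is only an illustration under assumptions: the variable name WANT_REDIRECT is made up, the two branches are hypothetical examples built from the question's conditions, and the utm_* values are assumed to arrive in the query string.

```apache
# Hypothetical goal: (bot user agent AND root URL) OR (utm_campaign AND utm_content)

# Branch 1: consecutive RewriteConds without [OR] are ANDed
RewriteCond %{HTTP_USER_AGENT} BotOne [NC]
RewriteCond %{REQUEST_URI} ^/?$
# "-" means no substitution; the rule only sets the (made-up) variable
RewriteRule ^ - [E=WANT_REDIRECT:1]

# Branch 2: another ANDed pair that sets the same variable
RewriteCond %{QUERY_STRING} utm_campaign
RewriteCond %{QUERY_STRING} utm_content
RewriteRule ^ - [E=WANT_REDIRECT:1]

# Outer OR: act if either branch set the variable
RewriteCond %{ENV:WANT_REDIRECT} =1
# The trailing "?" drops the original query string
RewriteRule ^ /index.html? [R=301,L]
```

Each branch is an ordinary AND group, and the shared variable plays the role of the outer OR, at the cost of one extra rule per branch.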

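If I understand your needs correctly, the whole thing can be treated as one big condition in which the blocks are joined not by subordinate 'if' clauses but by ANDs, like this: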

IF

((- HTTP_USER_AGENT includes BotOne
- or HTTP_USER_AGENT includes OtherBot
- or HTTP_COOKIE user_id != 1)
AND
(- REQUEST_URI is "/" main directory
- or REQUEST_FILENAME includes "utm_source"
- or REQUEST_FILENAME includes "utm_medium"
- or REQUEST_FILENAME includes "utm_campaign" and "utm_content")
AND
(- REQUEST_FILENAME doesn't include "/blog/"
- or REQUEST_FILENAME doesn't include "gif"
- or REQUEST_FILENAME doesn't include "jpg"))

THEN

- RewriteRule all files to index.html

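The biggest complication I see is the rule about "utm_campaign" and "utm_content", because as far as I know regular expressions (even the sophisticated Perl-style ones used in .htaccess) don't handle unspecified order well. If you know the strings will always appear in the same order, you can write an RE like "utm_campaign.*utm_content". If the order really is unspecified, then to match your specification exactly you need two rule conditions, one for each possible order, like this: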

RewriteCond %{QUERY_STRING} "utm_campaign.*utm_content" [OR]
RewriteCond %{QUERY_STRING} "utm_content.*utm_campaign"
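
The utm_* values in the question's example URLs arrive as query parameters, which mod_rewrite exposes in %{QUERY_STRING} rather than in the file name. A slightly stricter sketch of the same two-order test, assuming you also want to match whole parameter names (so that something like "xutm_campaign" does not count), might look like this:

```apache
# Either order of the two parameters; (^|&) pins the start of a parameter name
RewriteCond %{QUERY_STRING} (^|&)utm_campaign=.*&utm_content= [OR]
RewriteCond %{QUERY_STRING} (^|&)utm_content=.*&utm_campaign=
```

The stricter patterns are an optional refinement; the looser substring versions behave the same for the URLs listed in the question.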

It seems to me that some of your REs don't express quite the same thing your pseudo-rules say. For example:

REQUEST_FILENAME includes "utm_source"

should become

RewriteCond %{REQUEST_FILENAME} utm_source

because

RewriteCond %{REQUEST_FILENAME} ^utm_source

actually implements

REQUEST_FILENAME **startswith** utm_source

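Also, I would allow for odd browsers that send nothing at all, as shown below (note too that there are no separate uppercase and lowercase versions of '/', so [NC] just costs you a slight performance hit for no reason). Note that you need both the start ('^') and end ('$') string anchors; otherwise you will match things like "/xxx/yyy/zzz", because they contain a slash:

RewriteCond %{REQUEST_URI} ^/?$ [OR]

Finally, match only the part of the string you care about; there is no need to match the rest of the string (in fact, trying to match the rest often leads to strange, unnecessary bugs). In other words, the presence of ".*" in an .htaccess RE usually signals something odd and unnecessary, which at best costs a little performance and at worst masks a bug. Instead of "utm_source.*", just say "utm_source".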

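At first glance, your multi-condition logic looks right to me (fortunately, since there are so many ways to get complicated conditions like these muddled). So if it isn't working, I would suspect other problems with the rules (especially the regular expressions) rather than a logic/precedence error. (Also, my guess is that the problems have several different causes rather than a single common root cause, so fixing one is unlikely to fix all the others.)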
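
For reference, here is one way the pieces discussed above might fit together. It is a sketch under several assumptions, not a tested drop-in: it assumes the utm_* values are query parameters (as in the question's example URLs), that the block is placed before the catch-all index.php rule so that rule's [L] does not pre-empt it, and that the three exclusions are meant to be ANDed, since ORing negated tests would pass for almost every URL. The cookie pattern is also tightened so that user_id=10 is not mistaken for user_id=1.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Group 1: bot user agent OR cookie user_id is not exactly 1
    RewriteCond %{HTTP_USER_AGENT} BotOne|OtherBot [NC,OR]
    RewriteCond %{HTTP_COOKIE} !(^|;\s*)user_id=1(;|$)

    # Group 2: site root OR one of the utm_* query parameters
    RewriteCond %{REQUEST_URI} ^/?$ [OR]
    RewriteCond %{QUERY_STRING} utm_source [OR]
    RewriteCond %{QUERY_STRING} utm_medium [OR]
    RewriteCond %{QUERY_STRING} utm_campaign.*utm_content [OR]
    RewriteCond %{QUERY_STRING} utm_content.*utm_campaign

    # Group 3: exclusions, ANDed (assumption, see the note above)
    RewriteCond %{REQUEST_URI} !^/blog/
    RewriteCond %{REQUEST_URI} !\.(gif|jpe?g)$ [NC]

    # The trailing "?" drops the query string from the redirect target
    RewriteRule ^ /index.html? [R=301,L]
</IfModule>
```

Because the redirect target /index.html is neither the root URL nor carries a utm_* parameter, a redirected request fails group 2 on its second pass, which should keep the rule from looping.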

Can you give us a concrete example of an input string, what you expected to happen, and what actually happened?