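I'm trying to use a regular expression to tell whether a given URL is the site's index page. That means it must match domain.com, domain.com/ and domain.com/index.php, but not domain.com/page.php.
Here is the list I came up with for testing. There are many permutations because of www / non-www, http / https, trailing slashes and so on.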
It should match these:
It should not match these:
(Am I missing any other combinations?)
So far, all I've come up with is:
site.com(/|index.php|)
This is obviously incorrect, because it also matches the /page values.
Answer 0 (score: 7)
This works:
^https?://[^/]+(/(\?.*|index\.php(\?.*)?)?)?$
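Note that this is a generic regular expression. Depending on your regex flavor, you may need to escape some characters (for example, the slashes if your flavor uses / as the pattern delimiter).
Running a simple test with egrep gives the following result: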
$ while read x
> do
> if echo $x | egrep '^https?://[^/]+(/(\?.*|index\.php(\?.*)?)?)?$' > /dev/null
> then
> echo MATCH $x
> else
> echo NOT MATCH $x
> fi
> done < data
MATCH http://site.com/index.php
MATCH http://site.com/
MATCH http://site.com
MATCH http://site.com/index.php?var=X
MATCH http://site.com/?var=X
MATCH http://site.com?var=X
MATCH https://site.com/index.php
MATCH https://site.com/
MATCH https://site.com
MATCH https://site.com/index.php?var=X
MATCH https://site.com/?var=X
MATCH https://site.com?var=X
MATCH http://www.site.com/index.php
MATCH http://www.site.com/
MATCH http://www.site.com
MATCH http://www.site.com/index.php?var=X
MATCH http://www.site.com/?var=X
MATCH http://www.site.com?var=X
MATCH https://www.site.com/index.php
MATCH https://www.site.com/
MATCH https://www.site.com
MATCH https://www.site.com/index.php?var=X
MATCH https://www.site.com/?var=X
MATCH https://www.site.com?var=X
NOT MATCH http://site.com/page.php
NOT MATCH http://site.com/page.php?var=X
NOT MATCH http://site.com/page
NOT MATCH http://site.com/page/
NOT MATCH http://site.com/page/index.php
NOT MATCH http://site.com/page?var=X
NOT MATCH http://site.com/page/?var=X
NOT MATCH https://site.com/page.php
NOT MATCH https://site.com/page.php?var=X
NOT MATCH https://site.com/page
NOT MATCH https://site.com/page/
NOT MATCH https://site.com/page/index.php
NOT MATCH https://site.com/page?var=X
NOT MATCH https://site.com/page/?var=X
NOT MATCH http://www.site.com/page.php
NOT MATCH http://www.site.com/page.php?var=X
NOT MATCH http://www.site.com/page
NOT MATCH http://www.site.com/page/
NOT MATCH http://www.site.com/page/index.php
NOT MATCH http://www.site.com/page?var=X
NOT MATCH http://www.site.com/page/?var=X
NOT MATCH https://www.site.com/page.php
NOT MATCH https://www.site.com/page.php?var=X
NOT MATCH https://www.site.com/page
NOT MATCH https://www.site.com/page/
NOT MATCH https://www.site.com/page/index.php
NOT MATCH https://www.site.com/page?var=X
NOT MATCH https://www.site.com/page/?var=X
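For reuse, the test above could be wrapped in a small function. This is only a sketch: the name is_index_url is made up for illustration, and it assumes a POSIX shell with a grep that supports -E and -q (grep -E is the current spelling of egrep).

```shell
# Hypothetical helper (not part of the answer above): exit status 0 when the
# URL looks like the site's index page, non-zero otherwise.
is_index_url() {
    # printf sidesteps echo's option and backslash handling;
    # quoting "$1" keeps characters such as "?" from being glob-expanded.
    printf '%s\n' "$1" |
        grep -Eq '^https?://[^/]+(/(\?.*|index\.php(\?.*)?)?)?$'
}

# A few URLs taken from the test list above.
for url in 'http://site.com' \
           'https://www.site.com/index.php?var=X' \
           'http://site.com/page/index.php'
do
    if is_index_url "$url"; then
        echo "MATCH $url"
    else
        echo "NOT MATCH $url"
    fi
done
```

Returning an exit status instead of printing lets the function be used directly in if and while conditions.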
Answer 1 (score: 0)
Assuming you're doing this in PHP, you should use parse_url()
(http://php.net/manual/en/function.parse-url.php) and then look at the path element.
<?php
$url = "http://example.com/index.php?page=1";
$path = parse_url($url, PHP_URL_PATH);
print "path=$path\n";
?>
Run it and you get:
path=/index.php
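With only the path in $path, just compare it against / or /index.php, or whatever else you need. For a URL with no path at all, such as http://example.com, parse_url() returns null for the path, so treat that as an index page too. No regex necessary.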
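The same "isolate the path, then compare it" idea can be sketched outside PHP as well. The version below is not parse_url(); it is a rough approximation that uses POSIX shell parameter expansion, and the function name is_index_path is hypothetical. It assumes well-formed http(s) URLs like the ones in the question and makes no attempt to handle every URL form.

```shell
# Hypothetical helper: exit status 0 when the URL's path is empty, "/" or
# "/index.php". Plain POSIX sh has no "local", so rest and path are globals.
is_index_path() {
    rest=${1#*://}          # drop the scheme, e.g. "http://"
    rest=${rest%%[?#]*}     # drop the query string and fragment, if any
    case $rest in
        */*) path=/${rest#*/} ;;   # keep everything from the first slash on
        *)   path= ;;              # host only: the path is empty
    esac
    case $path in
        ''|/|/index.php) return 0 ;;
        *)               return 1 ;;
    esac
}

for url in 'http://site.com?var=X' \
           'https://www.site.com/index.php' \
           'https://site.com/page/?var=X'
do
    if is_index_path "$url"; then
        echo "MATCH $url"
    else
        echo "NOT MATCH $url"
    fi
done
```

Comparing the extracted path against a short list of literal strings keeps the check readable and makes it easy to add other index file names later.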