I want to remove the fragment (e.g. #foobar) from a URL, but only according to certain rules. Normally a brute-force regex would do the job:
$url =~ s/#.+//;
But I'd like it to take a few cases into account, most notably these transformations:
http://www.example.com/#/ => http://www.example.com/
http://www.example.com/#foo/bar#foo => http://www.example.com/#foo/bar
http://www.example.com/#foo?a=1 => http://www.example.com/#foo?a=1
http://www.example.com/#foo/?a=1 => http://www.example.com/#foo/?a=1
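So the rules should be:
1) If the URL ends in /#/, simply replace that with /.
2) If the # is not followed later in the URL by a / or ?, remove the fragment.
Any ideas on how to handle this properly? A single regex, or some other module?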
Answer 0 (score: 1)
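The regex s{#(?:/|[^?/]*)$}{} covers the stated rules:
1) If the URL ends in /#/, simply replace that with /.
2) If the # is not followed later in the URL by a / or ?, remove the fragment.
And a test suite to demonstrate: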
use strict;
use warnings;
use Test;
BEGIN { plan tests => 4 }
while (<DATA>) {
    chomp;
    my ($source, $goal) = split /\s*=>\s*/;
    $source =~ s{#(?:/|[^?/]*)$}{};
    ok($source, $goal);
}
__DATA__
http://www.example.com/#/ => http://www.example.com/
http://www.example.com/#foo/bar#foo => http://www.example.com/#foo/bar
http://www.example.com/#foo?a=1 => http://www.example.com/#foo?a=1
http://www.example.com/#foo/?a=1 => http://www.example.com/#foo/?a=1
Output:
1..4
# Running under perl version 5.018002 for MSWin32
# Current time local: Fri May 30 15:01:04 2014
# Current time GMT: Fri May 30 22:01:04 2014
# Using Test.pm version 1.26
ok 1
ok 2
ok 3
ok 4
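For readability, the same substitution can be written with the /x modifier and wrapped in a subroutine. This is only a sketch: the name strip_trailing_fragment is made up for illustration and is not part of the answer above, and the pattern is the answer's regex spread over several lines with comments.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for illustration only).
# Applies the answer's substitution and returns the modified copy.
sub strip_trailing_fragment {
    my ($url) = @_;
    $url =~ s{
        \#              # a literal hash sign (escaped because /x treats a bare one as a comment)
        (?:
            /           # rule 1: the fragment is a single slash
          |
            [^?/]*      # rule 2: a fragment with no slash or question mark in it
        )
        $               # the fragment must run to the end of the URL
    }{}x;
    return $url;
}

# Only the last fragment is removed, because the first one is followed by a slash.
print strip_trailing_fragment('http://www.example.com/#foo/bar#foo'), "\n";
# prints http://www.example.com/#foo/bar
```

Because the pattern is anchored at the end of the string, at most one fragment is removed per call, and an earlier # that is followed by a / or ? is left untouched.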