Perl regex to insert a string at specific positions

Time: 2013-03-07 13:57:13

Tags: regex string perl insert

I have the following code:

#!/usr/bin/perl

use strict;
use warnings;

use URI qw( );

my @insert_words = qw( HELLO );

while (<DATA>) {
   chomp;
   my $url = URI->new($_);
   my $path = $url->path();

   for (@insert_words) {
      # Use package vars to communicate with /(?{})/ blocks.
      my $insert_word = $_;
      local our @paths;
      $path =~ m{
         ^(.*/)([^/]*)((?:/.*)?)\z
         (?{
            push @paths, "$1$insert_word$2$3";
            if (length($2)) {
               push @paths, "$1$insert_word$3";
               push @paths, "$1$2$insert_word$3";
            }
         })
         (?!)
      }x;

      for (@paths) {
         $url->path($_);
         print "$url\n";
      }
   }
}

__DATA__
http://www.bagandboxfactory.com/index.php?route=checkout/
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/

Currently, the code above works perfectly for the stackoverflow and superuser URLs in __DATA__. It gives the following output for the stackoverflow URL:

http://www.stackoverflow.com/dog/cat/rabbit/HELLO
http://www.stackoverflow.com/dog/cat/HELLOrabbit/
http://www.stackoverflow.com/dog/cat/HELLO/
http://www.stackoverflow.com/dog/cat/rabbitHELLO/
http://www.stackoverflow.com/dog/HELLOcat/rabbit/
http://www.stackoverflow.com/dog/HELLO/rabbit/
http://www.stackoverflow.com/dog/catHELLO/rabbit/
http://www.stackoverflow.com/HELLOdog/cat/rabbit/
http://www.stackoverflow.com/HELLO/cat/rabbit/
http://www.stackoverflow.com/dogHELLO/cat/rabbit/

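As you can see, the string HELLO is inserted at specific positions around each slash (/).

The problem I am having:

I want the same thing to happen when an equals sign (=) is found in the URL.

Using http://www.bagandboxfactory.com/index.php?route=checkout as an example, I would like the output to provide the following: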

http://www.bagandboxfactory.com/index.php?route=HELLOcheckout/   <- puts HELLO before the string after the equals
http://www.bagandboxfactory.com/index.php?route=HELLO/   <- replaces the string after the equals with HELLO
http://www.bagandboxfactory.com/index.php?route=checkoutHELLO/  <- puts HELLO after the string that is after the equals

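I thought that changing the regex from

^(.*/)([^/]*)((?:/.*)?)\z

to

^(.*[/=])([^/=]*)((?:[/=].*)?)\z

would work, but it does not.

What do I need to change the regex to in order to do this?

Your help is much appreciated, many thanks.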

___UPDATE___

It needs to be able to handle more than one parameter. For example, if I have the URL http://www.example.com/dog/cat=2&foo=5, the output I should get is:

http://www.example.com/HELLOdog/cat=2&foo=5  
http://www.example.com/HELLO/cat=2&foo=5
http://www.example.com/dogHELLO/cat=2&foo=5
http://www.example.com/dog/cat=HELLO2&foo=5
http://www.example.com/dog/cat=HELLO&foo=5
http://www.example.com/dog/cat=2HELLO&foo=5
http://www.example.com/dog/cat=2&foo=HELLO5
http://www.example.com/dog/cat=2&foo=HELLO
http://www.example.com/dog/cat=2&foo=5HELLO

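The code I already have works correctly and does this for every slash in the URL, but I now want it to do the same when it encounters = in the URL as well (or any other character I choose to specify in the regex, e.g. [/=@-]).

1 Answer:

Answer 0 (score: 1)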

Using the regex engine to do the backtracking for you is a clever trick.
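For readers who have not seen the trick before, here is a minimal sketch of it in isolation. It uses the path pattern from the question; the helper name all_splits and the "|" separator are only illustrative and not part of the original code. The (?{ }) block records each successful way of matching, and (?!) then forces a failure so the engine backtracks and tries the next one.

```perl
use strict;
use warnings;

# Hypothetical helper: list every way the question's path pattern can match.
sub all_splits {
    my ($path) = @_;
    local our @found;    # package variable, visible inside the code block
    $path =~ m{
        ^(.*/)([^/]*)((?:/.*)?)\z
        (?{ push @found, "$1|$2|$3" })   # record this way of matching
        (?!)                             # always fails, so the next way is tried
    }x;
    return @found;
}

print "$_\n" for all_splits('/dog/cat/');
```

For the input /dog/cat/ this should print three lines, one per slash position: "/dog/cat/||", "/dog/|cat|/" and "/|dog|/cat/". The overall match always fails; only the side effects of the code block matter.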

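The main problem is that the part after the question mark is the query, which is available via $url->query. $url->path returns the path component without the query.

Change the code to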
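A small sketch of that split, assuming the URI module is installed and using two of the URLs from the question's __DATA__ section:

```perl
use strict;
use warnings;
use URI ();

my $url = URI->new('http://www.bagandboxfactory.com/index.php?route=checkout/');

print "path:  ", $url->path,  "\n";   # /index.php
print "query: ", $url->query, "\n";   # route=checkout/

# A URL without a "?" has no query component, so query returns undef.
my $plain = URI->new('http://www.stackoverflow.com/dog/cat/rabbit/');
print defined $plain->query ? "has query\n" : "no query\n";
```

Since the = never appears in $url->path for the first URL, no change to the path regex alone can reach it.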

#!/usr/bin/perl

use strict;
use warnings;

use URI qw( );

my @insert_words = qw( HELLO );

while (<DATA>) {
    chomp;
    my $url = URI->new($_);
    my $path = $url->path();
    my $query = $url->query;

    for (@insert_words) {
      # Use package vars to communicate with /(?{})/ blocks.
      my $insert_word = $_;
      local our @paths;
      $path =~ m{
         ^(.*/)([^/]*)((?:/.*)?)\z
         (?{
            push @paths, "$1$insert_word$2$3";
            if (length($2)) {
               push @paths, "$1$insert_word$3";
               push @paths, "$1$2$insert_word$3";
            }
         })
         (?!)
      }x;

      local our @queries;
      if (defined $query) {
          $query =~ m{
              ^(.*[/=])([^/=&]*)((?:[/=&].*)?)\z
              (?{
                  if (length $2) {
                      push @queries, "$1$insert_word$2$3";
                      push @queries, "$1$insert_word$3";
                      push @queries, "$1$2$insert_word$3";
                  }
              })
              (?!)
          }x;
      }

      for (@paths) {
          $url->path($_);

          if (@queries) {
              for (@queries) {
                  $url->query($_);
                  print $url, "\n";
              }
          }
          else {
              print $url, "\n";
          }
      }
    }
}

__DATA__
http://www.bagandboxfactory.com/index.php?route=checkout/
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
http://www.example.com/index.php?route=9&other=7/

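This gives the following output. The logic for the query is slightly different from the path logic: every insertion is guarded by length $2, because otherwise each of @insert_words would also be appended after a trailing slash in the query (if there is one).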

http://www.bagandboxfactory.com/HELLOindex.php?route=HELLOcheckout/
http://www.bagandboxfactory.com/HELLOindex.php?route=HELLO/
http://www.bagandboxfactory.com/HELLOindex.php?route=checkoutHELLO/
http://www.bagandboxfactory.com/HELLO?route=HELLOcheckout/
http://www.bagandboxfactory.com/HELLO?route=HELLO/
http://www.bagandboxfactory.com/HELLO?route=checkoutHELLO/
http://www.bagandboxfactory.com/index.phpHELLO?route=HELLOcheckout/
http://www.bagandboxfactory.com/index.phpHELLO?route=HELLO/
http://www.bagandboxfactory.com/index.phpHELLO?route=checkoutHELLO/
http://www.stackoverflow.com/dog/cat/rabbit/HELLO
http://www.stackoverflow.com/dog/cat/HELLOrabbit/
http://www.stackoverflow.com/dog/cat/HELLO/
http://www.stackoverflow.com/dog/cat/rabbitHELLO/
http://www.stackoverflow.com/dog/HELLOcat/rabbit/
http://www.stackoverflow.com/dog/HELLO/rabbit/
http://www.stackoverflow.com/dog/catHELLO/rabbit/
http://www.stackoverflow.com/HELLOdog/cat/rabbit/
http://www.stackoverflow.com/HELLO/cat/rabbit/
http://www.stackoverflow.com/dogHELLO/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/HELLO
http://www.superuser.co.uk/dog/cat/rabbit/HELLOhamster/
http://www.superuser.co.uk/dog/cat/rabbit/HELLO/
http://www.superuser.co.uk/dog/cat/rabbit/hamsterHELLO/
http://www.superuser.co.uk/dog/cat/HELLOrabbit/hamster/
http://www.superuser.co.uk/dog/cat/HELLO/hamster/
http://www.superuser.co.uk/dog/cat/rabbitHELLO/hamster/
http://www.superuser.co.uk/dog/HELLOcat/rabbit/hamster/
http://www.superuser.co.uk/dog/HELLO/rabbit/hamster/
http://www.superuser.co.uk/dog/catHELLO/rabbit/hamster/
http://www.superuser.co.uk/HELLOdog/cat/rabbit/hamster/
http://www.superuser.co.uk/HELLO/cat/rabbit/hamster/
http://www.superuser.co.uk/dogHELLO/cat/rabbit/hamster/
http://www.example.com/HELLOindex.php?route=9&other=HELLO7/
http://www.example.com/HELLOindex.php?route=9&other=HELLO/
http://www.example.com/HELLOindex.php?route=9&other=7HELLO/
http://www.example.com/HELLOindex.php?route=HELLO9&other=7/
http://www.example.com/HELLOindex.php?route=HELLO&other=7/
http://www.example.com/HELLOindex.php?route=9HELLO&other=7/
http://www.example.com/HELLO?route=9&other=HELLO7/
http://www.example.com/HELLO?route=9&other=HELLO/
http://www.example.com/HELLO?route=9&other=7HELLO/
http://www.example.com/HELLO?route=HELLO9&other=7/
http://www.example.com/HELLO?route=HELLO&other=7/
http://www.example.com/HELLO?route=9HELLO&other=7/
http://www.example.com/index.phpHELLO?route=9&other=HELLO7/
http://www.example.com/index.phpHELLO?route=9&other=HELLO/
http://www.example.com/index.phpHELLO?route=9&other=7HELLO/
http://www.example.com/index.phpHELLO?route=HELLO9&other=7/
http://www.example.com/index.phpHELLO?route=HELLO&other=7/
http://www.example.com/index.phpHELLO?route=9HELLO&other=7/
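
To check the query pattern on its own, without the URI object or the path loop, something like the following sketch can be used. The helper name query_variants is made up for illustration; the pattern and the length $2 guard are the ones from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the answer's query pattern to a single string.
sub query_variants {
    my ($query, $insert_word) = @_;
    local our $word = $insert_word;   # package vars for use inside the code block
    local our @queries;
    $query =~ m{
        ^(.*[/=])([^/=&]*)((?:[/=&].*)?)\z
        (?{
            if (length $2) {
                push @queries, "$1$word$2$3";
                push @queries, "$1$word$3";
                push @queries, "$1$2$word$3";
            }
        })
        (?!)
    }x;
    return @queries;
}

print "$_\n" for query_variants('route=9&other=7/', 'HELLO');
```

This should print the six query strings seen in the example.com output above, with the variants for other= first and those for route= after them, because the greedy .* starts from the rightmost delimiter and backtracks leftwards.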