Regular expression replace

Time: 2011-03-28 18:23:03

Tags: c# regex

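I need a regex script that:

  • Removes all symbols
  • Allows at most 1 hyphen in a row (no consecutive hyphens)
  • Allows at most 1 period in total

Examples:

  • Mike& Ike output is: MikeIke
  • Mike-Ike output is: Mike-Ike
  • Mike-Ike-Jill output is: Mike-Ike-Jill
  • Mike - Ike-Jill output is: Mike-Ike-Jill
  • Mike - Ike --- Jill output is: Mike-Ike-Jill
  • Mike.Ike.Bill output is: Mike.IkeBill
  • Mike *** Joe output is: MikeJoe
  • Mike123 output is: Mike123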

2 answers:

Answer 0 (score: 3)

#!/usr/bin/env perl

use 5.10.0;
use strict;
use warnings;

my @samples = (
    "Mike&Ike"          => "MikeIke",
    "Mike-Ike"          => "Mike-Ike",
    "Mike-Ike-Jill"     => "Mike-Ike-Jill",
    "Mike--Ike-Jill"    => "Mike-Ike-Jill",
    "Mike--Ike---Jill"  => "Mike-Ike-Jill",
    "Mike.Ike.Bill"     => "Mike.IkeBill",
    "Mike***Joe"        => "MikeJoe",
    "Mike123"           => "Mike123",
);

while (my($got, $want) = splice(@samples, 0, 2)) {
    my $had = $got;
    for ($got) {
        # 1) Allow max 1 dashy bit connected to each other.
        s/ ( \p{Dash} ) \p{Dash}+                           /$1/xg;
        # 2) Allow max 1 period, total.
        1 while s/ ^ [^.]* \. [^.]* \K \.                   //x   ;
        # 3) Remove all symbols and punctuation,
        #    except for dashy bits and dots.
        s/ (?! [\p{Dash}.] ) [\p{Symbol}\p{Punctuation}]    //xg  ;
    }

    if ($got eq $want) { print "RIGHT" }
    else               { print "WRONG" }
    print ":\thad\t<$had>\n\twanted\t<$want>\n\tgot\t<$got>\n";
}

Produces:

RIGHT:  had <Mike&Ike>
    wanted  <MikeIke>
    got <MikeIke>
RIGHT:  had <Mike-Ike>
    wanted  <Mike-Ike>
    got <Mike-Ike>
RIGHT:  had <Mike-Ike-Jill>
    wanted  <Mike-Ike-Jill>
    got <Mike-Ike-Jill>
RIGHT:  had <Mike--Ike-Jill>
    wanted  <Mike-Ike-Jill>
    got <Mike-Ike-Jill>
RIGHT:  had <Mike--Ike---Jill>
    wanted  <Mike-Ike-Jill>
    got <Mike-Ike-Jill>
RIGHT:  had <Mike.Ike.Bill>
    wanted  <Mike.IkeBill>
    got <Mike.IkeBill>
RIGHT:  had <Mike***Joe>
    wanted  <MikeJoe>
    got <MikeJoe>
RIGHT:  had <Mike123>
    wanted  <Mike123>
    got <Mike123>
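
For readers without Perl, here is a rough Python sketch of the same three passes. Python's built-in re module has no \p{...} classes, so this assumes unicodedata general categories as a stand-in: Pd for \p{Dash}, S* for \p{Symbol} and P* for \p{Punctuation}. Pd is only an approximation of the Dash property, and the names is_dash and normalize_name are made up for illustration.

```python
import unicodedata


def is_dash(ch: str) -> bool:
    # Assumption: general category Pd approximates Perl's \p{Dash}.
    return unicodedata.category(ch) == "Pd"


def normalize_name(text: str) -> str:
    # Collapse every run of dash characters to its first dash.
    collapsed = []
    for ch in text:
        if is_dash(ch) and collapsed and is_dash(collapsed[-1]):
            continue
        collapsed.append(ch)

    # Keep the first period and drop all later ones.
    head, dot, tail = "".join(collapsed).partition(".")
    text = head + dot + tail.replace(".", "")

    # Drop symbols (S*) and punctuation (P*), except dashes and periods.
    return "".join(
        ch
        for ch in text
        if ch == "." or is_dash(ch) or unicodedata.category(ch)[0] not in "SP"
    )


if __name__ == "__main__":
    for sample in ["Mike&Ike", "Mike--Ike---Jill", "Mike.Ike.Bill"]:
        print(sample, "->", normalize_name(sample))
```

Like the Perl above, this leaves whitespace alone, so inputs with spaces around the hyphens would need an extra pass.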

Answer 1 (score: 0)

You can do this in a few passes. It is a generic approach that could be shortened by using lookbehind (not all regex flavors support it).

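  1. Replace multiple - with a single -, using the regex -{2,}
  2. Remove symbols other than - and ., using the regex [^-\.A-Za-z0-9]
  3. Replace the first . with a temporary character, e.g. !, and remove the remaining .
  4. Replace the ! from the previous step with .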
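
The passes above could be sketched in Python roughly as follows. The function name clean_multipass and the PLACEHOLDER constant are assumptions for illustration; the placeholder is safe here only because the symbol-stripping pass has already removed any ! from the input.

```python
import re

PLACEHOLDER = "!"  # hypothetical temporary character; stripped from input earlier


def clean_multipass(text: str) -> str:
    # Collapse runs of two or more hyphens into a single hyphen.
    text = re.sub(r"-{2,}", "-", text)
    # Remove everything except letters, digits, hyphens and periods.
    text = re.sub(r"[^-.A-Za-z0-9]", "", text)
    # Protect the first period, then remove all remaining periods.
    text = text.replace(".", PLACEHOLDER, 1)
    text = text.replace(".", "")
    # Restore the protected period.
    return text.replace(PLACEHOLDER, ".")


if __name__ == "__main__":
    for sample in ["Mike& Ike", "Mike - Ike --- Jill", "Mike.Ike.Bill"]:
        print(sample, "->", clean_multipass(sample))
```

One caveat with this order: hyphens separated only by removed characters (for example "a - - b") end up adjacent after the stripping pass and are not collapsed.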

    Update: using C#/.NET
    (I'm not a C# programmer; I used this regex tester and this reference for C#/.NET regular expressions.)

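    // Requires: using System.Text.RegularExpressions;
    String str = "Mike&Ike ......";
    str = Regex.Replace( str, @"-+", @"-" );
    str = Regex.Replace( str, @"(?<=\.)(.*?)\.", @"$1" );
    // Keep . and - here; a bare [^\w\r\n] would also strip them.
    str = Regex.Replace( str, @"[^\w.\r\n-]", @"" );
    
    1. Replace multiple - with a single -
    2. Remove a . if it is not the first one, using a positive lookbehind (?<=...)
    3. Remove symbols (actually everything that is not a word character, a ., a - or a newline). \w is roughly shorthand for [A-Za-z0-9_]; in .NET it also matches other Unicode letters and digits, and it keeps the underscore.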
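
    The lookbehind pass is the least obvious part, so here is a small Python sketch of the same idea. It makes two assumptions that differ from the C# above: invalid characters are stripped first, so hyphens separated only by spaces become adjacent before collapsing, and an explicit ASCII class is used instead of \w so underscores are removed too. The name clean_lookbehind is made up for illustration.

```python
import re


def clean_lookbehind(text: str) -> str:
    # Strip first (assumption: not the order used in the answer) so that
    # "a - - b" becomes "a--b" before the hyphens are collapsed.
    text = re.sub(r"[^A-Za-z0-9.-]", "", text)
    # Collapse any run of hyphens into one.
    text = re.sub(r"-+", "-", text)
    # Each match starts right after a period and ends at the next period;
    # replacing it with group 1 drops that later period. Matches chain,
    # because every match ends on a period that the next lookbehind sees.
    return re.sub(r"(?<=\.)(.*?)\.", r"\1", text)


if __name__ == "__main__":
    for sample in ["a.b.c.d", "Mike - - Ike", "Mike.Ike.Bill"]:
        print(sample, "->", clean_lookbehind(sample))
```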