Regular expression to remove duplicate URLs

Time: 2011-01-19 10:28:51

Tags: c# regex vb.net

I have a list containing multiple URLs, for example:

google.com
google.com/1
google.com/2
google.com/3
google.com/4
google.com/5
google.com/6
yahoo.com
yahoo.com/1
yahoo.com/2
yahoo.com/3
yahoo.com/4
yahoo.com/5
yahoo.com/6

How can I remove the first 3 entries for each site, keeping google.com/3 through google.com/6, and do the same for yahoo?

2 answers:

Answer 0 (score: 0)

In C#:

resultString = Regex.Replace(subjectString,
    @"^        # Start at the start of a line
    [^/\r\n]+  # Match one or more characters except / and line breaks;
               # the line break required next ensures that
               # the entire line does not contain a /
    (?:        # Match the following group:
     \r?\n     # - a line break (CRLF or LF)
     .*        # - an entire line
    ){2}       # exactly twice
    \r?\n      # Match the final line break",
    "", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

Result string:

google.com/3
google.com/4
google.com/5
google.com/6
yahoo.com/3
yahoo.com/4
yahoo.com/5
yahoo.com/6
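
To check this answer against the sample list, here is a minimal sketch written as a C# top-level program. It uses a compact, comment-free variant of the pattern above and writes each line break as `\r?\n`, on the assumption that the input may use either CRLF or LF line endings. The sample input and its CRLF separators are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question; joining with CRLF is an assumption.
string subjectString = string.Join("\r\n", new[] {
    "google.com", "google.com/1", "google.com/2", "google.com/3",
    "google.com/4", "google.com/5", "google.com/6",
    "yahoo.com", "yahoo.com/1", "yahoo.com/2", "yahoo.com/3",
    "yahoo.com/4", "yahoo.com/5", "yahoo.com/6"
});

// A line that contains no '/', followed by the next two lines
// and the line break that ends them.
string pattern = @"^[^/\r\n]+(?:\r?\n.*){2}\r?\n";

// Multiline makes ^ match at the start of every line, not only at the start of the input.
string resultString = Regex.Replace(subjectString, pattern, "", RegexOptions.Multiline);

Console.WriteLine(resultString);
```

The pattern relies on the bare host (the only line without a `/`) coming first in each group, as it does in the question's list.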

Answer 1 (score: 0)

I'm not sure a regular expression is the best approach. But anyway, here it is:

s/(google\.com[\s\/\d]*){3}//
s/(yahoo\.com[\s\/\d]*){3}//

The regular expression is enclosed in slashes; the leading s denotes a substitution in vi notation.
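
Since this answer doubts that a regular expression is the best tool, here is one possible non-regex sketch in C# using LINQ. The helper name `SkipFirstPerHost` is hypothetical, and the code assumes the host is everything before the first `/` in each entry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: drop the first `count` entries of each host group.
// Assumption: the host is the part of the entry before the first '/'.
static List<string> SkipFirstPerHost(IEnumerable<string> urls, int count)
{
    return urls
        .GroupBy(url => url.Split('/')[0])       // groups keep the input order
        .SelectMany(group => group.Skip(count))  // discard the first `count` per host
        .ToList();
}

var urls = new List<string> {
    "google.com", "google.com/1", "google.com/2", "google.com/3",
    "google.com/4", "google.com/5", "google.com/6",
    "yahoo.com", "yahoo.com/1", "yahoo.com/2", "yahoo.com/3",
    "yahoo.com/4", "yahoo.com/5", "yahoo.com/6"
};

List<string> remaining = SkipFirstPerHost(urls, 3);
Console.WriteLine(string.Join(Environment.NewLine, remaining));
```

Unlike the substitutions above, this groups entries by host even when they are not adjacent in the list, and it needs no separate rule per site.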