Using Regex Replace when looking for unescaped characters

Time: 2012-11-16 16:46:41

Tags: c# .net regex

My requirement is basically this: if I have a string of text such as

"There once was an 'ugly' duckling but it could 
never have been \'Scarlett\' Johansen"

then I want to match the quotes that have not already been escaped. Those would be the ones around 'ugly', not the ones around 'Scarlett'.
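To make "not yet escaped" concrete: here a quote counts as escaped if the character immediately before it is a backslash. The sketch below (C# 9+ top-level statements; the helper name FindUnescapedQuotes is hypothetical, made up for illustration) applies that rule by hand to the example string. It assumes a single preceding backslash always escapes the quote and does not try to handle escaped backslashes such as \\'.

```csharp
using System;
using System.Collections.Generic;

// The example string from the question; as a verbatim literal, \' is a real backslash followed by a quote.
string text = @"There once was an 'ugly' duckling but it could never have been \'Scarlett\' Johansen";

// Hypothetical helper: returns the index of every ' that is not immediately preceded by a backslash.
List<int> FindUnescapedQuotes(string s)
{
    var result = new List<int>();
    for (int i = 0; i < s.Length; i++)
    {
        bool isQuote = s[i] == '\'';
        bool escaped = i > 0 && s[i - 1] == '\\';
        if (isQuote && !escaped)
        {
            result.Add(i);
        }
    }
    return result;
}

List<int> positions = FindUnescapedQuotes(text);
Console.WriteLine(string.Join(", ", positions)); // prints 18, 23 (the quotes around ugly)
```

Only the two quotes around ugly are reported; both quotes around Scarlett are preceded by a backslash and are skipped.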

I spent quite a while testing things in a small C# console app and came up with the following solution.

// Requires: using System; using System.Text.RegularExpressions;
private static void RegexFunAndGames() {

  string result;
  string sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
  string rePattern = @"\\'";   // a backslash followed by a quote, i.e. an already-escaped \'
  string replaceWith = "'";

  Console.WriteLine(sampleText);

  // Step 1: un-escape every \' so that all quotes are bare
  Regex regEx = new Regex(rePattern);
  result = regEx.Replace(sampleText, replaceWith);

  // Step 2: escape every quote
  result = result.Replace("'", @"\'");

  Console.WriteLine(result);
}

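Basically, what I'm doing is a two-step process: find the characters that have already been escaped, un-escape them, and then escape everything again. This feels a bit clumsy, and I suspect there may be a better way.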

Testing information

I got two very good answers, so I thought it would be worth running a test to see which one performs better. I have these two functions:

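    // Single pass: escape every ' that is not already preceded by a backslash
    private static string RegexReplace(string sampleText) {
        Regex regEx = new Regex("(?<!\\\\)'");
        return regEx.Replace(sampleText, "\\'");
    }

    // Two passes: un-escape every \', then escape every '
    private static string ReplaceTest(string sampleText) {
        return sampleText.Replace(@"\'", "'").Replace("'", @"\'");
    }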
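Before timing the two functions, it is worth checking that they actually produce the same result. A minimal sketch, assuming C# 9+ top-level statements; the two functions are copied from above as local functions, and the input is the benchmark string used in Main below.

```csharp
using System;
using System.Text.RegularExpressions;

// Copies of the two functions being compared, as local functions.
string RegexReplace(string sampleText)
{
    Regex regEx = new Regex("(?<!\\\\)'");
    return regEx.Replace(sampleText, "\\'");
}

string ReplaceTest(string sampleText)
{
    return sampleText.Replace(@"\'", "'").Replace("'", @"\'");
}

string input = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief'  but not in 'Stardust' because they'd stopped acting by then.";

string viaRegex = RegexReplace(input);
string viaReplace = ReplaceTest(input);

// Both versions should yield the same escaped sentence.
Console.WriteLine(viaRegex);
Console.WriteLine(viaRegex == viaReplace ? "same output" : "outputs differ");
```

If the outputs matched only by accident on one input, the timing comparison would be meaningless, so a check like this is a cheap first step.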

I call them from the Main method of a console application:

    static void Main(string[] args) {

        string sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief'  but not in 'Stardust' because they'd stopped acting by then.";
        string testReplace = string.Empty;
        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        // Time 1,000,000 calls to the string.Replace version
        sw.Start();
        for (int i = 1000000; i > 0; i--) {
            testReplace = ReplaceTest(sampleText);
        }

        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");

        // Time 1,000,000 calls to the Regex version
        sw.Reset();
        sw.Start();
        for (int i = 1000000; i > 0; i--) {
            testReplace = RegexReplace(sampleText);
        }

        sw.Stop();
        Console.WriteLine("This method took '" + sw.ElapsedMilliseconds.ToString() + "'");
    }

The ReplaceTest method takes 2068 ms and the RegexReplace method takes 9372 ms. I've run this test several times, and ReplaceTest is always the faster of the two.
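One caveat about this comparison: RegexReplace constructs a new Regex on every call, so the loop also measures parsing the pattern a million times. In .NET, the internal regex cache is used by the static Regex methods, not by the Regex constructor. A fairer sketch builds the Regex once and reuses it. This is an illustration only (C# 9+ top-level statements, a reduced iteration count, and RegexOptions.Compiled as an optional choice); no timings are claimed here, and results will vary with runtime version and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

const int Iterations = 100_000; // fewer than the question's 1,000,000 to keep the sketch quick

string sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief'  but not in 'Stardust' because they'd stopped acting by then.";

// Build the Regex once, outside the timed loop, so only Replace() is measured.
// RegexOptions.Compiled is optional: slower construction in exchange for faster matching.
var unescapedQuote = new Regex(@"(?<!\\)'", RegexOptions.Compiled);

// Runs f repeatedly on the sample text, prints the elapsed time, and returns the last result.
string Time(string label, Func<string, string> f)
{
    string last = string.Empty;
    f(sampleText); // warm-up call so one-time costs (JIT, regex compilation) are not timed
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; i++)
    {
        last = f(sampleText);
    }
    sw.Stop();
    Console.WriteLine($"{label}: {sw.ElapsedMilliseconds} ms");
    return last;
}

string a = Time("string.Replace x2", s => s.Replace(@"\'", "'").Replace("'", @"\'"));
string b = Time("cached Regex", s => unescapedQuote.Replace(s, @"\'"));
```

Even with a cached instance, a regex engine does more work per character than two plain string replacements, so string.Replace may well stay ahead; the point is only to avoid timing pattern construction along with matching.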

3 answers:

Answer 0 (score: 4)

You can use a negative lookbehind to make sure the quote is not already escaped. The expression below

(?<!\\)'

matches a single quote unless it is immediately preceded by a backslash.

Note that each backslash must be doubled when it goes into a regular (non-verbatim) C# string literal.
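A quick way to see the doubling rule: the regular literal used in this answer and a verbatim literal (@"...") produce the same pattern string. The sketch below (C# 9+ top-level statements; the tiny input string is made up for illustration) also shows the lookbehind leaving already-escaped quotes alone.

```csharp
using System;
using System.Text.RegularExpressions;

// Regular literal: every backslash meant for the regex is written as \\ in C#.
string regular = "(?<!\\\\)'";
// Verbatim literal: backslashes are taken literally, so it reads like the raw pattern.
string verbatim = @"(?<!\\)'";

Console.WriteLine(regular);              // prints (?<!\\)'
Console.WriteLine(regular == verbatim);  // prints True

// Quotes at the start of the input or after a normal character match; \' does not.
var re = new Regex(verbatim);
int count = re.Matches(@"'a' \'b\'").Count;
Console.WriteLine(count);                // prints 2
```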

var sampleText = @"Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief' but not in 'Stardust' because they'd stopped acting by then";
var regEx = new Regex("(?<!\\\\)'");
var result = regEx.Replace(sampleText, "\\'");
Console.WriteLine(result);

The above prints

Mr. Grant and Ms. Kelly  starred in the film \'To Catch A Thief\' but not in \'Stardust\' because they\'d stopped acting by then


Answer 1 (score: 3)

I'm surprised you're using a regex for this at all. Why not simply use:

string result = sampleText.Replace(@"\'", "'").Replace("'", @"\'");

This escapes every ' that isn't already escaped.

It first un-escapes every escaped ' (single quote), and then escapes all of them.
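The two passes can be seen separately in this sketch (C# 9+ top-level statements; the input is a shortened piece of the question's sample text):

```csharp
using System;

string input = @"the film \'To Catch A Thief' but not in 'Stardust'";

// Pass 1: un-escape, so every \' becomes a bare '
string step1 = input.Replace(@"\'", "'");
// Pass 2: escape every ', including the ones that were escaped to begin with
string step2 = step1.Replace("'", @"\'");

Console.WriteLine(step1); // the film 'To Catch A Thief' but not in 'Stardust'
Console.WriteLine(step2); // the film \'To Catch A Thief\' but not in \'Stardust\'
```

After the first pass no quote is escaped, so the second pass can treat every quote the same way, and no quote ends up escaped twice.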

Of course, if using a regex is a requirement, then go with the correct regex solution, as you already said.

Answer 2 (score: -1)

You could use

    string rePattern = @"[\\'|\']";

instead.
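To see what this pattern actually matches, here is a small sketch (C# 9+ top-level statements; the sample string is made up for illustration). Inside square brackets, [\\'|\'] is a character class: it matches exactly one character, which may be a backslash, a quote, or a pipe. The | is not alternation inside a class, and the quote is simply listed twice. So the pattern cannot tell an escaped quote from an unescaped one.

```csharp
using System;
using System.Text.RegularExpressions;

// [\\'|\'] is a single-character class: \\ = backslash, ' = quote, | = a literal pipe, \' = quote again.
var charClass = new Regex(@"[\\'|\']");

string sample = @"a 'b' \'c\' d|e";

// Every backslash, quote and pipe is matched on its own, one character at a time.
foreach (Match m in charClass.Matches(sample))
{
    Console.Write(m.Value + " ");
}
Console.WriteLine();
```

Used with the question's replaceWith of "'", this pattern would turn every backslash and pipe into a quote rather than only un-escaping \'.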