Regex: camel case to underscore, ignoring the first occurrence

Date: 2013-09-13 07:51:35

Tags: c# regex

For example:

thisIsMySample 

should become:

this_Is_My_Sample

My code:

System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", "_$0", System.Text.RegularExpressions.RegexOptions.Compiled);

It works fine, but if the input changes to:

ThisIsMySample

the output will be:

_This_Is_My_Sample

How do I skip the first one?

7 answers:

Answer 0 (score: 40)

A non-regex solution:

string result = string.Concat(input.Select((x,i) => i > 0 && char.IsUpper(x) ? "_" + x.ToString() : x.ToString())); 

It also seems to be faster, over 1,000,000 iterations: regex 2569 ms, plain C# (LINQ) 1489 ms.

Stopwatch stp = new Stopwatch();
stp.Start();
for (int i = 0; i < 1000000; i++)
{
    string input = "ThisIsMySample";
    string result = System.Text.RegularExpressions.Regex.Replace(input, "(?<=.)([A-Z])", "_$0",
            System.Text.RegularExpressions.RegexOptions.Compiled);
}
stp.Stop();
MessageBox.Show(stp.ElapsedMilliseconds.ToString());
// Result 2569ms

Stopwatch stp2 = new Stopwatch();
stp2.Start();
for (int i = 0; i < 1000000; i++)
{
    string input = "ThisIsMySample";
    string result = string.Concat(input.Select((x, j) => j > 0 && char.IsUpper(x) ? "_" + x.ToString() : x.ToString()));
}
stp2.Stop();
MessageBox.Show(stp2.ElapsedMilliseconds.ToString());
// Result: 1489ms
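The LINQ version still creates a short string for every character (`x.ToString()` or `"_" + x.ToString()`) before `string.Concat` joins them. As a rough sketch, the same rule (underscore before every upper-case letter except at index 0) can be written as a plain loop over a single `StringBuilder`. The names `CamelCaseLoop` and `AddUnderscores` are hypothetical, and no timings are claimed for this version.

```csharp
using System;
using System.Text;

public static class CamelCaseLoop
{
    // Inserts '_' before every upper-case letter except the one at index 0,
    // the same rule as the LINQ one-liner, but appending into one builder.
    public static string AddUnderscores(string input)
    {
        if (string.IsNullOrEmpty(input))
            return input;

        // Worst case: every character after the first gets an underscore.
        var sb = new StringBuilder(input.Length * 2);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (i > 0 && char.IsUpper(c))
                sb.Append('_');
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, `CamelCaseLoop.AddUnderscores("ThisIsMySample")` returns `"This_Is_My_Sample"`.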

Answer 1 (score: 14)

You can use a lookbehind to make sure every match is preceded by at least one character:

System.Text.RegularExpressions.Regex.Replace(input, "(?<=.)([A-Z])", "_$0",
                      System.Text.RegularExpressions.RegexOptions.Compiled);

Lookaheads and lookbehinds let you make assertions about the text around a match without including that text in the match itself.
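A small sketch of what "assert without consuming" means in .NET regex. The class `LookaroundDemo` is hypothetical; the first pattern is the one from this answer, while the negative-lookbehind and pure zero-width variants are alternatives added here for comparison, not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Positive lookbehind: an upper-case letter with some character before it.
    // The preceding character is checked but not consumed, so $0 is just the capital.
    public static string WithLookbehind(string input) =>
        Regex.Replace(input, "(?<=.)[A-Z]", "_$0");

    // Negative lookbehind: an upper-case letter that is not at the start of the string.
    public static string WithNegativeLookbehind(string input) =>
        Regex.Replace(input, "(?<!^)[A-Z]", "_$0");

    // Only assertions: match the empty position between a lower-case
    // and an upper-case letter, and insert "_" there.
    public static string WithZeroWidthMatch(string input) =>
        Regex.Replace(input, "(?<=[a-z])(?=[A-Z])", "_");
}
```

All three return `"This_Is_My_Sample"` for `"ThisIsMySample"`. They differ on runs of capitals: for `"ABC"` the first two give `"A_B_C"`, while the zero-width version leaves `"ABC"` unchanged because no lower-case letter precedes a capital.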

Answer 2 (score: 3)

Maybe something like this:

var str = Regex.Replace(input, "([A-Z])", "_$0", RegexOptions.Compiled);
// Drop the underscore added in front of a leading capital letter.
if (str.StartsWith("_"))
    str = str.Substring(1);

Answer 3 (score: 2)

Building on sa_ddam213's solution, I extended it to this:

public static string GetConstStyleName(this string value)
{
    return string.Concat(value.Select((x, i) =>
    {
        //want to avoid putting underscores between pairs of upper-cases or pairs of numbers, or adding redundant underscores if they already exist.
        bool isPrevCharLower = (i == 0) ? false : char.IsLower(value[i - 1]);
        bool isPrevCharNumber = (i == 0) ? false : char.IsNumber(value[i - 1]);
        return (isPrevCharLower && (char.IsUpper(x) || char.IsNumber(x))) //lower-case followed by upper-case or number needs underscore
            || (isPrevCharNumber && (char.IsUpper(x))) //number followed by upper-case needs underscore
            ? "_" + x.ToString() : x.ToString();
    })).ToUpperInvariant();
}
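An extension method only compiles inside a non-generic static class, and `Select` needs `System.Linq`. The sketch below wraps the same rules in a hypothetical `ConstStyleExtensions` class (the class name is an assumption; the logic follows the answer's comments) so the behaviour on a few inputs can be checked.

```csharp
using System.Linq;

public static class ConstStyleExtensions
{
    // Same rules as GetConstStyleName above:
    //  - lower-case followed by upper-case or digit -> insert '_'
    //  - digit followed by upper-case               -> insert '_'
    //  - runs of capitals or digits stay together, existing '_' is not doubled
    public static string GetConstStyleName(this string value)
    {
        return string.Concat(value.Select((x, i) =>
        {
            bool isPrevCharLower = i > 0 && char.IsLower(value[i - 1]);
            bool isPrevCharNumber = i > 0 && char.IsNumber(value[i - 1]);
            bool needsUnderscore =
                (isPrevCharLower && (char.IsUpper(x) || char.IsNumber(x)))
                || (isPrevCharNumber && char.IsUpper(x));
            return needsUnderscore ? "_" + x : x.ToString();
        })).ToUpperInvariant();
    }
}
```

With these rules, `"ThisIsMySample"` becomes `"THIS_IS_MY_SAMPLE"`, `"HTMLParser"` becomes `"HTMLPARSER"` (no split inside the run of capitals), and `"Version2Beta"` becomes `"VERSION_2_BETA"`.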

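Answer 4 (score: 1)

You need to change your regex so that the first character can never be the one matched, by requiring some character before the capital letter: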

.([A-Z])

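In the regex above, the preceding character sits outside the parentheses, so it is not part of the capturing group; and a capital letter at the very start of the string, which has nothing before it, is never matched.

Then, as Bibhu pointed out, capture the preceding character as well and put both groups back in the replacement: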

System.Text.RegularExpressions.Regex.Replace(s, "(.)([A-Z])", "$1_$2", System.Text.RegularExpressions.RegexOptions.Compiled);
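Consuming the preceding character (this answer) and merely asserting it (the lookbehind answer above) give the same result on ordinary camel case but differ when capitals are adjacent: a consumed character cannot also be the capital of the next match. A minimal comparison, with the hypothetical class name `ConsumingVsLookbehind`:

```csharp
using System.Text.RegularExpressions;

public static class ConsumingVsLookbehind
{
    // This answer: the preceding character is captured as $1 and written back.
    // It is consumed, so it cannot start the next match.
    public static string TwoGroups(string input) =>
        Regex.Replace(input, "(.)([A-Z])", "$1_$2");

    // Lookbehind: the preceding character is only checked, not consumed,
    // so every capital after position 0 can match.
    public static string Lookbehind(string input) =>
        Regex.Replace(input, "(?<=.)([A-Z])", "_$0");
}
```

Both return `"This_Is_My_Sample"` for `"ThisIsMySample"`, but for `"ABC"` the two-group version returns `"A_BC"` (the match `"AB"` consumes the `B`), while the lookbehind version returns `"A_B_C"`.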

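Answer 5 (score: 1)

Use ".([A-Z])" as the pattern: the leading . guarantees that the first character of the string is never matched. Because the character matched by . is part of the match and is replaced too, capture it as well and write it back, e.g. "(.)([A-Z])" with "$1_$2". (With ".([A-Z])" and "_$1" alone, "ThisIsMySample" would become "Thi_I_M_Sample".)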

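Answer 6 (score: 0)

You can get better performance than with Regex or LINQ by using the following implementation; it uses the Span<char> type to reduce allocations: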

public ReadOnlySpan<char> ToSnakeCaseBySpan(string name)
{
    // First pass: count the upper-case letters after index 0;
    // each one needs an extra '_' in the output.
    int upperCaseLength = 0;
    for (int i = 1; i < name.Length; i++)
    {
        if (name[i] >= 'A' && name[i] <= 'Z')
        {
            upperCaseLength++;
        }
    }
    int bufferSize = name.Length + upperCaseLength;
    Span<char> buffer = new char[bufferSize];
    int bufferPosition = 0;
    int namePosition = 0;
    // Second pass: copy the characters, prefixing '_' to every
    // upper-case letter except the one at index 0.
    while (bufferPosition < buffer.Length)
    {
        if (namePosition > 0 && name[namePosition] >= 'A' && name[namePosition] <= 'Z')
        {
            buffer[bufferPosition] = '_';
            buffer[bufferPosition + 1] = name[namePosition];
            bufferPosition += 2;
            namePosition++;
            continue;
        }
        buffer[bufferPosition] = name[namePosition];
        bufferPosition++;
        namePosition++;
    }
    return buffer;
}
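The method above returns a `ReadOnlySpan<char>` over a fresh `char[]`, so a caller that needs a `string` pays for a second copy via `ToString()`. A possible variant, assuming .NET Core 2.1 or later where `string.Create` is available, counts the underscores first and then fills the new string's buffer directly. The names `SnakeCaseCreate` and `ToSnakeCase` are hypothetical, and this variant has not been benchmarked here.

```csharp
using System;

public static class SnakeCaseCreate
{
    // Like ToSnakeCaseBySpan, only ASCII 'A'..'Z' count as upper case,
    // and index 0 never receives an underscore.
    public static string ToSnakeCase(string name)
    {
        if (string.IsNullOrEmpty(name))
            return name;

        int upperCount = 0;
        for (int i = 1; i < name.Length; i++)
        {
            if (name[i] >= 'A' && name[i] <= 'Z')
                upperCount++;
        }

        // string.Create allocates the final string once and hands us its buffer.
        return string.Create(name.Length + upperCount, name, (span, source) =>
        {
            int pos = 0;
            for (int i = 0; i < source.Length; i++)
            {
                char c = source[i];
                if (i > 0 && c >= 'A' && c <= 'Z')
                    span[pos++] = '_';
                span[pos++] = c;
            }
        });
    }
}
```

For example, `"ThisIsTest"` becomes `"This_Is_Test"`; that is an input whose first capital reappears later, which a check like `name[i] != name[0]` would miscount.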

Benchmarks

|                          Method |        Mean |      Error |     StdDev | Rank |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------------------- |------------:|-----------:|-----------:|-----:|-------:|------:|------:|----------:|
|               ToSnakeCaseBySpan |    27.47 ns |  0.4629 ns |  0.4330 ns |    1 | 0.0153 |     - |     - |      48 B |
|  ToSnakeCaseStringBuilderBySpan |    85.23 ns |  1.6495 ns |  1.3774 ns |    2 | 0.0637 |     - |     - |     200 B |
|       ToSnakeCaseNewtonsoftJson |    85.72 ns |  1.6418 ns |  1.4554 ns |    2 | 0.0484 |     - |     - |     152 B |
| ToSnakeCaseNewtonsoftJsonBySpan |    86.96 ns |  1.7060 ns |  1.5958 ns |    2 | 0.0484 |     - |     - |     152 B |
|                 ToSnakeCaseLinq |   353.42 ns |  3.9670 ns |  3.7108 ns |    3 | 0.1450 |     - |     - |     456 B |
|                ToSnakeCaseRegex | 2,056.69 ns | 29.5694 ns | 26.2125 ns |    4 | 0.1526 |     - |     - |     496 B |
