Question

I want to remove every character except letters from a given string, in an efficient way. Any suggestions?

Answer 1

var result = str.Where(c => char.IsLetter(c));

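I was really interested in @KirillPolishchuk's answer, so I ran a small benchmark in LINQPad using a randomly built string. Here is the full code (I had to change my original code slightly, because it returned an IEnumerable):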

void Main()
{
    TimeSpan elapsed;
    string result;

    elapsed = TheLINQWay(buildString(1000000), out result);
    Console.WriteLine("LINQ way: {0}", elapsed);

    elapsed = TheRegExWay(buildString(1000000), out result);
    Console.WriteLine("RegEx way: {0}", elapsed);
}

TimeSpan TheRegExWay(string s, out string result)
{
    Stopwatch stopw = new Stopwatch();

    stopw.Start();
    result = Regex.Replace(s, @"\P{L}", string.Empty);
    stopw.Stop();

    return stopw.Elapsed;
}

TimeSpan TheLINQWay(string s, out string result)
{
    Stopwatch stopw = new Stopwatch();

    stopw.Start();
    result = new string(s.Where(c => char.IsLetter(c)).ToArray());
    stopw.Stop();

    return stopw.Elapsed;
}

string buildString(int len)
{
    byte[] buffer = new byte[len];
    Random r = new Random((int)DateTime.Now.Ticks);

    for(int i = 0; i < len; i++)
        buffer[i] = (byte)r.Next(256);

    return Encoding.ASCII.GetString(buffer);
}

Here are the results:

LINQ way: 00:00:00.0150030
RegEx way: 00:00:00.2788130

One caveat still applies: as Servy pointed out in the comments, the regex takes less time on shorter strings.

Answer 2

Use:

var result = Regex.Replace(input, @"\P{L}", string.Empty);
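
For context, `\P{L}` matches any character that is not in the Unicode "Letter" category, so this replacement keeps non-ASCII letters as well as A-Z and a-z. A minimal sketch of that behaviour follows; the `LettersOnly` helper name and the sample strings are illustrative and not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// \P{L} = any character NOT in Unicode category L (letters).
// LettersOnly is a hypothetical helper name used only for this sketch.
string LettersOnly(string text) => Regex.Replace(text, @"\P{L}", string.Empty);

Console.WriteLine(LettersOnly("ABCD 13 ~"));      // ABCD
Console.WriteLine(LettersOnly("na\u00EFve_42"));  // naïve (U+00EF is a letter, so it is kept)
```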

Answer 3

The most efficient way I can think of:

string input = "ABCD 13 ~";

// At worst every character is a letter, so size the buffer for the whole input.
char[] output = new char[input.Length];

int numberOfAlphabeticals = 0;
for (int i = 0; i < input.Length; i++)
{
    char character = input[i];

    // ASCII letters only. Compare the char itself: casting to byte would truncate
    // code points above 255 (U+0161 would wrap to 0x61, 'a') and let them through.
    if ((character >= 'A' && character <= 'Z') || (character >= 'a' && character <= 'z'))
    {
        output[numberOfAlphabeticals] = character;
        ++numberOfAlphabeticals;
    }
}

string outputAsString = new string(output, 0, numberOfAlphabeticals);
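
Because the check above only covers the ASCII ranges, accented and non-Latin letters are dropped, whereas `char.IsLetter` keeps them. The sketch below wraps the same buffer-filling loop in a helper that takes the letter test as a parameter, to show the difference. The names `Filter` and `IsAsciiLetter` are made up for illustration, and the delegate call is there for clarity, not speed.

```csharp
using System;

// Hypothetical helper: ASCII-only test, comparing the char directly.
bool IsAsciiLetter(char c) => (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');

// Same buffer-filling idea as the loop above, with the letter test passed in.
string Filter(string text, Func<char, bool> keep)
{
    char[] output = new char[text.Length];
    int count = 0;
    foreach (char c in text)
    {
        if (keep(c))
            output[count++] = c;
    }
    return new string(output, 0, count);
}

string sample = "Stra\u00DFe 42";                  // "Straße 42"
Console.WriteLine(Filter(sample, IsAsciiLetter));  // Strae
Console.WriteLine(Filter(sample, char.IsLetter));  // Straße
```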

Answer 4

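I think the fastest way (performance-wise) is to create a character lookup array indexed by ASCII code (up to 122, the code for 'z'), convert the given string to a byte array, and use a StringBuilder to build another string with the unwanted characters removed: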

private static char[] alphabet = {'\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '\0', '\0', '\0', '\0', '\0', '\0', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',};

Here is the removal function (I haven't compiled it, but it should give you the idea):

string RemoveNonAlpha(string value)
{
    byte[] asciiBytes = Encoding.ASCII.GetBytes(value);
    StringBuilder sb = new StringBuilder();
    for(int i = 0; i < asciiBytes.Length; i++)
    {
        if((asciiBytes[i] >= 65 && asciiBytes[i] <= 90) || (asciiBytes[i] >= 97 && asciiBytes[i] <= 122))
        {
            sb.Append(alphabet[asciiBytes[i]]);
        }
    }

    return sb.ToString();
}

Update

Based on Nikola's answer, here is an improved version:

private static string RemoveNonAlpha(string value)
{
    char[] output = new char[value.Length];
    int numAlpha = 0;
    for (int i = 0; i < value.Length; i++)
    {
        char c = value[i];
        // Compare the char directly; a (byte) cast would truncate code points above 255.
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        {
            output[numAlpha] = c;
            numAlpha++;
        }
    }

    return new string(output, 0, numAlpha);
}

Here are the results, compared with the LINQ approach:

The LINQ way 100: 6.7935
The fast way 100: 0.4648
The LINQ way 1000: 0.0442
The fast way 1000: 0.0134
The LINQ way 10000: 0.2078
The fast way 10000: 0.143
The LINQ way 100000: 2.0617
The fast way 100000: 1.3864
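
The harness behind these figures is not shown, and the units are not stated. One way such a comparison might be run is sketched below, assuming a Stopwatch-based loop with one untimed warm-up call so that JIT compilation is not charged to the first measurement. The helper names, the iteration count and the input string are all illustrative assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Both helpers are illustrative stand-ins for "the LINQ way" and "the fast way".
string LinqWay(string s) => new string(s.Where(c => char.IsLetter(c)).ToArray());

string FastWay(string s)
{
    char[] output = new char[s.Length];
    int count = 0;
    foreach (char c in s)
    {
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
            output[count++] = c;
    }
    return new string(output, 0, count);
}

// Average milliseconds per call, after one untimed warm-up call.
double AverageMs(Func<string, string> method, string text, int iterations)
{
    method(text); // warm-up: keeps JIT compilation out of the measurement
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
        method(text);
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds / iterations;
}

string sample = string.Concat(Enumerable.Repeat("ABCD 13 ~ xyz!", 1000));
Console.WriteLine("LINQ way: {0} ms", AverageMs(LinqWay, sample, 100));
Console.WriteLine("Fast way: {0} ms", AverageMs(FastWay, sample, 100));
```

Note that the two helpers only agree on ASCII input: `char.IsLetter` also accepts non-ASCII letters, so a fair comparison should use input where both produce the same output.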

Answer 5

Use

^\W

as the input to the regex Replace method

http://msdn.microsoft.com/en-us/library/xwewhkd1.aspx
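
Worth keeping in mind when choosing a pattern: `\W` matches non-word characters, and word characters include digits and the underscore, so stripping `\W` matches leaves those in place. A quick comparison with the `\P{L}` pattern from Answer 2 (the sample string and variable names are made up for this sketch):

```csharp
using System;
using System.Text.RegularExpressions;

string sample = "ab_12 cd!";

// \W removes only non-word characters; digits and '_' count as word characters.
string withoutNonWord = Regex.Replace(sample, @"\W", string.Empty);

// \P{L} removes everything that is not a Unicode letter.
string lettersOnly = Regex.Replace(sample, @"\P{L}", string.Empty);

Console.WriteLine(withoutNonWord);  // ab_12cd
Console.WriteLine(lettersOnly);     // abcd
```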

Remove everything except alphabets from String
