I want to remove every character except letters from a given string, as efficiently as possible. Any suggestions?
Answer 0 (score: 9)
var result = str.Where(c => char.IsLetter(c));
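I was intrigued by @KirillPolishchuk's answer, so I ran a small benchmark in LINQPad on randomly built strings. Here is the complete code (I had to change my original code slightly, because it returned an IEnumerable<char> rather than a string):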
void Main()
{
    TimeSpan elapsed;
    string result;
    elapsed = TheLINQWay(buildString(1000000), out result);
    Console.WriteLine("LINQ way: {0}", elapsed);
    elapsed = TheRegExWay(buildString(1000000), out result);
    Console.WriteLine("RegEx way: {0}", elapsed);
}
TimeSpan TheRegExWay(string s, out string result)
{
    Stopwatch stopw = new Stopwatch();
    stopw.Start();
    result = Regex.Replace(s, @"\P{L}", string.Empty);
    stopw.Stop();
    return stopw.Elapsed;
}
TimeSpan TheLINQWay(string s, out string result)
{
    Stopwatch stopw = new Stopwatch();
    stopw.Start();
    result = new string(s.Where(c => char.IsLetter(c)).ToArray());
    stopw.Stop();
    return stopw.Elapsed;
}
string buildString(int len)
{
    byte[] buffer = new byte[len];
    Random r = new Random((int)DateTime.Now.Ticks);
    for (int i = 0; i < len; i++)
        buffer[i] = (byte)r.Next(256);
    return Encoding.ASCII.GetString(buffer);
}
Here are the results:
LINQ way: 00:00:00.0150030
RegEx way: 00:00:00.2788130
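One thing still needs to be said: as Servy pointed out in the comments, the regex version is the shorter one to write, and these timings were taken on a long string (1,000,000 characters), so shorter strings may behave differently.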
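The benchmark above times each approach once, on two separately generated strings, so one-time costs such as JIT compilation are part of the numbers. A minimal sketch of a slightly more careful harness follows. `FilterBenchmark`, `LinqWay`, `RegexWay` and `Measure` are hypothetical names that are not part of the original answer, and the run count is an arbitrary choice.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class FilterBenchmark
{
    // The LINQ approach from the answer, wrapped as a plain function.
    public static string LinqWay(string s)
    {
        return new string(s.Where(c => char.IsLetter(c)).ToArray());
    }

    // The regex approach from the answer, wrapped as a plain function.
    public static string RegexWay(string s)
    {
        return Regex.Replace(s, @"\P{L}", string.Empty);
    }

    // Calls `filter` once untimed (warm-up), then returns the
    // average duration of `runs` timed calls on the same input.
    public static TimeSpan Measure(Func<string, string> filter, string input, int runs)
    {
        filter(input); // warm-up: JIT compilation, regex construction
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            filter(input);
        }
        sw.Stop();
        return TimeSpan.FromTicks(sw.Elapsed.Ticks / runs);
    }
}
```

To compare the two, build the input once and pass the same string to both calls, for example `FilterBenchmark.Measure(FilterBenchmark.LinqWay, input, 10)` and `FilterBenchmark.Measure(FilterBenchmark.RegexWay, input, 10)`, repeating for short and long inputs.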
Answer 1 (score: 6)
Use:
var result = Regex.Replace(input, @"\P{L}", string.Empty);
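In this pattern, `\P{L}` matches any single character that is not in the Unicode "Letter" category, so accented and non-Latin letters are kept as well. If the call runs many times, one option is to build the `Regex` once and reuse it. The sketch below assumes that setup; `RegexLetterFilter` and `KeepLetters` are illustrative names, not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class RegexLetterFilter
{
    // Built once and reused. RegexOptions.Compiled trades a higher
    // start-up cost for faster repeated matching.
    private static readonly Regex NonLetters =
        new Regex(@"\P{L}", RegexOptions.Compiled);

    // Removes every character that is not a Unicode letter.
    public static string KeepLetters(string input)
    {
        return NonLetters.Replace(input, string.Empty);
    }
}
```

With this, `RegexLetterFilter.KeepLetters("ABCD 13 ~")` returns `"ABCD"` and `RegexLetterFilter.KeepLetters("Grüße, 123!")` returns `"Grüße"`.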
Answer 2 (score: 2)
The most efficient way I can think of:
string input = "ABCD 13 ~";
// Worst case: every character is a letter, so the buffer must hold the whole input.
char[] output = new char[input.Length];
int numberOfAlphabeticals = 0;
for (int i = 0; i < input.Length; i++)
{
    char character = input[i];
    // Compare the char itself against the ASCII letter ranges. Casting to byte first
    // would truncate non-ASCII characters (e.g. 'Ł', U+0141, would become 0x41 = 'A').
    if ((character >= 'A' && character <= 'Z') || (character >= 'a' && character <= 'z'))
    {
        output[numberOfAlphabeticals] = character;
        ++numberOfAlphabeticals;
    }
}
string outputAsString = new string(output, 0, numberOfAlphabeticals);
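This loop targets the ASCII letters A-Z and a-z, whereas `char.IsLetter` and `\P{L}` from the other answers work on the Unicode letter category. The sketch below keeps the same preallocated-buffer technique but takes the test as a parameter, so the two definitions can be compared side by side. `BufferFilter`, `Filter` and `IsAsciiLetter` are hypothetical names, and the delegate call adds some overhead that the inlined check does not have.

```csharp
using System;

public static class BufferFilter
{
    // Copies the characters accepted by `keep` into a buffer sized for
    // the worst case, then builds the result from the filled part.
    public static string Filter(string input, Func<char, bool> keep)
    {
        char[] output = new char[input.Length];
        int count = 0;
        foreach (char c in input)
        {
            if (keep(c))
            {
                output[count++] = c;
            }
        }
        return new string(output, 0, count);
    }

    // True only for the ASCII letters A-Z and a-z.
    public static bool IsAsciiLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }
}
```

For the input `"Été 2024"`, `BufferFilter.Filter(input, BufferFilter.IsAsciiLetter)` returns `"t"`, while `BufferFilter.Filter(input, char.IsLetter)` returns `"Été"`.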
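Answer 3 (score: 1)
I think the fastest way (performance-wise) is to create a character lookup array covering codes up to 122, convert the input string to a byte array, and use a StringBuilder to build a second string that leaves out the unwanted characters: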
private static char[] alphabet = {'\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '\0', '\0', '\0', '\0', '\0', '\0', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',};
Here is the removal function (I haven't compiled it, but it should give you the idea):
string RemoveNonAlpha(string value)
{
    byte[] asciiBytes = Encoding.ASCII.GetBytes(value);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < asciiBytes.Length; i++)
    {
        if ((asciiBytes[i] >= 65 && asciiBytes[i] <= 90) || (asciiBytes[i] >= 97 && asciiBytes[i] <= 122))
        {
            sb.Append(alphabet[asciiBytes[i]]);
        }
    }
    return sb.ToString();
}
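Since the range check already decides which characters survive, the same StringBuilder idea can also be sketched without the lookup table or the byte conversion. This is a variant under that assumption, not the function above; `StringBuilderFilter` is a hypothetical name.

```csharp
using System.Text;

public static class StringBuilderFilter
{
    public static string RemoveNonAlpha(string value)
    {
        // Capacity is set up front so the builder never has to grow.
        StringBuilder sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            // Keep only the ASCII letters A-Z and a-z.
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, `StringBuilderFilter.RemoveNonAlpha("ABCD 13 ~")` returns `"ABCD"`.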
Based on Nikola's answer, here is an improved version:
private static string RemoveNonAlpha(string value)
{
    // Worst case: every character is kept, so size the buffer for the whole input.
    char[] output = new char[value.Length];
    int numAlpha = 0;
    for (int i = 0; i < value.Length; i++)
    {
        // Compare the char directly; a cast to byte would truncate non-ASCII
        // characters and could let some of them pass the range check.
        char c = value[i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        {
            output[numAlpha] = c;
            numAlpha++;
        }
    }
    return new string(output, 0, numAlpha);
}
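Here are the results, compared with the LINQ approach (the number after each label appears to be the input length; the time unit was not stated):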
The LINQ way 100: 6.7935
The fast way 100: 0.4648
The LINQ way 1000: 0.0442
The fast way 1000: 0.0134
The LINQ way 10000: 0.2078
The fast way 10000: 0.143
The LINQ way 100000: 2.0617
The fast way 100000: 1.3864
Answer 4 (score: 0)