假设我有一个字符串,如:
"Hello how are you doing?"
我想要一个将多个空格转换为一个空格的函数。
所以我会得到:
"Hello how are you doing?"
我知道我可以使用正则表达式或致电
string s = "Hello how are you doing?".replace(" "," ");
但我必须多次调用它以确保所有连续的空格都只用一个替换。
是否已有内置方法?
答案 0 :(得分:183)
string cleanedString = System.Text.RegularExpressions.Regex.Replace(dirtyString,@"\s+"," ");
答案 1 :(得分:53)
这个问题并不像其他海报那样简单(并且正如我原先认为的那样) - 因为这个问题并不是很精确,因为它需要。
“空间”和“空白”之间存在差异。如果仅表示空格,则应使用" {2,}"
的正则表达式。如果你的意思是任何空白,那就是另一回事了。 所有空格应该转换为空格吗?在开始和结束时空间应该发生什么?
对于下面的基准测试,我假设您只关心空格,并且您不想对单个空格做任何事情,即使在开始和结束时也是如此。
请注意,正确性几乎总是比性能更重要。 Split / Join解决方案删除任何前导/尾随空格(即使只是单个空格)的事实对于您指定的要求(当然可能不完整)是不正确的。
基准使用MiniBench。
using System;
using System.Text.RegularExpressions;
using MiniBench;
internal class Program
{
public static void Main(string[] args)
{
int size = int.Parse(args[0]);
int gapBetweenExtraSpaces = int.Parse(args[1]);
char[] chars = new char[size];
for (int i=0; i < size/2; i += 2)
{
// Make sure there actually *is* something to do
chars[i*2] = (i % gapBetweenExtraSpaces == 1) ? ' ' : 'x';
chars[i*2 + 1] = ' ';
}
// Just to make sure we don't have a \0 at the end
// for odd sizes
chars[chars.Length-1] = 'y';
string bigString = new string(chars);
// Assume that one form works :)
string normalized = NormalizeWithSplitAndJoin(bigString);
var suite = new TestSuite<string, string>("Normalize")
.Plus(NormalizeWithSplitAndJoin)
.Plus(NormalizeWithRegex)
.RunTests(bigString, normalized);
suite.Display(ResultColumns.All, suite.FindBest());
}
private static readonly Regex MultipleSpaces =
new Regex(@" {2,}", RegexOptions.Compiled);
static string NormalizeWithRegex(string input)
{
return MultipleSpaces.Replace(input, " ");
}
// Guessing as the post doesn't specify what to use
private static readonly char[] Whitespace =
new char[] { ' ' };
static string NormalizeWithSplitAndJoin(string input)
{
string[] split = input.Split
(Whitespace, StringSplitOptions.RemoveEmptyEntries);
return string.Join(" ", split);
}
}
一些测试运行:
c:\Users\Jon\Test>test 1000 50
============ Normalize ============
NormalizeWithSplitAndJoin 1159091 0:30.258 22.93
NormalizeWithRegex 26378882 0:30.025 1.00
c:\Users\Jon\Test>test 1000 5
============ Normalize ============
NormalizeWithSplitAndJoin 947540 0:30.013 1.07
NormalizeWithRegex 1003862 0:29.610 1.00
c:\Users\Jon\Test>test 1000 1001
============ Normalize ============
NormalizeWithSplitAndJoin 1156299 0:29.898 21.99
NormalizeWithRegex 23243802 0:27.335 1.00
这里第一个数字是迭代次数,第二个是时间,第三个是比例得分,1.0是最好的。
这表明至少在某些情况下(包括这一个),正则表达式可以胜过Split / Join解决方案,有时候会非常显着。
但是,如果您更改为“所有空白”要求,则分割/加入 似乎获胜。像往常一样,魔鬼在细节......
答案 2 :(得分:18)
虽然现有的答案很好,但我想指出一种不的方法:
public static string DontUseThisToCollapseSpaces(string text)
{
while (text.IndexOf(" ") != -1)
{
text = text.Replace(" ", " ");
}
return text;
}
这可以永远循环。有人在乎为什么要猜? (几年前,当我被问为新闻组问题时,我才遇到过这个问题......有人实际上遇到了这个问题。)
答案 3 :(得分:17)
正则表达式是最简单的方法。如果以正确的方式编写正则表达式,则不需要多次调用。
将其更改为:
string s = System.Text.RegularExpressions.Regex.Replace(s, @"\s{2,}", " ");
答案 4 :(得分:4)
正如已经指出的那样,这可以通过正则表达式轻松完成。我只想补充一点,你可能想要添加一个.trim()来摆脱前导/尾随空格。
答案 5 :(得分:4)
这是我使用的解决方案。没有RegEx和String.Split。
public static string TrimWhiteSpace(this string Value)
{
StringBuilder sbOut = new StringBuilder();
if (!string.IsNullOrEmpty(Value))
{
bool IsWhiteSpace = false;
for (int i = 0; i < Value.Length; i++)
{
if (char.IsWhiteSpace(Value[i])) //Comparion with WhiteSpace
{
if (!IsWhiteSpace) //Comparison with previous Char
{
sbOut.Append(Value[i]);
IsWhiteSpace = true;
}
}
else
{
IsWhiteSpace = false;
sbOut.Append(Value[i]);
}
}
}
return sbOut.ToString();
}
所以你可以:
string cleanedString = dirtyString.TrimWhiteSpace();
答案 6 :(得分:3)
我正在分享我使用的东西,因为看起来我已经想出了不同的东西。我已经使用了一段时间,它对我来说足够快。我不确定它是如何与其他人叠加的。我在分隔文件编写器中使用它,并通过它一次运行一个字段的大型数据表。
public static string NormalizeWhiteSpace(string S)
{
string s = S.Trim();
bool iswhite = false;
int iwhite;
int sLength = s.Length;
StringBuilder sb = new StringBuilder(sLength);
foreach(char c in s.ToCharArray())
{
if(Char.IsWhiteSpace(c))
{
if (iswhite)
{
//Continuing whitespace ignore it.
continue;
}
else
{
//New WhiteSpace
//Replace whitespace with a single space.
sb.Append(" ");
//Set iswhite to True and any following whitespace will be ignored
iswhite = true;
}
}
else
{
sb.Append(c.ToString());
//reset iswhitespace to false
iswhite = false;
}
}
return sb.ToString();
}
答案 7 :(得分:3)
快速额外的空白移除...... 这是最快的一个,基于Felipe Machado的就地副本。
static string InPlaceCharArray(string str)
{
var len = str.Length;
var src = str.ToCharArray();
int dstIdx = 0;
bool lastWasWS = false;
for (int i = 0; i < len; i++)
{
var ch = src[i];
if (src[i] == '\u0020')
{
if (lastWasWS == false)
{
src[dstIdx++] = ch;
lastWasWS = true;
}
}
else
{
lastWasWS = false;
src[dstIdx++] = ch;
}
}
return new string(src, 0, dstIdx);
}
基准......
InPlaceCharArraySpaceOnly by Felipe Machado on CodeProject 2015并由Sunsetquest修改以进行多空间移除。 时间:3.75蜱虫
InPlaceCharArray by Felipe Machado 2015年,由Sunsetquest稍作修改以进行多空间移除。 时间6.50 Ticks (也支持标签页)
由Jon Skeet组成的SplitAndJoinOnSpace。 时间:13.25 Ticks
fubo之后的StringBuilder 时间:13.5 Ticks (也支持标签页)
由Jon Skeet编译的正则表达式。 时间:17 Ticks
David S 2013年的StringBuilder 时间:30.5 Ticks
由Brandon进行非编译的正则表达式 时间:63.25 Ticks
user214147之后的StringBuilder 时间:77.125 Ticks
非编译的正则表达式Tim Hoolihan 时间:147.25 Ticks
基准代码......
using System;
using System.Text.RegularExpressions;
using System.Diagnostics;
using System.Threading;
using System.Text;
static class Program
{
public static void Main(string[] args)
{
long seed = ConfigProgramForBenchmarking();
Stopwatch sw = new Stopwatch();
string warmup = "This is a Warm up function for best benchmark results." + seed;
string input1 = "Hello World, how are you doing?" + seed;
string input2 = "It\twas\t \tso nice to\t\t see you \tin 1950. \t" + seed;
string correctOutput1 = "Hello World, how are you doing?" + seed;
string correctOutput2 = "It\twas\tso nice to\tsee you in 1950. " + seed;
string output1,output2;
//warm-up timer function
sw.Restart();
sw.Stop();
sw.Restart();
sw.Stop();
long baseVal = sw.ElapsedTicks;
// InPlace Replace by Felipe Machado but modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
output1 = InPlaceCharArraySpaceOnly (warmup);
sw.Restart();
output1 = InPlaceCharArraySpaceOnly (input1);
output2 = InPlaceCharArraySpaceOnly (input2);
sw.Stop();
Console.WriteLine("InPlaceCharArraySpaceOnly : " + (sw.ElapsedTicks - baseVal));
Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
// InPlace Replace by Felipe R. Machado and slightly modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
output1 = InPlaceCharArray(warmup);
sw.Restart();
output1 = InPlaceCharArray(input1);
output2 = InPlaceCharArray(input2);
sw.Stop();
Console.WriteLine("InPlaceCharArray: " + (sw.ElapsedTicks - baseVal));
Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
//Regex with non-compile Tim Hoolihan (https://stackoverflow.com/a/1279874/2352507)
string cleanedString =
output1 = Regex.Replace(warmup, @"\s+", " ");
sw.Restart();
output1 = Regex.Replace(input1, @"\s+", " ");
output2 = Regex.Replace(input2, @"\s+", " ");
sw.Stop();
Console.WriteLine("Regex by Tim Hoolihan: " + (sw.ElapsedTicks - baseVal));
Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
//Regex with compile by Jon Skeet (https://stackoverflow.com/a/1280227/2352507)
output1 = MultipleSpaces.Replace(warmup, " ");
sw.Restart();
output1 = MultipleSpaces.Replace(input1, " ");
output2 = MultipleSpaces.Replace(input2, " ");
sw.Stop();
Console.WriteLine("Regex with compile by Jon Skeet: " + (sw.ElapsedTicks - baseVal));
Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
//Split And Join by Jon Skeet (https://stackoverflow.com/a/1280227/2352507)
output1 = SplitAndJoinOnSpace(warmup);
sw.Restart();
output1 = SplitAndJoinOnSpace(input1);
output2 = SplitAndJoinOnSpace(input2);
sw.Stop();
Console.WriteLine("Split And Join by Jon Skeet: " + (sw.ElapsedTicks - baseVal));
Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
//Regex by Brandon (https://stackoverflow.com/a/1279878/2352507
output1 = Regex.Replace(warmup, @"\s{2,}", " ");
sw.Restart();
output1 = Regex.Replace(input1, @"\s{2,}", " ");
output2 = Regex.Replace(input2, @"\s{2,}", " ");
sw.Stop();
Console.WriteLine("Regex by Brandon: " + (sw.ElapsedTicks - baseVal));
Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
//StringBuilder by user214147 (https://stackoverflow.com/a/2156660/2352507
output1 = user214147(warmup);
sw.Restart();
output1 = user214147(input1);
output2 = user214147(input2);
sw.Stop();
Console.WriteLine("StringBuilder by user214147: " + (sw.ElapsedTicks - baseVal));
Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
//StringBuilder by fubo (https://stackoverflow.com/a/27502353/2352507
output1 = fubo(warmup);
sw.Restart();
output1 = fubo(input1);
output2 = fubo(input2);
sw.Stop();
Console.WriteLine("StringBuilder by fubo: " + (sw.ElapsedTicks - baseVal));
Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
//StringBuilder by David S 2013 (https://stackoverflow.com/a/16035044/2352507)
output1 = SingleSpacedTrim(warmup);
sw.Restart();
output1 = SingleSpacedTrim(input1);
output2 = SingleSpacedTrim(input2);
sw.Stop();
Console.WriteLine("StringBuilder(SingleSpacedTrim) by David S: " + (sw.ElapsedTicks - baseVal));
Console.WriteLine(" Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
Console.WriteLine(" Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
}
// InPlace Replace by Felipe Machado and slightly modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
static string InPlaceCharArray(string str)
{
var len = str.Length;
var src = str.ToCharArray();
int dstIdx = 0;
bool lastWasWS = false;
for (int i = 0; i < len; i++)
{
var ch = src[i];
if (src[i] == '\u0020')
{
if (lastWasWS == false)
{
src[dstIdx++] = ch;
lastWasWS = true;
}
}
else
{
lastWasWS = false;
src[dstIdx++] = ch;
}
}
return new string(src, 0, dstIdx);
}
// InPlace Replace by Felipe R. Machado but modified by Ryan for multi-space removal (http://www.codeproject.com/Articles/1014073/Fastest-method-to-remove-all-whitespace-from-Strin)
static string InPlaceCharArraySpaceOnly (string str)
{
var len = str.Length;
var src = str.ToCharArray();
int dstIdx = 0;
bool lastWasWS = false; //Added line
for (int i = 0; i < len; i++)
{
var ch = src[i];
switch (ch)
{
case '\u0020': //SPACE
case '\u00A0': //NO-BREAK SPACE
case '\u1680': //OGHAM SPACE MARK
case '\u2000': // EN QUAD
case '\u2001': //EM QUAD
case '\u2002': //EN SPACE
case '\u2003': //EM SPACE
case '\u2004': //THREE-PER-EM SPACE
case '\u2005': //FOUR-PER-EM SPACE
case '\u2006': //SIX-PER-EM SPACE
case '\u2007': //FIGURE SPACE
case '\u2008': //PUNCTUATION SPACE
case '\u2009': //THIN SPACE
case '\u200A': //HAIR SPACE
case '\u202F': //NARROW NO-BREAK SPACE
case '\u205F': //MEDIUM MATHEMATICAL SPACE
case '\u3000': //IDEOGRAPHIC SPACE
case '\u2028': //LINE SEPARATOR
case '\u2029': //PARAGRAPH SEPARATOR
case '\u0009': //[ASCII Tab]
case '\u000A': //[ASCII Line Feed]
case '\u000B': //[ASCII Vertical Tab]
case '\u000C': //[ASCII Form Feed]
case '\u000D': //[ASCII Carriage Return]
case '\u0085': //NEXT LINE
if (lastWasWS == false) //Added line
{
src[dstIdx++] = ch; //Added line
lastWasWS = true; //Added line
}
continue;
default:
lastWasWS = false; //Added line
src[dstIdx++] = ch;
break;
}
}
return new string(src, 0, dstIdx);
}
static readonly Regex MultipleSpaces =
new Regex(@" {2,}", RegexOptions.Compiled);
//Split And Join by Jon Skeet (https://stackoverflow.com/a/1280227/2352507)
static string SplitAndJoinOnSpace(string input)
{
string[] split = input.Split(new char[] { ' '}, StringSplitOptions.RemoveEmptyEntries);
return string.Join(" ", split);
}
//StringBuilder by user214147 (https://stackoverflow.com/a/2156660/2352507
public static string user214147(string S)
{
string s = S.Trim();
bool iswhite = false;
int iwhite;
int sLength = s.Length;
StringBuilder sb = new StringBuilder(sLength);
foreach (char c in s.ToCharArray())
{
if (Char.IsWhiteSpace(c))
{
if (iswhite)
{
//Continuing whitespace ignore it.
continue;
}
else
{
//New WhiteSpace
//Replace whitespace with a single space.
sb.Append(" ");
//Set iswhite to True and any following whitespace will be ignored
iswhite = true;
}
}
else
{
sb.Append(c.ToString());
//reset iswhitespace to false
iswhite = false;
}
}
return sb.ToString();
}
//StringBuilder by fubo (https://stackoverflow.com/a/27502353/2352507
public static string fubo(this string Value)
{
StringBuilder sbOut = new StringBuilder();
if (!string.IsNullOrEmpty(Value))
{
bool IsWhiteSpace = false;
for (int i = 0; i < Value.Length; i++)
{
if (char.IsWhiteSpace(Value[i])) //Comparison with WhiteSpace
{
if (!IsWhiteSpace) //Comparison with previous Char
{
sbOut.Append(Value[i]);
IsWhiteSpace = true;
}
}
else
{
IsWhiteSpace = false;
sbOut.Append(Value[i]);
}
}
}
return sbOut.ToString();
}
//David S. 2013 (https://stackoverflow.com/a/16035044/2352507)
public static String SingleSpacedTrim(String inString)
{
StringBuilder sb = new StringBuilder();
Boolean inBlanks = false;
foreach (Char c in inString)
{
switch (c)
{
case '\r':
case '\n':
case '\t':
case ' ':
if (!inBlanks)
{
inBlanks = true;
sb.Append(' ');
}
continue;
default:
inBlanks = false;
sb.Append(c);
break;
}
}
return sb.ToString().Trim();
}
/// <summary>
/// We want to run this item with max priory to lower the odds of
/// the OS from doing program context switches in the middle of our code.
/// source:https://stackoverflow.com/a/16157458
/// </summary>
/// <returns>random seed</returns>
private static long ConfigProgramForBenchmarking()
{
//prevent the JIT Compiler from optimizing Fkt calls away
long seed = Environment.TickCount;
//use the second Core/Processor for the test
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
//prevent "Normal" Processes from interrupting Threads
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
//prevent "Normal" Threads from interrupting this thread
Thread.CurrentThread.Priority = ThreadPriority.Highest;
return seed;
}
}
基准测试:发布模式,附带无调试器,i7处理器,平均4次运行,仅测试短字符串
答案 8 :(得分:2)
VB.NET
Linha.Split(" ").ToList().Where(Function(x) x <> " ").ToArray
C#
Linha.Split(" ").ToList().Where(x => x != " ").ToArray();
享受LINQ = D
的力量答案 9 :(得分:2)
使用Jon Skeet发布的测试程序,我试着看看是否可以通过手写循环来更快地运行。
我每次都可以击败NormalizeWithSplitAndJoin,但只有输入1000,5才能击败NormalizeWithRegex。
static string NormalizeWithLoop(string input)
{
StringBuilder output = new StringBuilder(input.Length);
char lastChar = '*'; // anything other then space
for (int i = 0; i < input.Length; i++)
{
char thisChar = input[i];
if (!(lastChar == ' ' && thisChar == ' '))
output.Append(thisChar);
lastChar = thisChar;
}
return output.ToString();
}
我没有查看抖动产生的机器代码,但是我希望问题是调用StringBuilder.Append()所花费的时间,并且要做得更好,需要使用不安全的代码。
所以Regex.Replace()非常快且难以击败!!
答案 10 :(得分:1)
Regex regex = new Regex(@"\W+");
string outputString = regex.Replace(inputString, " ");
答案 11 :(得分:0)
最小的解决方案:
var regExp = / \ s + / g, newString = oldString.replace(regExp,'');
答案 12 :(得分:0)
您可以尝试以下方法:
/// <summary>
/// Remove all extra spaces and tabs between words in the specified string!
/// </summary>
/// <param name="str">The specified string.</param>
public static string RemoveExtraSpaces(string str)
{
str = str.Trim();
StringBuilder sb = new StringBuilder();
bool space = false;
foreach (char c in str)
{
if (char.IsWhiteSpace(c) || c == (char)9) { space = true; }
else { if (space) { sb.Append(' '); }; sb.Append(c); space = false; };
}
return sb.ToString();
}
答案 13 :(得分:0)
替换组提供了一种不受欢迎的方法,解决了用相同单个替换多个空白字符的问题:
public static void WhiteSpaceReduce()
{
string t1 = "a b c d";
string t2 = "a b\n\nc\nd";
Regex whiteReduce = new Regex(@"(?<firstWS>\s)(?<repeatedWS>\k<firstWS>+)");
Console.WriteLine("{0}", t1);
//Console.WriteLine("{0}", whiteReduce.Replace(t1, x => x.Value.Substring(0, 1)));
Console.WriteLine("{0}", whiteReduce.Replace(t1, @"${firstWS}"));
Console.WriteLine("\nNext example ---------");
Console.WriteLine("{0}", t2);
Console.WriteLine("{0}", whiteReduce.Replace(t2, @"${firstWS}"));
Console.WriteLine();
}
请注意,第二个示例保留单个\n
,而接受的答案将用空格替换行尾。
如果您需要用第一个空格替换任何空格字符组合,只需从模式中删除后向引用\k
。
答案 14 :(得分:0)
string.Join(" ", s.Split(" ").Where(r => r != ""));
答案 15 :(得分:-1)
没有办法内置这样做。你可以试试这个:
private static readonly char[] whitespace = new char[] { ' ', '\n', '\t', '\r', '\f', '\v' };
public static string Normalize(string source)
{
return String.Join(" ", source.Split(whitespace, StringSplitOptions.RemoveEmptyEntries));
}
这将删除前导和尾随whitespce以及将任何内部空白折叠为单个空白字符。如果你真的只想折叠空格,那么使用正则表达式的解决方案会更好;否则这个解决方案更好。 (参见Jon Skeet所做的analysis。)