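I need to create a method that removes excessive whitespace from user input, so that it doesn't mess up the formatting of information rendered through the UI or in reports. I want to stop users from entering consecutive tabs, multiple spaces, and more than two carriage returns in a row. Here is my current solution (it works well), but does anyone have something neater? My main challenge is making sure the user can still use a single or a double carriage return: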
using System;
using System.Linq;

public static class StringHelper
{
    private static readonly string SingleBreakGuid = Guid.NewGuid().ToString();
    private static readonly string DoubleBreakGuid = Guid.NewGuid().ToString();

    /// <summary>
    /// Limits character spacing to single spacing. Limits line spacing to no more than double line spacing.
    /// </summary>
    /// <param name="sourceString">The source string that will be used to calculate the result.</param>
    /// <returns>A string with single spacing between characters and no more than double line spacing.</returns>
    public static string RemoveExtensiveWhiteSpace(this string sourceString)
    {
        // Normalise breaks, so that they are all \r\n
        var normalisedString = sourceString.NormaliseLineBreaks();
        // Replace multiple spaces and tabs with a single space
        var singleSpacedString = string.Join(" ", normalisedString.Split(new[] { " ", "\t" }, StringSplitOptions.RemoveEmptyEntries));
        // Trim all of the sub-strings between breaks - this will also empty any whitespace between breaks
        var trimmedString = string.Join("\r\n",
            singleSpacedString.Split(new[] { "\r\n" }, StringSplitOptions.None)
                .Select(s => s.Trim()));
        // The logic requires that the user can use one or two carriage returns, which is difficult to achieve by splitting and re-joining.
        // Replace the double and single carriage returns with their respective GUIDs
        var guidNotationString = trimmedString.Replace("\r\n\r\n", DoubleBreakGuid).Replace("\r\n", SingleBreakGuid);
        // Merge a DoubleBreakGuid immediately followed by a SingleBreakGuid (a triple break) into just a DoubleBreakGuid
        var includesTripleBreaks = guidNotationString.Replace(DoubleBreakGuid + SingleBreakGuid, DoubleBreakGuid);
        // Replace groups of DoubleBreakGuid with a double break
        var includesDoubleBreaks = string.Join("\r\n\r\n",
            includesTripleBreaks.Split(new[] { DoubleBreakGuid }, StringSplitOptions.RemoveEmptyEntries));
        // Replace groups of SingleBreakGuid with single breaks
        var includesSingleBreaks = string.Join("\r\n",
            includesDoubleBreaks.Split(new[] { SingleBreakGuid }, StringSplitOptions.RemoveEmptyEntries));
        return includesSingleBreaks;
    }

    public static string NormaliseLineBreaks(this string sourceString)
    {
        return sourceString
            .Replace("\r\n", "\n")
            .Replace("\n\r", "\n")
            .Replace("\r", "\n")
            .Replace("\n", "\r\n");
    }
}
Answer 0 (score: 1)
You can use an iterative approach to reduce runs of line breaks to just two. Instead of the odd GUID replacement, use something like:
var collapsedString = trimmedString.Replace("\r\n\r\n\r\n", "\r\n\r\n");
while (collapsedString.Length < trimmedString.Length)
{
    trimmedString = collapsedString;
    collapsedString = trimmedString.Replace("\r\n\r\n\r\n", "\r\n\r\n");
}
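For reference, here is one way the loop above could slot into the question's method in place of the GUID steps. This is a sketch, not code from either post: the class name `IterativeCollapseHelper` is hypothetical, the first three steps are copied from the question, and the final `Trim('\r', '\n')` is an assumption added to mimic how the GUID version drops leading and trailing breaks through `RemoveEmptyEntries`.

```csharp
using System;
using System.Linq;

public static class IterativeCollapseHelper
{
    // Hypothetical helper: the question's method with the GUID steps replaced by the iterative collapse.
    public static string RemoveExtensiveWhiteSpace(string source)
    {
        // Normalise every line-break style to \r\n (same order as the question's NormaliseLineBreaks).
        string normalised = source
            .Replace("\r\n", "\n")
            .Replace("\n\r", "\n")
            .Replace("\r", "\n")
            .Replace("\n", "\r\n");

        // Collapse runs of spaces and tabs into a single space.
        string singleSpaced = string.Join(" ",
            normalised.Split(new[] { " ", "\t" }, StringSplitOptions.RemoveEmptyEntries));

        // Trim each line, so whitespace-only lines become empty.
        string trimmed = string.Join("\r\n",
            singleSpaced.Split(new[] { "\r\n" }, StringSplitOptions.None).Select(line => line.Trim()));

        // Replace triple breaks with double breaks until the string stops shrinking.
        string collapsed = trimmed.Replace("\r\n\r\n\r\n", "\r\n\r\n");
        while (collapsed.Length < trimmed.Length)
        {
            trimmed = collapsed;
            collapsed = trimmed.Replace("\r\n\r\n\r\n", "\r\n\r\n");
        }

        // Assumption: drop leading/trailing breaks, as the GUID version does.
        return collapsed.Trim('\r', '\n');
    }
}
```

Each pass of `Replace` shortens every run of three or more breaks, so the loop ends once no run longer than two remains.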
Answer 1 (score: 1)
Your code makes many Replace calls, and each one has to walk the entire input string and build a new string from the matches.
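Here is code that loops over the input only once and skips repeated whitespace (' ', '\t') and repeated line breaks ('\r', '\n', '\r\n', '\n\r') as needed.
Note that the code could be simpler if we had a single, known line-break sequence, but I did not use the NormaliseLineBreaks function here.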
using System.Text;

public static class StringHelper
{
    public static string RemoveExtraWhiteSpace(this string s)
    {
        int n = s.Length;
        StringBuilder sb = new StringBuilder(n); // builds the output
        int nLineBreaks = 2; // number of consecutive line breaks; start at 2 as if the input were preceded by two breaks, so no leading line breaks or spaces are emitted
        bool prevCharWasCrLf = false; // nLineBreaks can't be used for this, because it keeps counting across whitespace between line breaks
        char ch1, ch = '\0'; // ch1 is the previous char, ch is the current char
        for (int i = 0; i < n; i++) // iterate through the chars
        {
            ch1 = ch; ch = s[i]; // get the next char
            if (ch == '\r' || ch == '\n')
            {
                if (prevCharWasCrLf && ch != ch1) { prevCharWasCrLf = false; continue; } // second char of a \r\n or \n\r pair: already counted, so skip it
                //if (prevCharWasCrLf == false || ch == ch1) // alternative test if you prefer not to use continue
                prevCharWasCrLf = true;
                nLineBreaks++;
                if (nLineBreaks <= 2) // append a line break only while there are at most 2 in a row
                {
                    if (sb.Length > 0 && sb[sb.Length - 1] == ' ') sb.Length--; // remove the space that came just before this line break
                    sb.Append("\r\n");
                }
            }
            else
            {
                if (ch == ' ' || ch == '\t')
                {
                    if (nLineBreaks == 0 && ch1 != ' ' && ch1 != '\t') sb.Append(' '); // don't add a space after another space or a line break
                }
                else
                {
                    nLineBreaks = 0; sb.Append(ch); // it's a normal char, add it to the output
                }
                prevCharWasCrLf = false;
            }
        }
        return sb.ToString().TrimEnd('\r', '\n', ' '); // trailing breaks or a trailing space can remain; leading ones are never appended because nLineBreaks starts at 2
    }
}
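It may need some fine-tuning, but it works in my tests...
Also, I know this isn't short code, but in my opinion it is cleaner and, more importantly, it should perform better, because it walks the input once instead of allocating a new string for every Replace, Split and Join.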
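As the answer notes, the single pass gets simpler once there is only one line-break form. A sketch under that assumption: it first maps every break to '\n' (the same mapping as the question's NormaliseLineBreaks, but ending at '\n' rather than "\r\n"), then remembers pending spaces and breaks and only writes a separator when the next visible character arrives, so no trimming is needed at the end. The class name `SinglePassHelper` is hypothetical, and the normalisation adds three extra passes, which gives back part of the single-pass advantage.

```csharp
using System.Text;

public static class SinglePassHelper
{
    // Hypothetical helper: assumes line breaks are normalised to '\n' before the main loop.
    public static string RemoveExtraWhiteSpace(string s)
    {
        // Map \r\n, \n\r and lone \r to \n (same order as the question's NormaliseLineBreaks).
        string text = s.Replace("\r\n", "\n").Replace("\n\r", "\n").Replace('\r', '\n');

        var sb = new StringBuilder(text.Length);
        int pendingBreaks = 0;      // line breaks seen since the last visible character
        bool pendingSpace = false;  // spaces/tabs seen since the last visible character or break

        foreach (char ch in text)
        {
            if (ch == '\n')
            {
                pendingBreaks++;
                pendingSpace = false; // a space before a break is dropped
            }
            else if (ch == ' ' || ch == '\t')
            {
                pendingSpace = true;
            }
            else
            {
                // Emit a separator only between visible characters, never at the start.
                if (sb.Length > 0)
                {
                    if (pendingBreaks > 0) sb.Append(pendingBreaks >= 2 ? "\r\n\r\n" : "\r\n");
                    else if (pendingSpace) sb.Append(' ');
                }
                pendingBreaks = 0;
                pendingSpace = false;
                sb.Append(ch);
            }
        }
        // Trailing spaces and breaks are never written, so no TrimEnd is needed.
        return sb.ToString();
    }
}
```

Deferring the separator decision is what removes the special cases: a space followed by a break, or breaks at either end of the input, simply never reach the output.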
Answer 2 (score: 0)
If performance matters to you, try my other answer; but if you're looking for simpler, easier-to-understand code, this is it:
using System.Text.RegularExpressions;

public string RemoveExtensiveWhiteSpace(string s)
{
    s = Regex.Replace(s, @"\r\n|\n\r|\n|\r", "\r\n"); // normalise every kind of line break to \r\n
    s = Regex.Replace(s, @"[ \t]+", " "); // collapse each run of spaces/tabs into one space (alternative pattern: \t+|[\t ]{2,})
    s = s.Replace("\r\n ", "\r\n").Replace(" \r\n", "\r\n"); // drop the space before or after a line break; regex alternative: Regex.Replace(s, @" ?\r\n ?", "\r\n")
    s = Regex.Replace(s, @"(\r\n){2,}", "\r\n\r\n"); // replace 2+ consecutive line breaks with exactly 2
    return s.Trim('\r', '\n', ' '); // remove leading and trailing whitespace chars
}
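If the regex version ends up on a hot path, one option is to hoist the patterns into `static readonly Regex` fields built with `RegexOptions.Compiled`, so each pattern is constructed and compiled once. The static `Regex.Replace` overloads already cache recently used patterns, so measure before assuming a speed-up. This sketch also folds the two string `Replace` calls into the single pattern ` ?\r\n ?`; the class and field names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexWhiteSpaceHelper
{
    // Hypothetical helper: the regex answer's steps with reusable, compiled patterns.
    private static readonly Regex LineBreaks = new Regex(@"\r\n|\n\r|\n|\r", RegexOptions.Compiled);
    private static readonly Regex SpaceRuns = new Regex(@"[ \t]+", RegexOptions.Compiled);
    private static readonly Regex SpacesAroundBreaks = new Regex(@" ?\r\n ?", RegexOptions.Compiled);
    private static readonly Regex ExtraBreaks = new Regex(@"(\r\n){2,}", RegexOptions.Compiled);

    public static string RemoveExtensiveWhiteSpace(string s)
    {
        s = LineBreaks.Replace(s, "\r\n");         // normalise every break to \r\n
        s = SpaceRuns.Replace(s, " ");             // runs of spaces/tabs -> one space
        s = SpacesAroundBreaks.Replace(s, "\r\n"); // drop the single space before/after a break
        s = ExtraBreaks.Replace(s, "\r\n\r\n");    // 2+ breaks -> exactly 2
        return s.Trim('\r', '\n', ' ');            // strip leading/trailing whitespace
    }
}
```

The ` ?\r\n ?` pattern is safe here only because the previous step has already reduced every run of spaces and tabs to at most one space.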