Is there a neater way to remove excessive whitespace from user input?

Time: 2017-03-07 11:40:11

Tags: c# .net

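I need a method that removes excessive whitespace from user input so that it does not break the formatting of information rendered in the UI or in reports. I want to stop users from entering consecutive tabs, multiple spaces, or more than two carriage returns in a row. Here is my current solution (which works well), but does anyone have something neater? My main challenge is making sure the user can still use single or double carriage returns: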

using System;
using System.Linq;

public static class StringHelper
{
    private static readonly string SingleBreakGuid = Guid.NewGuid().ToString();
    private static readonly string DoubleBreakGuid = Guid.NewGuid().ToString();

    /// <summary>
    /// Limits character spacing to single spacing.  Limits line spacing to no more than double line spacing. 
    /// </summary>
    /// <param name="sourceString">The source string that will be used to calculate the result.</param>
    /// <returns>A string with single spacing between characters and no more than double line spacing.</returns>
    public static string RemoveExtensiveWhiteSpace(this string sourceString)
    {
        // Normalise breaks, so that they are all \r\n
        var normalisedString = sourceString.NormaliseLineBreaks();

        // Replace multiple spaces and tabs with a single space
        var singleSpacedString = string.Join(" ", normalisedString.Split(new[] { " ", "\t" }, StringSplitOptions.RemoveEmptyEntries));

        // Trim all of the sub-strings between breaks - this will also empty any whitespace between breaks
        var trimmedString = string.Join("\r\n",
            singleSpacedString.Split(new[] { "\r\n" }, StringSplitOptions.None)
            .Select(s => s.Trim()));

        // The logic requires that the user can use one or two carriage returns, which is difficult to achieve by splitting and re-joining.
        // Replace the double and single carriage returns with respective Guids
        var guidNotationString = trimmedString.Replace("\r\n\r\n", DoubleBreakGuid).Replace("\r\n", SingleBreakGuid);

        // Merge trailing DoubleBreakGuid with trailing SingleBreakGuid into just a DoubleBreakGuid.
        var includesTripleBreaks = guidNotationString.Replace(DoubleBreakGuid + SingleBreakGuid, DoubleBreakGuid);

        // Replace groups of DoubleBreakGuid with a double break
        var includesDoubleBreaks = string.Join("\r\n\r\n",
            includesTripleBreaks.Split(new[] { DoubleBreakGuid }, StringSplitOptions.RemoveEmptyEntries));

        // Replace groups of SingleBreakGuid with single breaks
        var includesSingleBreaks = string.Join("\r\n",
            includesDoubleBreaks.Split(new[] { SingleBreakGuid }, StringSplitOptions.RemoveEmptyEntries));

        return includesSingleBreaks;
    }

    public static string NormaliseLineBreaks(this string sourceString)
    {
        return sourceString
            .Replace("\r\n", "\n")
            .Replace("\n\r", "\n")
            .Replace("\r", "\n")
            .Replace("\n", "\r\n");
    }
}

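3 Answers:

Answer 0 (score: 1)

You can use an iterative approach to reduce runs of line breaks to just two. Instead of the odd GUID substitution, use something like the following (after the loop, collapsedString holds the result):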

var collapsedString = trimmedString.Replace("\r\n\r\n\r\n","\r\n\r\n");
while(collapsedString.Length < trimmedString.Length)
{
  trimmedString = collapsedString;
  collapsedString = trimmedString.Replace("\r\n\r\n\r\n","\r\n\r\n");
}
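
For reference, the same idea can be wrapped in a small helper. This is only a sketch: the class and method names (LineBreakCollapser, CollapseLineBreaks) are made up for illustration, and it assumes the line breaks have already been normalised to \r\n, as in the question's code.

```csharp
using System;

public static class LineBreakCollapser
{
    // Hypothetical helper, not part of the original answer.
    // Assumes every line break in the input is already "\r\n".
    public static string CollapseLineBreaks(string input)
    {
        const string tripleBreak = "\r\n\r\n\r\n";
        const string doubleBreak = "\r\n\r\n";

        var current = input;

        // Each pass shortens every run of three or more breaks;
        // repeat until no triple break is left.
        while (current.Contains(tripleBreak))
        {
            current = current.Replace(tripleBreak, doubleBreak);
        }

        return current;
    }
}
```

A run of five breaks takes three passes to shrink to two, so the loop, not a single Replace call, is what guarantees the limit.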

Answer 1 (score: 1)

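Your code contains many replacements, and each one has to iterate over the entire input string and create a new string based on the match condition.

Here I have written code that loops only once and, as needed, skips repeated spaces (' ', '\t') and line breaks ('\r', '\n', '\r\n', '\n\r'):

Note that the code could be simpler if we had a single known line-break sequence. However, I did not use the NormaliseLineBreaks function in my code here.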

using System.Text;

public static class StringHelper
{

    public static string RemoveExtraWhiteSpace(this string s)
    {
        int n = s.Length;
        StringBuilder sb = new StringBuilder(n); // builds the output
        int nLineBreaks = 2; // number of consecutive line breaks; start at 2, as if two breaks preceded s, so no leading line breaks or spaces are emitted
        bool prevCharWasCrLf = false; // nLineBreaks can't be used for this, because whitespace between line breaks does not reset it
        char ch1, ch = '\0'; // ch1 is the previous char, ch is the current char

        for (int i = 0; i < n; i++) //iterate through chars
        {
            ch1 = ch; ch = s[i]; //get next char

            if (ch == '\r' || ch == '\n')
            {
                if (prevCharWasCrLf && ch != ch1) { prevCharWasCrLf = false; continue; } // this char is the second half of a CR/LF pair; ignore it, as the pair was already counted
                //if (prevCharWasCrLf == false || ch == ch1) // alternative condition if we prefer not to use continue
                prevCharWasCrLf = true;
                nLineBreaks++;
                if (nLineBreaks <= 2) //append new line break if we have less than 2 
                {
                    if (sb.Length > 0 && sb[sb.Length - 1] == ' ') sb.Length--;  //remove prev space as it was before an enter
                    sb.Append("\r\n");
                }
            }
            else
            {
                if (ch == ' ' || ch == '\t')
                {
                    if (nLineBreaks == 0 && ch1 != ' ' && ch1 != '\t') sb.Append(' '); //don't add more space after another space or enter
                }
                else
                {
                    nLineBreaks = 0; sb.Append(ch); //its a normal char, add it to output
                }
                prevCharWasCrLf = false;
            }
        }

        return sb.ToString().TrimEnd('\r', '\n', ' '); // a single trailing space can remain when the input ends with spaces; without the initial nLineBreaks = 2 we would need .Trim('\r', '\n', ' ', '\t')
    }
}

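It may need some fine-tuning, but it works in my tests...

Also, I know this is not short code, but in my opinion it is cleaner and, more importantly, it performs better!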
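
To illustrate the remark above that a known line break makes the single pass simpler, here is a rough sketch that assumes the input has already been normalised so that every line break is a single '\n'. The names (NormalisedWhiteSpace, Collapse) are hypothetical, and this is a different, simplified variant rather than the code from this answer: it remembers pending whitespace and only writes it out when the next visible character arrives, which drops leading and trailing whitespace automatically.

```csharp
using System;
using System.Text;

public static class NormalisedWhiteSpace
{
    // Hypothetical sketch. Assumes every line break in s is a single '\n'.
    // Output uses "\r\n" to match the convention used elsewhere in this thread.
    public static string Collapse(string s)
    {
        var sb = new StringBuilder(s.Length);
        int pendingBreaks = 0;     // line breaks seen since the last visible char
        bool pendingSpace = false; // space or tab seen since the last visible char

        foreach (char ch in s)
        {
            if (ch == '\n')
            {
                pendingBreaks++;
            }
            else if (ch == ' ' || ch == '\t')
            {
                pendingSpace = true;
            }
            else
            {
                // Emit pending whitespace only between visible characters.
                if (sb.Length > 0)
                {
                    if (pendingBreaks > 0)
                    {
                        // At most two line breaks; spaces next to a break are dropped.
                        int breaks = Math.Min(pendingBreaks, 2);
                        for (int i = 0; i < breaks; i++) sb.Append("\r\n");
                    }
                    else if (pendingSpace)
                    {
                        sb.Append(' ');
                    }
                }

                pendingBreaks = 0;
                pendingSpace = false;
                sb.Append(ch);
            }
        }

        return sb.ToString();
    }
}
```

Because whitespace is written lazily, no final Trim call is needed. The trade-off is that the caller must normalise line breaks first, which costs an extra pass over the input.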

Answer 2 (score: 0)

If you care about performance, try my other answer; but if you are looking for simpler, easier-to-understand code, this is your answer:

// Requires: using System.Text.RegularExpressions;
public string RemoveExtensiveWhiteSpace(string s)
{
    s = Regex.Replace(s, @"\r\n|\n\r|\n|\r", "\r\n"); // normalize all types of line breaks to \r\n
    s = Regex.Replace(s, @"[ \t]+", " "); // collapse each run of spaces and tabs into a single space
    s = s.Replace("\r\n ", "\r\n").Replace(" \r\n", "\r\n"); // remove the space adjacent to a line break
    s = Regex.Replace(s, @"(\r\n){2,}", "\r\n\r\n"); // replace 2+ line breaks with exactly 2
    return s.Trim('\r', '\n', ' '); // remove leading and trailing whitespace chars
}