List of all characters that String.Trim() removes?

Date: 2016-05-19 20:19:48

Tags: c# .net

Obviously, the main use of Trim is to remove leading and trailing spaces from a string, e.g.:

"  hello  ".Trim(); // results in "hello"

But Trim also removes additional characters such as \n, \r, and \t, so:

"  \nhello\r\t  ".Trim(); // it also produces "hello"

Is there a definitive list of all the characters (preferably in string-escape format, like \n) that Trim will remove?

Edit: Thanks for the detailed answers - I now know the EXACT characters. The Wikipedia list that @RayKoopa left in the comments is probably the format that suits me best.

6 Answers:

Answer 0: (score: 8)

We can look at the source code of String:

The public Trim() method calls an internal helper method named TrimHelper():

 public String Trim() {
        Contract.Ensures(Contract.Result<String>() != null);
        Contract.EndContractBlock();

        return TrimHelper(TrimBoth);        
 }

TrimHelper() looks like this:

[System.Security.SecuritySafeCritical]  // auto-generated
        private String TrimHelper(int trimType) {
            //end will point to the first non-trimmed character on the right
            //start will point to the first non-trimmed character on the Left
            int end = this.Length-1;
            int start=0;

            //Trim specified characters.
            if (trimType !=TrimTail)  {
                for (start=0; start < this.Length; start++) {
                    if (!Char.IsWhiteSpace(this[start]) && !IsBOMWhitespace(this[start])) break;
                }
            }

            if (trimType !=TrimHead) {
                for (end= Length -1; end >= start;  end--) {
                    if (!Char.IsWhiteSpace(this[end])  && !IsBOMWhitespace(this[start])) break;
                }
            }

            return CreateTrimmedString(start, end);
        }

So your question mostly comes down to what the Char.IsWhiteSpace method checks, found in:

char.cs

   [Pure]
    public static bool IsWhiteSpace(char c) {

        if (IsLatin1(c)) {
            return (IsWhiteSpaceLatin1(c));
        }
        return CharUnicodeInfo.IsWhiteSpace(c);
    }

If it is a Latin-1 character, then this is what counts as whitespace:

 private static bool IsWhiteSpaceLatin1(char c) {

            // There are characters which belong to UnicodeCategory.Control but are considered as white spaces.
            // We use code point comparisons for these characters here as a temporary fix.

            // U+0009 = <control> HORIZONTAL TAB
            // U+000a = <control> LINE FEED
            // U+000b = <control> VERTICAL TAB
            // U+000c = <contorl> FORM FEED
            // U+000d = <control> CARRIAGE RETURN
            // U+0085 = <control> NEXT LINE
            // U+00a0 = NO-BREAK SPACE
            if ((c == ' ') || (c >= '\x0009' && c <= '\x000d') || c == '\x00a0' || c == '\x0085') {
                return (true);
            }
            return (false);
        }

Otherwise we have to go to CharUnicodeInfo.cs, which checks the character's UnicodeCategory enum value to decide whether it is whitespace:

   internal static bool IsWhiteSpace(char c)
        {
            UnicodeCategory uc = GetUnicodeCategory(c);
            // In Unicode 3.0, U+2028 is the only character which is under the category "LineSeparator".
            // And U+2029 is th eonly character which is under the category "ParagraphSeparator".
            switch (uc) {
                case (UnicodeCategory.SpaceSeparator):
                case (UnicodeCategory.LineSeparator):
                case (UnicodeCategory.ParagraphSeparator):
                    return (true);
            }

            return (false);
        }
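
As a rough illustration of the two loops above, here is a minimal sketch that trims both ends using only char.IsWhiteSpace. TrimLikeFramework is a hypothetical helper written for this example, not part of the framework, and it leaves out the IsBOMWhitespace check shown in the real source.

```csharp
using System;

public static class TrimSketch
{
    // Hypothetical helper (not framework code): trims both ends using only
    // char.IsWhiteSpace, mirroring the head and tail loops in TrimHelper().
    public static string TrimLikeFramework(string s)
    {
        int start = 0;
        int end = s.Length - 1;

        // Advance past leading whitespace.
        while (start <= end && char.IsWhiteSpace(s[start])) start++;

        // Step back over trailing whitespace.
        while (end >= start && char.IsWhiteSpace(s[end])) end--;

        // When the string is empty or all whitespace, the length is 0.
        return s.Substring(start, end - start + 1);
    }

    public static void Main()
    {
        // Tab, NO-BREAK SPACE, EM SPACE, IDEOGRAPHIC SPACE, CR and LF around "hello".
        string input = " \t\u00A0\u2003hello\u3000\r\n ";
        Console.WriteLine(TrimLikeFramework(input));
        // On .NET Framework 4 and later this is expected to match the built-in Trim().
        Console.WriteLine(TrimLikeFramework(input) == input.Trim());
    }
}
```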

Answer 1: (score: 3)

You can generate the list yourself:

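// Requires: using System; using System.Linq;
// Enumerate every UTF-16 code unit and keep those that char.IsWhiteSpace accepts.
var spaces = string.Join(",", Enumerable.Range(0, 0x10000)
                              .Select(i => (char)i)
                              .Where(c => char.IsWhiteSpace(c))
                              .Select(x => "'\\x" + ((int)x).ToString("x4") + "'"));

Console.WriteLine(spaces);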
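
Since the question asked for string-escape format, a variation on the same idea could print short escapes such as \n where C# has one and \uXXXX otherwise. This is only a sketch; the class WhitespaceDump and its methods are hypothetical names made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhitespaceDump
{
    // Hypothetical helper: formats a char as a C# escape sequence,
    // using the short form where the language defines one.
    public static string Escape(char c)
    {
        switch (c)
        {
            case '\t': return "\\t";
            case '\n': return "\\n";
            case '\v': return "\\v";
            case '\f': return "\\f";
            case '\r': return "\\r";
            default:   return "\\u" + ((int)c).ToString("X4");
        }
    }

    // Every UTF-16 code unit that char.IsWhiteSpace accepts, as escape strings.
    // The exact set depends on the Unicode data shipped with the runtime.
    public static IEnumerable<string> AllWhitespaceEscapes()
    {
        return Enumerable.Range(0, 0x10000)
                         .Select(i => (char)i)
                         .Where(c => char.IsWhiteSpace(c))
                         .Select(c => Escape(c));
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", AllWhitespaceEscapes()));
    }
}
```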

Answer 2: (score: 2)

Trim (with no arguments) removes the characters for which IsWhiteSpace returns true:

White-space characters are the following Unicode characters:

  • Members of the SpaceSeparator category, which includes the characters SPACE (U+0020), NO-BREAK SPACE (U+00A0), OGHAM SPACE MARK (U+1680), EN QUAD (U+2000), EM QUAD (U+2001), EN SPACE (U+2002), EM SPACE (U+2003), THREE-PER-EM SPACE (U+2004), FOUR-PER-EM SPACE (U+2005), SIX-PER-EM SPACE (U+2006), FIGURE SPACE (U+2007), PUNCTUATION SPACE (U+2008), THIN SPACE (U+2009), HAIR SPACE (U+200A), NARROW NO-BREAK SPACE (U+202F), MEDIUM MATHEMATICAL SPACE (U+205F), and IDEOGRAPHIC SPACE (U+3000).

  • Members of the LineSeparator category, which consists solely of the LINE SEPARATOR character (U+2028).

  • Members of the ParagraphSeparator category, which consists solely of the PARAGRAPH SEPARATOR character (U+2029).

  • The characters CHARACTER TABULATION (U+0009), LINE FEED (U+000A), LINE TABULATION (U+000B), FORM FEED (U+000C), CARRIAGE RETURN (U+000D), and NEXT LINE (U+0085).

According to http://referencesource.microsoft.com:

public static bool IsWhiteSpace(char c) { // char.IsWhiteSpace

    if (IsLatin1(c)) {
        return (IsWhiteSpaceLatin1(c));
    }
    return CharUnicodeInfo.IsWhiteSpace(c);
}

private static bool IsWhiteSpaceLatin1(char c) {

    // There are characters which belong to UnicodeCategory.Control but are considered as white spaces.
    // We use code point comparisons for these characters here as a temporary fix.

    // U+0009 = <control> HORIZONTAL TAB
    // U+000a = <control> LINE FEED
    // U+000b = <control> VERTICAL TAB
    // U+000c = <contorl> FORM FEED
    // U+000d = <control> CARRIAGE RETURN
    // U+0085 = <control> NEXT LINE
    // U+00a0 = NO-BREAK SPACE
    if ((c == ' ') || (c >= '\x0009' && c <= '\x000d') || c == '\x00a0' || c == '\x0085') {
        return (true);
    }
    return (false);
}

internal static bool IsWhiteSpace(char c) // CharUnicodeInfo.IsWhiteSpace
{
    UnicodeCategory uc = GetUnicodeCategory(c);
    // In Unicode 3.0, U+2028 is the only character which is under the category "LineSeparator".
    // And U+2029 is th eonly character which is under the category "ParagraphSeparator".
    switch (uc) {
        case (UnicodeCategory.SpaceSeparator):
        case (UnicodeCategory.LineSeparator):
        case (UnicodeCategory.ParagraphSeparator):
            return (true);
    }

    return (false);
}

Custom characters can also be removed with Trim(params char[]).
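
For example, a minimal sketch of the Trim(params char[]) overload; the class and method names here are made up for illustration. Only the listed characters are stripped, and only from the two ends of the string.

```csharp
using System;

public static class TrimCharsExample
{
    // Hypothetical helper: removes '*', '-' and ' ' from both ends only.
    public static string StripDecoration(string s)
    {
        return s.Trim('*', '-', ' ');
    }

    public static void Main()
    {
        // The " - " in the middle is untouched because Trim never looks
        // past the first and last characters that are not in the set.
        Console.WriteLine(StripDecoration("-- *hello - world* --")); // hello - world
    }
}
```

Note that when an explicit character list is passed, characters outside that list (including whitespace such as \t) are no longer trimmed unless they are listed too.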

Answer 3: (score: 1)

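"White-space characters are defined by the Unicode standard. The Trim() method removes any leading and trailing characters that produce a return value of true when they are passed to the Char.IsWhiteSpace method."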

https://msdn.microsoft.com/en-us/library/t97s7bs3(v=vs.110).aspx

I hope it helps...

Answer 4: (score: 1)

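I can't seem to find a list of all the characters that Trim() will remove. However, if certain characters are still not removed after you use Trim(), then

https://msdn.microsoft.com/en-us/library/d4tt83f9(v=vs.110).aspx

is the String.Trim Method (Char[]), which lets you specify the characters you want removed.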

Hope this helps

Answer 5: (score: 1)

Trim removes all characters for which IsWhiteSpace returns true. See: https://msdn.microsoft.com/en-us/library/t809ektx(v=vs.110).aspx

Note that .NET 3.5 SP1 and earlier versions behave slightly differently: see the Notes to Callers at https://msdn.microsoft.com/en-us/library/t97s7bs3(v=vs.110).aspx
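
One difference described in those notes is that .NET 3.5 SP1 and earlier also trimmed ZERO WIDTH SPACE (U+200B) and ZERO WIDTH NO-BREAK SPACE (U+FEFF), which Trim() in .NET 4 and later leaves alone. If you need the older behavior for those two characters, a sketch along these lines may help; ZeroWidthTrim and its methods are hypothetical names, not framework APIs.

```csharp
using System;

public static class ZeroWidthTrim
{
    // Hypothetical predicate: whitespace per char.IsWhiteSpace, plus the two
    // zero-width characters that older framework versions also trimmed.
    public static bool IsTrimmable(char c)
    {
        return char.IsWhiteSpace(c) || c == '\u200B' || c == '\uFEFF';
    }

    // Trims both ends using the predicate above.
    public static string TrimIncludingZeroWidth(string s)
    {
        int start = 0;
        int end = s.Length - 1;
        while (start <= end && IsTrimmable(s[start])) start++;
        while (end >= start && IsTrimmable(s[end])) end--;
        return s.Substring(start, end - start + 1);
    }

    public static void Main()
    {
        string s = "\u200B hello \uFEFF";
        // Built-in Trim(): the result depends on the framework version.
        Console.WriteLine(s.Trim().Length);
        Console.WriteLine(TrimIncludingZeroWidth(s).Length); // 5
    }
}
```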