Obviously, the main use of Trim is to remove leading and trailing whitespace from a string, e.g.:
" hello ".Trim(); // results in "hello"
But Trim also removes extra characters such as \n, \r and \t, so:
" \nhello\r\t ".Trim(); // it also produces "hello"
Is there an explicit list of all the characters (ideally in string escape format, like \n) that Trim will remove?
Edit: Thanks for the detailed answers. I now know the EXACT characters. This Wikipedia list that @RayKoopa left in the comments is probably the best format for me.
Answer 0 (score: 8)
We can look at the source of the String class here. The public Trim() method calls an internal helper method named TrimHelper():
public String Trim() {
Contract.Ensures(Contract.Result<String>() != null);
Contract.EndContractBlock();
return TrimHelper(TrimBoth);
}
TrimHelper() looks like this:
[System.Security.SecuritySafeCritical] // auto-generated
private String TrimHelper(int trimType) {
//end will point to the first non-trimmed character on the right
//start will point to the first non-trimmed character on the Left
int end = this.Length-1;
int start=0;
//Trim specified characters.
if (trimType !=TrimTail) {
for (start=0; start < this.Length; start++) {
if (!Char.IsWhiteSpace(this[start]) && !IsBOMWhitespace(this[start])) break;
}
}
if (trimType !=TrimHead) {
for (end= Length -1; end >= start; end--) {
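if (!Char.IsWhiteSpace(this[end]) && !IsBOMWhitespace(this[end])) break; // the quoted source passes this[start] to IsBOMWhitespace here; this[end] is the index this loop is scanning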
}
}
return CreateTrimmedString(start, end);
}
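The two loops in TrimHelper can be reproduced outside the String class to see that they give the same result as Trim. This is a minimal sketch written as C# 9 top-level statements; the helper name TrimLikeHelper and its two bool flags are hypothetical stand-ins for the TrimHead/TrimTail/TrimBoth modes, and it assumes char.IsWhiteSpace is the only test that matters (the IsBOMWhitespace check is left out).

```csharp
using System;

// Hypothetical helper (not a framework API) mirroring the two TrimHelper loops.
// trimHead/trimTail stand in for the TrimHead/TrimTail/TrimBoth modes.
static string TrimLikeHelper(string s, bool trimHead, bool trimTail)
{
    int start = 0;
    int end = s.Length - 1;

    // Advance start past leading whitespace.
    if (trimHead)
    {
        while (start < s.Length && char.IsWhiteSpace(s[start])) start++;
    }

    // Move end back past trailing whitespace, never crossing start.
    if (trimTail)
    {
        while (end >= start && char.IsWhiteSpace(s[end])) end--;
    }

    // end < start means nothing but whitespace was left.
    return end < start ? string.Empty : s.Substring(start, end - start + 1);
}

string sample = " \nhello\r\t ";
Console.WriteLine(TrimLikeHelper(sample, true, true) == sample.Trim());       // True
Console.WriteLine(TrimLikeHelper(sample, true, false) == sample.TrimStart()); // True
Console.WriteLine(TrimLikeHelper(sample, false, true) == sample.TrimEnd());   // True
```

Scanning with two indices and taking a single Substring at the end avoids allocating intermediate strings, which is the same design the framework code uses via CreateTrimmedString.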
So your question mostly comes down to the Char.IsWhiteSpace method:
[Pure]
public static bool IsWhiteSpace(char c) {
if (IsLatin1(c)) {
return (IsWhiteSpaceLatin1(c));
}
return CharUnicodeInfo.IsWhiteSpace(c);
}
If the character is in the Latin-1 range (U+0000 to U+00FF), this is what counts as whitespace:
private static bool IsWhiteSpaceLatin1(char c) {
// There are characters which belong to UnicodeCategory.Control but are considered as white spaces.
// We use code point comparisons for these characters here as a temporary fix.
// U+0009 = <control> HORIZONTAL TAB
// U+000a = <control> LINE FEED
// U+000b = <control> VERTICAL TAB
// U+000c = <control> FORM FEED
// U+000d = <control> CARRIAGE RETURN
// U+0085 = <control> NEXT LINE
// U+00a0 = NO-BREAK SPACE
if ((c == ' ') || (c >= '\x0009' && c <= '\x000d') || c == '\x00a0' || c == '\x0085') {
return (true);
}
return (false);
}
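To confirm that the comparison in IsWhiteSpaceLatin1 covers the whole Latin-1 range, a sketch like the following (C# 9 top-level statements; variable names are illustrative) asks the public char.IsWhiteSpace about every code unit from U+0000 to U+00FF and compares the result with the eight characters listed in the comments above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Collect every Latin-1 code unit (U+0000..U+00FF) that char.IsWhiteSpace accepts.
var latin1WhiteSpace = new List<int>();
for (int i = 0; i <= 0xFF; i++)
{
    if (char.IsWhiteSpace((char)i)) latin1WhiteSpace.Add(i);
}

// The set spelled out by the IsWhiteSpaceLatin1 comparison, in ascending order.
int[] expected = { 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85, 0xA0 };

Console.WriteLine(string.Join(", ", latin1WhiteSpace.ConvertAll(i => "U+" + i.ToString("X4"))));
// U+0009, U+000A, U+000B, U+000C, U+000D, U+0020, U+0085, U+00A0
Console.WriteLine(latin1WhiteSpace.SequenceEqual(expected)); // True
```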
Otherwise, we have to go to CharUnicodeInfo.cs, which checks the character's UnicodeCategory:
internal static bool IsWhiteSpace(char c)
{
UnicodeCategory uc = GetUnicodeCategory(c);
// In Unicode 3.0, U+2028 is the only character which is under the category "LineSeparator".
// And U+2029 is the only character which is under the category "ParagraphSeparator".
switch (uc) {
case (UnicodeCategory.SpaceSeparator):
case (UnicodeCategory.LineSeparator):
case (UnicodeCategory.ParagraphSeparator):
return (true);
}
return (false);
}
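Above U+00FF the rule is purely category based. The following sketch (C# 9 top-level statements; IsSeparatorCategory is a hypothetical helper, not a framework API) restates that switch using the public char.GetUnicodeCategory and counts every char where it disagrees with char.IsWhiteSpace. The category tables come from the Unicode version the runtime ships with, so treat the result as specific to the framework you run it on.

```csharp
using System;
using System.Globalization;

// Hypothetical restatement of CharUnicodeInfo.IsWhiteSpace:
// whitespace iff the category is Zs, Zl or Zp.
static bool IsSeparatorCategory(char c)
{
    switch (char.GetUnicodeCategory(c))
    {
        case UnicodeCategory.SpaceSeparator:
        case UnicodeCategory.LineSeparator:
        case UnicodeCategory.ParagraphSeparator:
            return true;
        default:
            return false;
    }
}

// Compare against char.IsWhiteSpace for every UTF-16 code unit outside Latin-1.
int mismatches = 0;
for (int i = 0x100; i <= 0xFFFF; i++)
{
    char c = (char)i;
    if (char.IsWhiteSpace(c) != IsSeparatorCategory(c))
    {
        mismatches++;
        Console.WriteLine($"Mismatch at U+{i:X4}");
    }
}
Console.WriteLine($"Mismatches above U+00FF: {mismatches}");
```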
Answer 1 (score: 3)
You can generate the list yourself:
using System;
using System.Linq;
// Enumerate every UTF-16 code unit and keep the ones char.IsWhiteSpace accepts
var spaces = string.Join(",", Enumerable.Range(0, 0x10000)
.Select(i => (char)i)
.Where(c => char.IsWhiteSpace(c))
.Select(c => "'\\x" + ((int)c).ToString("x4") + "'"));
Console.WriteLine(spaces);
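The snippet above prints hex codes like '\x0009'. Since the question asked for escapes in the style of \n, a variant could use the short escapes C# has (\t, \n, \v, \f, \r) and fall back to \uXXXX for everything else, including the plain space so it stays visible. This is a sketch in C# 9 top-level statements; ToEscape is a hypothetical helper name.

```csharp
using System;
using System.Linq;

// Hypothetical helper: render a char as a C# escape sequence.
static string ToEscape(char c) => c switch
{
    '\t' => @"\t",
    '\n' => @"\n",
    '\v' => @"\v",
    '\f' => @"\f",
    '\r' => @"\r",
    _ => "\\u" + ((int)c).ToString("X4"),
};

// Same enumeration as above, but with escape-style output separated by spaces.
string list = string.Join(" ", Enumerable.Range(0, 0x10000)
    .Select(i => (char)i)
    .Where(c => char.IsWhiteSpace(c))
    .Select(c => ToEscape(c)));

Console.WriteLine(list);
```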
Answer 2 (score: 2)
Trim (without arguments) removes the characters for which IsWhiteSpace returns true:
White-space characters are the following Unicode characters:
Members of the SpaceSeparator category, which includes the characters SPACE (U+0020), NO-BREAK SPACE (U+00A0), OGHAM SPACE MARK (U+1680), EN QUAD (U+2000), EM QUAD (U+2001), EN SPACE (U+2002), EM SPACE (U+2003), THREE-PER-EM SPACE (U+2004), FOUR-PER-EM SPACE (U+2005), SIX-PER-EM SPACE (U+2006), FIGURE SPACE (U+2007), PUNCTUATION SPACE (U+2008), THIN SPACE (U+2009), HAIR SPACE (U+200A), NARROW NO-BREAK SPACE (U+202F), MEDIUM MATHEMATICAL SPACE (U+205F), and IDEOGRAPHIC SPACE (U+3000).
Members of the LineSeparator category, which includes only the LINE SEPARATOR character (U+2028).
Members of the ParagraphSeparator category, which includes only the PARAGRAPH SEPARATOR character (U+2029).
The characters CHARACTER TABULATION (U+0009), LINE FEED (U+000A), LINE TABULATION (U+000B), FORM FEED (U+000C), CARRIAGE RETURN (U+000D), and NEXT LINE (U+0085).
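The documented list can be turned directly into a char[] and checked against the running framework. This sketch (C# 9 top-level statements; the names documented and padded are illustrative) verifies that char.IsWhiteSpace accepts every listed character and that Trim() strips a string padded with all of them, then prints any additional characters the runtime's Unicode tables classify as whitespace, which may vary between framework versions.

```csharp
using System;
using System.Linq;

// The characters from the documentation above, as C# escapes.
char[] documented =
{
    // SpaceSeparator (Zs)
    '\u0020', '\u00A0', '\u1680', '\u2000', '\u2001', '\u2002', '\u2003', '\u2004',
    '\u2005', '\u2006', '\u2007', '\u2008', '\u2009', '\u200A', '\u202F', '\u205F', '\u3000',
    // LineSeparator (Zl) and ParagraphSeparator (Zp)
    '\u2028', '\u2029',
    // Control characters treated as whitespace
    '\u0009', '\u000A', '\u000B', '\u000C', '\u000D', '\u0085',
};

// Every documented character should be accepted by char.IsWhiteSpace...
bool allRecognized = documented.All(c => char.IsWhiteSpace(c));

// ...and Trim() should strip a string padded with all of them on both sides.
string padded = new string(documented) + "hello" + new string(documented);

Console.WriteLine(allRecognized);            // True
Console.WriteLine(padded.Trim() == "hello"); // True

// Anything else this runtime treats as whitespace (depends on its Unicode data).
var extra = Enumerable.Range(0, 0x10000)
    .Select(i => (char)i)
    .Where(c => char.IsWhiteSpace(c) && Array.IndexOf(documented, c) < 0);
Console.WriteLine("Extra: " + string.Join(", ", extra.Select(c => $"U+{(int)c:X4}")));
```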
According to http://referencesource.microsoft.com:
public static bool IsWhiteSpace(char c) { // char.IsWhiteSpace
if (IsLatin1(c)) {
return (IsWhiteSpaceLatin1(c));
}
return CharUnicodeInfo.IsWhiteSpace(c);
}
private static bool IsWhiteSpaceLatin1(char c) {
// There are characters which belong to UnicodeCategory.Control but are considered as white spaces.
// We use code point comparisons for these characters here as a temporary fix.
// U+0009 = <control> HORIZONTAL TAB
// U+000a = <control> LINE FEED
// U+000b = <control> VERTICAL TAB
// U+000c = <control> FORM FEED
// U+000d = <control> CARRIAGE RETURN
// U+0085 = <control> NEXT LINE
// U+00a0 = NO-BREAK SPACE
if ((c == ' ') || (c >= '\x0009' && c <= '\x000d') || c == '\x00a0' || c == '\x0085') {
return (true);
}
return (false);
}
internal static bool IsWhiteSpace(char c) // CharUnicodeInfo.IsWhiteSpace
{
UnicodeCategory uc = GetUnicodeCategory(c);
// In Unicode 3.0, U+2028 is the only character which is under the category "LineSeparator".
// And U+2029 is the only character which is under the category "ParagraphSeparator".
switch (uc) {
case (UnicodeCategory.SpaceSeparator):
case (UnicodeCategory.LineSeparator):
case (UnicodeCategory.ParagraphSeparator):
return (true);
}
return (false);
}
Trim(params char[]) can also remove a custom set of characters.
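As an illustration of the char[] overload, the sketch below (C# 9 top-level statements) uses U+200B ZERO WIDTH SPACE and U+FEFF ZERO WIDTH NO-BREAK SPACE. Both are format characters (category Cf) rather than whitespace, so the parameterless Trim() leaves them alone, while listing them explicitly removes them. Once you pass a set, only that set is trimmed, so ordinary spaces have to be listed too; an empty array falls back to whitespace trimming.

```csharp
using System;

// U+FEFF (ZERO WIDTH NO-BREAK SPACE / BOM) and U+200B (ZERO WIDTH SPACE)
// are category Cf, so char.IsWhiteSpace returns false for both.
string s = "\uFEFF\u200B  hello  \u200B";

// Parameterless Trim(): stops at the first non-whitespace char on each side,
// which here is U+FEFF at the start and U+200B at the end.
string byDefault = s.Trim();

// Explicit set: only these characters are trimmed, so ' ' must be included.
string custom = s.Trim(' ', '\u200B', '\uFEFF');

// An empty set falls back to the whitespace behaviour of Trim().
string emptySet = "\t hi \n".Trim(Array.Empty<char>());

Console.WriteLine(byDefault == s); // True
Console.WriteLine(custom);         // hello
Console.WriteLine(emptySet);       // hi
```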
Answer 3 (score: 1)
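"White-space characters are defined by the Unicode standard. The Trim() method removes any leading and trailing characters that produce a return value of true when they are passed to the Char.IsWhiteSpace method."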
https://msdn.microsoft.com/en-us/library/t97s7bs3(v=vs.110).aspx
I hope it helps...
Answer 4 (score: 1)
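I can't seem to find a list of all the characters Trim() will remove. However, if you use the String.Trim Method (Char[]) (https://msdn.microsoft.com/en-us/library/d4tt83f9(v=vs.110).aspx), you can specify exactly which characters you want removed.
Hope this helps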
Answer 5 (score: 1)
Trim removes all characters for which IsWhiteSpace returns true. See: https://msdn.microsoft.com/en-us/library/t809ektx(v=vs.110).aspx
Note that .NET 3.5 SP1 and earlier behave slightly differently; see Notes to Callers at https://msdn.microsoft.com/en-us/library/t97s7bs3(v=vs.110).aspx