Question

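What is the best way to trim all non-alphanumeric characters from the beginning and end of a string? I tried listing the characters I don't want by hand, but that doesn't work well. I just need to trim anything that is not alphanumeric.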

I tried using this:

   string something = "()&*1@^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
   string somethingNew = Regex.Replace(something, @"[^\p{L}-\s]+", "");

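But it removes all the non-alphanumeric characters from the string, not just those at the ends. What I basically want is something like this: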

"test1" -> test1
#!@!2test# -> 2test
(test3) -> test3
@@test4---- -> test4

I do want to support Unicode characters, but not symbols.

Edit: the output for the example should be:

Littering aaaannnndóú

Regards

Answer 1

@"[^\p{L}\s-]+(test\d*)|(test\d*)[^\p{L}\s-]+", "$1$2"

Answer 2

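You can use the String.Trim Method (Char[]) from the .NET library to trim unwanted characters from a given string.

From MSDN: String.Trim Method (Char[])

Removes all leading and trailing occurrences of a set of characters specified in an array from the current String object.

Before trimming the unwanted characters, you first need to determine whether each character is a letter or a digit. If a character is non-alphanumeric, you can remove it with String.Trim Method (Char[]).

You need to use Char.IsLetterOrDigit() to identify whether a character is alphanumeric.

From MSDN: Char.IsLetterOrDigit()

Indicates whether a Unicode character is categorized as a letter or a decimal digit.

Try this: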

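string str = "()&*1@^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
// foreach enumerates the original string; str is reassigned inside the loop.
foreach (char ch in str)
{
    // Strip this character from both ends if it is not a letter or digit.
    if (!char.IsLetterOrDigit(ch))
        str = str.Trim(ch);
}

Output: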

1@^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9
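
The loop above depends on the order in which characters occur: Trim(ch) only helps if ch happens to sit at an end of the string when it is visited, so an input such as "a#(" ends up as "a#". Below is a minimal sketch of an order-independent variant, assuming it is acceptable to collect every non-alphanumeric character first and pass the whole set to a single String.Trim(char[]) call. The TrimDemo class and TrimNonAlphaNum method are names made up for illustration.

```csharp
using System;
using System.Linq;

public static class TrimDemo
{
    // Hypothetical helper: trims leading/trailing non-alphanumeric characters.
    public static string TrimNonAlphaNum(string input)
    {
        // Collect each distinct character that is neither a letter nor a digit.
        char[] unwanted = input.Where(c => !char.IsLetterOrDigit(c))
                               .Distinct()
                               .ToArray();

        // If the array is empty, Trim falls back to trimming white space,
        // but then the input contains no white space, so nothing changes.
        return input.Trim(unwanted);
    }
}
```

For the sample string this should give the same output as shown above, since trimming still stops at the first and last letter or digit.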

Answer 3

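If you need to remove every character that is not alphanumeric, you can use IsLetterOrDigit paired with Where to go through each character. Since we are working with char values, we need a Concat at the end to bring everything back into a string.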

string result = string.Concat(input.Where(char.IsLetterOrDigit));

You can easily turn this into an extension method:

public static class Extensions
{
    public static string ToAlphaNum(this string input)
    {
        return string.Concat(input.Where(char.IsLetterOrDigit));
    }
}

You can use it like this:

string testString = "#!@!\"(test123)\"";
string result = testString.ToAlphaNum(); //test123

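Note: this removes every non-alphanumeric character in the string. If you really need to remove only the leading/trailing ones, please add more detail on how you define the beginning or end, and add more examples.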
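
If only the leading and trailing characters should go, one possible reading of "beginning" and "end" is: everything before the first letter or digit, and everything after the last one. Here is a sketch under that assumption; TrimExtensions and TrimToAlphaNum are hypothetical names, not part of the answer above.

```csharp
using System;

public static class TrimExtensions
{
    // Hypothetical extension: keeps the span from the first to the last
    // letter-or-digit character and drops everything outside it.
    public static string TrimToAlphaNum(this string input)
    {
        int start = 0;
        int end = input.Length - 1;

        // Advance past leading characters that are not letters or digits.
        while (start <= end && !char.IsLetterOrDigit(input[start])) start++;

        // Step back over trailing characters that are not letters or digits.
        while (end >= start && !char.IsLetterOrDigit(input[end])) end--;

        // If nothing alphanumeric remains, the length is 0 and the result is empty.
        return input.Substring(start, end - start + 1);
    }
}
```

Unlike ToAlphaNum, this leaves interior characters alone, so "#!@!\"(test 123)\"".TrimToAlphaNum() keeps the space between "test" and "123".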

Answer 4

Assuming you want to trim non-alphanumeric characters from the beginning and end of the string:

s = new string(s.SkipWhile(c => !char.IsLetterOrDigit(c))
                .TakeWhile(char.IsLetterOrDigit)
                .ToArray());
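
Note that TakeWhile stops at the first non-alphanumeric character it meets after the start, so interior spaces or symbols cut the result short: "(foo bar)" would come out as "foo". If interior characters should survive, a possible sketch is to skip from both ends instead, assuming the extra passes made by Reverse are acceptable. The class and method names here are illustrative only.

```csharp
using System;
using System.Linq;

public static class LinqTrim
{
    // Hypothetical helper: removes non-alphanumeric runs at both ends only.
    public static string TrimEnds(string s)
    {
        Func<char, bool> unwanted = c => !char.IsLetterOrDigit(c);

        return new string(s.SkipWhile(unwanted)   // drop the leading run
                           .Reverse()
                           .SkipWhile(unwanted)   // drop the trailing run
                           .Reverse()             // restore the original order
                           .ToArray());
    }
}
```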

Answer 5

You can also replace all non-letters/non-digits at the beginning and/or end of a line:

^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$

Use it as:

 resultString = Regex.Replace(subjectString, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "", RegexOptions.Multiline);

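If you really only want to remove characters at the beginning and end of the whole string rather than line by line, drop the "^$ match at line breaks" option (RegexOptions.Multiline).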
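
To make the difference concrete, here is a small sketch that applies the same pattern with and without RegexOptions.Multiline. The MultilineDemo class and its method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    const string Pattern = @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$";

    // Trims every line separately: ^ and $ also match at line breaks.
    public static string TrimEachLine(string text) =>
        Regex.Replace(text, Pattern, "", RegexOptions.Multiline);

    // Trims only the very start and end of the whole string.
    public static string TrimWhole(string text) =>
        Regex.Replace(text, Pattern, "");
}
```

With a two-line input such as "(one)\n#two#", TrimEachLine strips the symbols around both words, while TrimWhole leaves the ")" and "#" next to the line break in place.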

If you want leading or trailing underscores to count as characters to keep, the regex can be simplified to:

^\W+|\W+$
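
A quick sketch of how the two patterns differ on underscores, assuming .NET's default (non-ECMAScript) behavior, where \w includes Unicode letters, decimal digits, and connector punctuation such as the underscore. The names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnderscoreDemo
{
    // \W treats '_' as a word character, so leading/trailing underscores stay.
    public static string TrimKeepUnderscore(string text) =>
        Regex.Replace(text, @"^\W+|\W+$", "");

    // The explicit class treats '_' like any other symbol and removes it.
    public static string TrimAlphaNumOnly(string text) =>
        Regex.Replace(text, @"^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$", "");
}
```

For "--_test4_--", the first method keeps the underscores around "test4" and the second removes them.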

The core of the regex:

[^\p{L}\p{N}]

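is a negated character class: it matches any character that is not in the Unicode category letter (\p{L}) or number (\p{N}).

In other words:

Trim characters that are not Unicode letters or digits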

^[^\p{L}\p{N}]*|[^\p{L}\p{N}]*$

Options: Case sensitive; Exact spacing; Dot doesn't match line breaks; ^$ match at line breaks; Parentheses capture

Match this alternative «^[^\p{L}\p{N}]*»
   Assert position at the beginning of a line «^»
   Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      A character from the Unicode category “letter” «\p{L}»
      A character from the Unicode category “number” «\p{N}»
Or match this alternative «[^\p{L}\p{N}]*$»
   Match any single character NOT present in the list below «[^\p{L}\p{N}]*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      A character from the Unicode category “letter” «\p{L}»
      A character from the Unicode category “number” «\p{N}»
   Assert position at the end of a line «$»

Created with RegexBuddy

Answer 6

Without regex: in Java you could do it like this (the C# syntax for the same functionality is nearly identical):

while (true) {
    if (word.length() == 0) {
        return ""; // nothing alphanumeric was left
    }

    // isLetterOrDigit (rather than isLetter) so that digits at the ends are kept
    if (!Character.isLetterOrDigit(word.charAt(0))) {
        word = word.substring(1);
        continue; // so we are doing front first
    }
    if (!Character.isLetterOrDigit(word.charAt(word.length()-1))) {
        word = word.substring(0, word.length()-1);
        continue; // then we are doing end
    }
    break; // if front is done, and end is done
}

Answer 7

You can use this pattern

^[^[:alnum:]]+|[^[:alnum:]]+$

with the g (global) option.
