Question

I have a regular expression:

Regex.Match(result, @"\bTop Rate\b.*?\s*\s*([\d,\.]+)", RegexOptions.IgnoreCase);

and then I parse the captured value into an int:

topRate = int.Parse(topRateMatch.Groups[1].Value, System.Globalization.NumberStyles.AllowThousands);

Example:

Top Rate: 888,888
Output: 888888

With my current regex I get the int output just fine. However, I noticed that when there is whitespace between the digits, for example:

Top Rate: 8         88,888

I only get 8. Is there a way to ignore any whitespace that may or may not appear between the digits, or after the "Top Rate" label?

Examples:

Top Rate:                       8                      88,888
Expected output: 888888

Top Rate:                       8     88,888
Expected output: 888888

Top Rate: 8                      88,888
Expected output: 888888

Top Rate: 8 8 8,888
Expected output: 888888

Top Rate: 888,          8  88
Expected output: 888888

Answer 1

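First of all, a single capture cannot skip or omit the whitespace between the digits it matches; getting only the digits directly would require extracting multiple matches after the given string. However, there is a simple two-step approach.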

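You can add \s to the character class to match any whitespace, or add \p{Zs} and \t to match only horizontal whitespace. I suggest starting the capture with \d and then using an optional non-capturing group that ends with a digit pattern, so that the captured text is guaranteed to start and end with a digit: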

\bTop Rate\b.*?(\d(?:[\d,.\s]*\d)?)

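Note that repeating \s*\s* makes no sense, since a single \s* already matches zero or more whitespace characters. Even that one \s* is redundant here, because .*? matches any characters other than LF, as few as possible, and that already covers whitespace between the label and the number. To let .*? match across line breaks, add the RegexOptions.Singleline option.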
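To illustrate the RegexOptions.Singleline remark, here is a minimal sketch. The class name SinglelineDemo, the helper Capture, and the sample input are made up for this illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Pattern suggested in this answer.
    private const string Pattern = @"\bTop Rate\b.*?(\d(?:[\d,.\s]*\d)?)";

    // Hypothetical helper: returns the raw text of group 1, or null when nothing matches.
    public static string Capture(string input, RegexOptions options)
    {
        Match m = Regex.Match(input, Pattern, options);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // The number sits on the line after the label.
        string input = "Top Rate:\n8 88,888";

        // Without Singleline, '.' cannot cross '\n', so no digit is ever reached.
        Console.WriteLine(Capture(input, RegexOptions.None) ?? "(no match)");

        // With Singleline, '.' also matches '\n' and the capture is "8 88,888".
        Console.WriteLine(Capture(input, RegexOptions.Singleline) ?? "(no match)");
    }
}
```

The option only changes what . matches. The \s inside the character class matches line breaks either way.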

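Details:

\bTop Rate\b - the phrase Top Rate as whole words
.*? - any zero or more characters other than a line feed (\n), as few as possible
(\d(?:[\d,.\s]*\d)?) - Group 1:
- \d - a digit
- (?:[\d,.\s]*\d)? - an optional non-capturing group matching zero or more digits, commas, dots, or whitespace characters, followed by a digit.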
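The difference between \s and the \p{Zs} plus \t alternative shows up when another number follows on the next line. The sketch below is only an illustration: the class name, the Digits helper, and the sample input are assumptions, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HorizontalWhitespaceDemo
{
    // \s inside the class also matches line breaks.
    private const string AnyWhitespace = @"\bTop Rate\b.*?(\d(?:[\d,.\s]*\d)?)";

    // \p{Zs} plus \t limits the class to horizontal whitespace.
    private const string HorizontalOnly = @"\bTop Rate\b.*?(\d(?:[\d,.\p{Zs}\t]*\d)?)";

    // Hypothetical helper: digits of the first captured number, or "" when nothing matches.
    public static string Digits(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success) return "";
        return new string(m.Groups[1].Value.Where(c => char.IsDigit(c)).ToArray());
    }

    public static void Main()
    {
        string input = "Top Rate: 8 88,888\n2024 report";

        // The \s variant runs on into the next line and picks up "2024" as well.
        Console.WriteLine(Digits(input, AnyWhitespace));   // 8888882024

        // The horizontal-only variant stops at the line break.
        Console.WriteLine(Digits(input, HorizontalOnly));  // 888888
    }
}
```

Which variant fits depends on whether a single number may legitimately wrap across lines in your input.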

Next, once you have a match, keep only the digits.

var text = "Top Rate: 8                      88,888";
var result = Regex.Match(text, @"\bTop Rate\b.*?(\d(?:[\d,.\s]*\d)?)", RegexOptions.Singleline);
if (result.Success)
{
    Console.WriteLine( new string(result.Groups[1].Value.Where(c => char.IsDigit(c)).ToArray()) );
}
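Since the question wants an int rather than a string, the two steps can be wrapped in a small helper. This is a sketch under assumptions: TopRateParser and TryParseTopRate are hypothetical names, and RegexOptions.IgnoreCase is carried over from the question's original call.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TopRateParser
{
    private static readonly Regex TopRateRegex = new Regex(
        @"\bTop Rate\b.*?(\d(?:[\d,.\s]*\d)?)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns false when nothing matches
    // or when the digits do not fit into an int.
    public static bool TryParseTopRate(string input, out int topRate)
    {
        topRate = 0;
        Match m = TopRateRegex.Match(input);
        if (!m.Success) return false;

        // Step 2: keep ASCII digits only, dropping commas, dots and whitespace.
        string digits = new string(
            m.Groups[1].Value.Where(c => c >= '0' && c <= '9').ToArray());
        return int.TryParse(digits, out topRate);
    }

    public static void Main()
    {
        // Sample inputs taken from the question.
        string[] samples =
        {
            "Top Rate: 888,888",
            "Top Rate:                       8     88,888",
            "Top Rate: 8 8 8,888",
            "Top Rate: 888,          8  88",
        };
        foreach (string sample in samples)
        {
            if (TryParseTopRate(sample, out int rate))
            {
                Console.WriteLine(rate);   // 888888 for each sample
            }
        }
    }
}
```

Be aware that the character class also accepts dots, so an input such as 8.5 would come out as 85. Remove the dot from the class if decimals can occur in your data.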

To handle multiple matches:

var text = "Top Rate: 8                      88,888 and Top Rate:                       8  \n   88,888";
var results = Regex.Matches(text, @"\bTop Rate\b.*?(\d(?:[\d,.\s]*\d)?)", RegexOptions.Singleline)
        .Cast<Match>()
        .Select(x => new string(x.Groups[1].Value.Where(c => char.IsDigit(c)).ToArray()));
foreach (var s in results)
{
    Console.WriteLine( s );
}


Answer 2

Something like this?

using System;
using System.Text.RegularExpressions;
                    
public class Program
{
  public static void Main()
  {
    string[] texts = {
      "This should Not match the Top Rate thing",
      " Top Rate    : 888,888 ",
      "Top    Rate   : 8 8 8 , 8 8 8 ",
    };
    Regex rxNonDigit = new Regex(@"\D+"); // matches 1 or more characters other than decimal digits.
    Regex rxTopRate = new Regex(@"
      ^           # match start of input (start of each line only with RegexOptions.Multiline), followed by
      \s*         # zero or more lead-in whitespace characters, followed by
      Top         # the literal 'Top', followed by
      \s+         # 1 or more whitespace characters, followed by
      Rate        # the literal 'Rate', followed by
      \s*         # zero or more whitespace characters, followed by
      :           # a literal colon ':', followed by
      \s*         # zero or more whitespace characters, followed by
      (?<rate>    # a named (explicit) capture group, containing
        \d+       # - 1 or more decimal digits, followed by
        (         # - an unnamed group, containing
          (\s|,)+ #     - interstitial whitespace or commas, followed by
          \d+     #     - 1 or more decimal digits
        )*        #   the whole of which is repeated zero or more times
      )           # followed by
      \s*         # zero or more lead-out whitespace characters, followed by
      $           # end of input (or just before a final newline)
    ", RegexOptions.IgnorePatternWhitespace|RegexOptions.ExplicitCapture );

    foreach ( string text in texts )
    {
      Match m = rxTopRate.Match(text);
      if (!m.Success)
      {
        Console.WriteLine("No Match: '{0}'", text);
      }
      else
      {
        string rawValue = m.Groups["rate"].Value;
        string cleanedValue = rxNonDigit.Replace(rawValue, "");
        Decimal value = Decimal.Parse(cleanedValue);

        Console.WriteLine(@"Matched: '{0}' >>> '{1}' >>> '{2}' >>> {3}",
          text,
          rawValue,
          cleanedValue,
          value
        );
      }
    }

  }
    
}

Answer 3

I tested this and found that a small change to the regex achieves your goal.

First: [image not available]

Second: [image not available]

Answer 4

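// Strings are immutable in C#, so build a new string without the commas
// (requires: using System.Text;).
string topRate = "88,888";
var sb = new StringBuilder();
foreach (char c in topRate)
{
    if (c != ',')   // compare against a char literal, not the string ","
    {
        sb.Append(c);
    }
}
topRate = sb.ToString();   // "88888"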
