Question

I have a test list that I am trying to capture data from using a regex.

Here is a sample of the text format:

(1) this is a sample string /(2) something strange /(3) another bit of text /(4) the last one/ something!/

I have a regex that currently captures this correctly, but I am having trouble making it work in outlier cases.

Here is my regex:

/\(?\d\d?\)([^\)]+)(\/|\z)/

Unfortunately, some of the data contains parentheses, like this:

(1) this is a sample string (1998-1999) /(2) something strange (blah) /(3) another bit of text /(4) the last one/ something!/

The substrings '(1998-1999)' and '(blah)' make it fail!

Anyone care to have a crack at this one? Thanks :D

Answer 1

I would try this:

\((\d+)\)\s+(.*?)(?=/(?:\(\d+\)|\z))

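This fairly horrible-looking regex does the following:

It looks for one or more digits enclosed in parentheses and captures them;
The parenthesized number must be followed by at least one whitespace character. This whitespace is ignored (not captured);
A non-greedy wildcard expression (.*?) captures the item text. For this kind of problem that is (imho) preferable to a negated character class (e.g. [^/]+), which would stop at a slash inside the text, as in "the last one/ something!";
A positive lookahead ((?=...)) says the expression must be followed by a forward slash and then one of the following:
- one or more digits enclosed in parentheses; or
- the end of the string.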
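
The steps above can also be written out piece by piece. This is only a rough sketch in Python rather than PHP (my choice, since no language was specified), using verbose mode so each part carries its own comment. The names ITEM, sample and items are made up for illustration, and Python's re module spells the end-of-string anchor \Z where PCRE uses \z.

```python
import re

# Same pattern as above, split across lines with re.VERBOSE.
ITEM = re.compile(r"""
    \( (\d+) \)        # group 1: one or more digits inside parentheses
    \s+                # at least one whitespace character, not captured
    (.*?)              # group 2: the item text, matched non-greedily
    (?=                # lookahead: the text must be followed by
        /              #   a forward slash, and then
        (?: \(\d+\)    #   either the next item number
          | \Z         #   or the end of the string
        )
    )
""", re.VERBOSE)

sample = ("(1) this is a sample string (1998-1999) /(2) something strange (blah) "
          "/(3) another bit of text /(4) the last one/ something!/")

# findall returns one (number, text) tuple per list item.
items = ITEM.findall(sample)
for number, text in items:
    print(number, repr(text))
```

Because the lookahead only accepts a slash that is followed by an item number or the end of the string, the slash inside item 4 stays part of the text.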

Using PHP as an example (you didn't specify a language):

$s = '(1) this is a sample string (1998-1999) /(2) something strange (blah) /(3) another bit of text /(4) the last one/ something!/';
preg_match_all('!\((\d+)\)\s+(.*?)(?=/(?:\(\d+\)|\z))!', $s, $matches);
print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] => (1) this is a sample string (1998-1999) 
            [1] => (2) something strange (blah) 
            [2] => (3) another bit of text 
            [3] => (4) the last one/ something!
        )

    [1] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
            [3] => 4
        )

    [2] => Array
        (
            [0] => this is a sample string (1998-1999) 
            [1] => something strange (blah) 
            [2] => another bit of text 
            [3] => the last one/ something!
        )

)

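A few notes:

You didn't specify what you want to capture. I have assumed the list item number and the text. That may be wrong, in which case just remove those capturing parentheses. Either way, you still get the whole match;
I have dropped the trailing slash from the match. That may not be what you intended. Again, just change the captures to suit;
I have allowed any number of digits in the item number. Your version allowed only two. If you prefer, replace \d+ with \d\d?.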
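
For what it's worth, here is a rough sketch of two of those adjustments, written in Python as an assumption (with \Z standing in for \z). The names with_slash, two_digits, any_digits and whole are hypothetical, and the exact shape of each variant is just one way to do it.

```python
import re

sample = ("(1) this is a sample string (1998-1999) /(2) something strange (blah) "
          "/(3) another bit of text /(4) the last one/ something!/")

# Variant A: move the slash out of the lookahead so it is consumed
# and becomes part of the whole match.
with_slash = re.compile(r"\((\d+)\)\s+(.*?)/(?=\(\d+\)|\Z)")
whole = [m.group(0) for m in with_slash.finditer(sample)]

# Variant B: restrict item numbers to one or two digits, as in the question.
two_digits = re.compile(r"\((\d\d?)\)\s+(.*?)(?=/(?:\(\d\d?\)|\Z))")
any_digits = re.compile(r"\((\d+)\)\s+(.*?)(?=/(?:\(\d+\)|\Z))")

print(whole)
print(two_digits.search("(123) abc/"))   # a three-digit number is rejected
print(any_digits.search("(123) abc/"))   # but accepted by the \d+ version
```

Variant A keeps the captured groups unchanged and only alters what the whole match covers.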

Answer 2

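Prepend / to the start of the string and append (0) to the end, then split the whole string on the pattern \/\(\d+\) and discard the first and last elements, which are empty strings.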
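
A minimal sketch of that split approach, in Python as an assumed language. The helper name split_items is made up, and the sketch relies on the input ending with a slash, as the sample does; otherwise the appended (0) would not form a separator.

```python
import re

def split_items(s):
    """Split a '(n) text /(n) text /' list into its item texts."""
    # Pad so every item, including the first and last, is bounded by "/(n)".
    padded = "/" + s + "(0)"
    parts = re.split(r"/\(\d+\)", padded)
    # The pieces before the first separator and after the last are empty.
    return parts[1:-1]

sample = ("(1) this is a sample string (1998-1999) /(2) something strange (blah) "
          "/(3) another bit of text /(4) the last one/ something!/")

texts = split_items(sample)
for text in texts:
    print(repr(text))
```

Each piece keeps the space that follows the item number; call strip() on it if that matters. The item numbers themselves are discarded by the split.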

Answer 3

As long as / cannot appear in the text...

 \(?\d?\d[^/]+
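
A quick way to see what that condition means in practice, sketched in Python (an assumption, as are the names simple and found): the pattern handles the embedded parentheses, but on the sample from the question it cuts the last item at its inner slash.

```python
import re

simple = re.compile(r"\(?\d?\d[^/]+")

sample = ("(1) this is a sample string (1998-1999) /(2) something strange (blah) "
          "/(3) another bit of text /(4) the last one/ something!/")

# Each match runs from an item number up to, but not including, the next slash.
found = simple.findall(sample)
for match in found:
    print(repr(match))
```

Item 4 comes back as "(4) the last one", so this pattern only fits data where the text never contains a slash.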
