Question

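I am trying to write a regular expression that performs a negative lookbehind on a capture group, so that I can extract possible references from emails. I need to know how to look behind from a given point back to the first whitespace. If a digit is found, I do not want the reference to be extracted.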

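I have got as far as the expression shown below. I have 2 capture groups, 'PreRef' and 'Ref'. If 'PreRef' contains a digit, I do not want a 'Ref' match to be found. So far I only check whether the character immediately before the colon is a digit.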

(?<PreRef>\S+)(?<![\d]):(?<Ref>\d{5})

A 'Ref' match of 12345 is found here:

This is a reference:12345

But I do not want one found here (the word 'reference' contains a 5):

This is not a ref5rence:12345
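
To see why the attempt above still matches the second line, it can help to run it. The following is only a sketch in JavaScript (it assumes an engine with ES2018 named groups and lookbehind, such as a recent Node.js); the variable names are made up for illustration.

```javascript
// The pattern from the question: the lookbehind only inspects the single
// character immediately before the colon, not the whole PreRef token.
const attempt = /(?<PreRef>\S+)(?<![\d]):(?<Ref>\d{5})/;

const lines = [
  "This is a reference:12345",
  "This is not a ref5rence:12345",
];

// Collect the named groups for each line (null when nothing matches).
const attemptResults = lines.map((line) => {
  const m = attempt.exec(line);
  return m ? { preRef: m.groups.PreRef, ref: m.groups.Ref } : null;
});

console.log(attemptResults);
```

Both lines produce a match, because in "ref5rence:" the character right before the colon is "e", so the lookbehind passes even though the token contains a digit.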

Answer 1

You can exclude digits from the \S class, then wrap the expression
in whitespace boundaries, and voila..

(?<!\S)(?<PreRef>[^\s\d]+):(?<Ref>\d{5})(?!\S)

https://regex101.com/r/JrU7Kd/1

Explanation

 (?<! \S )                     # Whitespace boundary
 (?<PreRef> [^\s\d]+ )         # (1), Not whitespace nor digit
 :                             # Colon
 (?<Ref> \d{5} )               # (2), Five digits
 (?! \S )                      # Whitespace boundary
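
As a quick check of the pattern explained above, here is a minimal sketch in JavaScript (assuming ES2018 lookbehind and named-group support). The helper name `extractRef` is hypothetical and the third input is an extra sample that is not part of the question.

```javascript
// Answer 1's pattern: PreRef may contain neither whitespace nor digits, and
// the whole token must be delimited by whitespace or the string start/end.
const bounded = /(?<!\S)(?<PreRef>[^\s\d]+):(?<Ref>\d{5})(?!\S)/;

// Hypothetical helper: returns the reference digits, or null if none found.
function extractRef(text) {
  const m = bounded.exec(text);
  return m ? m.groups.Ref : null;
}

console.log(extractRef("This is a reference:12345"));     // "12345"
console.log(extractRef("This is not a ref5rence:12345")); // null
console.log(extractRef("This is a reference:123456"));    // null (six digits)
```

The trailing `(?!\S)` is what rejects the six-digit case: after five digits the next character must be whitespace or the end of the input.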

Answer 2

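Do you need a negative lookbehind? It would be easier to simply exclude digits from the PreRef capture. [^\W\d] will match word characters but not digits. Then you just need to add \b or another similar word boundary assertion to make sure the match is a complete word.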
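
The answer does not spell out a full expression, so the pattern below is an assumption about how the pieces might be combined, not the answerer's own regex. It is a sketch in JavaScript (ES2018 named groups assumed), with illustrative variable names and one extra sample input that is not from the question.

```javascript
// Assumed combination of the suggestion: word boundaries plus a PreRef
// class that accepts word characters but no digits.
const wordBounded = /\b(?<PreRef>[^\W\d]+):(?<Ref>\d{5})\b/;

// Hypothetical helper: returns the reference digits, or null if none found.
function extractRefByWord(text) {
  const m = wordBounded.exec(text);
  return m ? m.groups.Ref : null;
}

console.log(extractRefByWord("This is a reference:12345"));     // "12345"
console.log(extractRefByWord("This is not a ref5rence:12345")); // null
console.log(extractRefByWord("(reference:12345)"));             // "12345"
```

One difference from a whitespace-based boundary is visible in the last call: \b also holds next to punctuation, so a reference wrapped in parentheses is still extracted. Whether that is desirable depends on the emails being parsed.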

Answer 3

I certainly agree with John: if digits are not allowed before the :, we can use a simple expression such as:

^\D+:(\d{5})

or:

^\D+:(\d{5})$

If we wish to add more boundaries, we can certainly do that too.

Demo

RegEx Circuit

jex.im visualizes regular expressions.

Test

const regex = /^\D+:(\d{5})/gm;
const str = `This is a reference:12345
This is not a ref5rence:12345`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
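
One point that may be worth checking before adopting this pattern: because ^\D+ spans from the start of the line up to the colon, a digit anywhere earlier on the line prevents a match, not only a digit in the word directly before the colon. The sketch below illustrates this; the third sample line is made up for the example and does not come from the question.

```javascript
// Answer 3's pattern, without the g/m flags since each string is one line.
const lineAnchored = /^\D+:(\d{5})/;

const samples = [
  "This is a reference:12345",     // no digit before the colon
  "This is not a ref5rence:12345", // digit inside the word before the colon
  "Ticket 42 has reference:12345", // digit earlier on the line only
];

// Map each sample to the captured digits, or null when there is no match.
const anchoredResults = samples.map((s) => {
  const m = lineAnchored.exec(s);
  return m ? m[1] : null;
});

console.log(anchoredResults); // [ '12345', null, null ]
```

If only the token immediately before the colon should be digit-free, a pattern bounded by whitespace or word boundaries is likely the closer fit.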

Negative lookbehind in a capture group

3 answers:
