Question

I am trying to extract text from a series of emails. The emails look like this:

Hello,
     bla bla bla. The reason for this is: the problem was resolved after it was fixed   all

Kind regards,

bla bla

I have a regular expression like this:

With ReasonReg
    .Pattern = "(The reason for this is\s*:\s*)(\w\s*)+(?=\s*Kind regards)"
    .Global = False
    .IgnoreCase = False
End With

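My problem occurs with emails that contain digits and special characters (colons and question marks). Of course, \w does not match those, but if I try any of the following patterns, my Outlook (Office 365) stops responding.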

.Pattern = "(The reason for this is\s*:\s*)(.*\s*)+(?=\s*Kind regards)"
.Pattern = "(The reason for this is\s*:\s*)(\w\W*\s*)+(?=\s*Kind regards)"
.Pattern = "(The reason for this is\s*:\s*)(\w[:?]*\s*)+(?=\s*Kind regards)"

Answer 1

It sounds like you need to match everything between The reason for this is\s*:\s* and Kind regards.

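You can use the [\s\S] construct to match any character, including line breaks, and apply a lazy quantifier (*?) to it so that as few characters as possible are matched before the first Kind regards: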

.Pattern = "(The reason for this is\s*:\s*)([\s\S]*?)(\s*Kind regards)"

See the regex demo.
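
As a minimal sketch of how this pattern might be used from VBA: the text between the two delimiters ends up in the second capture group, which is SubMatches(1) because SubMatches is zero-based. The function name ExtractReason and the late-bound CreateObject call are illustrative assumptions, not part of the original code.

```vba
' Sketch: returns the text between the two delimiters, or "" if nothing matches.
' ExtractReason is a hypothetical helper name used only for illustration.
Function ExtractReason(ByVal body As String) As String
    Dim re As Object
    Dim matches As Object

    ' Late binding, so no project reference to the VBScript regex library is required
    Set re = CreateObject("VBScript.RegExp")
    With re
        .Pattern = "(The reason for this is\s*:\s*)([\s\S]*?)(\s*Kind regards)"
        .Global = False
        .IgnoreCase = False
    End With

    Set matches = re.Execute(body)
    If matches.Count > 0 Then
        ' SubMatches is zero-based: index 1 is the second capture group
        ExtractReason = matches(0).SubMatches(1)
    Else
        ExtractReason = ""
    End If
End Function
```

Assuming mail is an Outlook MailItem, a call such as ExtractReason(mail.Body) would return only the reason text, without the leading delimiter or the whitespace before Kind regards.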

If there is a lot of text between these delimiters, consider unrolling the lazy matching construct, for example:

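(The reason for this is\s*:\s*)(\S*(?:\s+(?!\s*Kind regards)\S+)*)(\s*Kind regards)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See another demo. The \S*(?:\s+(?!\s*Kind regards)\S+)* pattern matches 0+ non-whitespace characters (\S*), followed by 0+ sequences of 1+ whitespace characters that are not followed by Kind regards, and then 1+ non-whitespace characters.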
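
The pieces described above can be sketched in VBA by assembling the pattern one part at a time, with a comment on each part. This is only an illustration: BuildUnrolledPattern is a hypothetical helper name, and the sketch assumes \s+ (one or more whitespace characters) inside the repeated group, as the description says, so that runs of spaces and CRLF line breaks between words are accepted.

```vba
' Sketch: the unrolled pattern assembled piece by piece.
' BuildUnrolledPattern is a hypothetical helper name used only for illustration.
Function BuildUnrolledPattern() As String
    Dim p As String

    ' Group 1: the leading delimiter and the whitespace around the colon
    p = "(The reason for this is\s*:\s*)"
    ' Group 2 opens: 0+ non-whitespace characters
    p = p & "(\S*"
    ' Repeated 0+ times: a whitespace run that does not lead into
    ' the closing delimiter, followed by 1+ non-whitespace characters
    p = p & "(?:\s+(?!\s*Kind regards)\S+)*"
    ' Group 2 closes
    p = p & ")"
    ' Group 3: the trailing delimiter with any whitespace before it
    p = p & "(\s*Kind regards)"

    BuildUnrolledPattern = p
End Function
```

The result can be assigned with .Pattern = BuildUnrolledPattern() inside the With ReasonReg block from the question; the extracted text is again in the second capture group.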

Regular expression for text between two static strings
