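I want to extract a sequence from some text.
The sequence starts with Diagnostic-Code:, the middle part can be any characters, possibly spanning several lines, and the end is marked by an empty line (the text continues after that, but it is not part of the wanted sequence).
This works for the start and the middle part, but the end is found too late: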
(?s)Diagnostic-Code: (.+)\n\n
The string looks like this:
...
Status: 5.0.0
Diagnostic-Code: X-Postfix; test.com
*this*
*should*
*be included too*
--EA7634814EFB9.1516804532/mail.example.com
Content-Description: Undelivered Message
...
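The late match comes from the greedy `.+`: with `(?s)` it first runs to the end of the text and then backtracks only as far as the last blank line. A minimal sketch of the difference, assuming plain `\n` line endings and a blank line after the last wanted line; the class name, method name, and sample string are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsLazy {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample with two blank lines; only the first one ends the sequence.
        String text = "Status: 5.0.0\n"
                + "Diagnostic-Code: X-Postfix; test.com\n"
                + "*this*\n"
                + "\n"
                + "--boundary\n"
                + "\n"
                + "more text\n";

        // Greedy: .+ stops at the LAST blank line, so the boundary line is captured too.
        System.out.println(firstGroup("(?s)Diagnostic-Code: (.+)\\n\\n", text));
        System.out.println("-----");
        // Lazy: .+? stops at the FIRST blank line.
        System.out.println(firstGroup("(?s)Diagnostic-Code: (.+?)\\n\\n", text));
    }
}
```

Changing only the quantifier from `.+` to `.+?` keeps the rest of the original pattern intact.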
---------EDIT---------
Thanks for your answer, @Gurman!
But java.util.regex behaves differently from regex101.com:
Action: failed
Status: 5.1.1
Remote-MTA: dns; gmail-smtp-in.l.google.com
Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does
not exist. Please try 550-5.1.1 double-checking the recipient's email
address for typos or 550-5.1.1 unnecessary spaces. Learn more at 550 5.1.1
https://support.google.com/mail/?p=NoSuchUser u11si15276978wru.314 - gsmtp
--E8A363093CEC.1520529178/proxy03.hostname.net
Content-Description: Undelivered Message
Content-Type: message/rfc822
Return-Path: <no-reply@hostname.net>
On regex101 the pattern matches the whole multi-line diagnostic code, but in Java group 1 contains only the first line:
smtp; 550-5.1.1 The email account that you tried to reach does
Java code:
diagnosticCodePatter = Pattern.compile("(?i)diagnostic[-| ]Code: ([\\s\\S]*?[\\r\\n]{2})");
matcher = diagnosticCodePatter.matcher(message);
if (matcher.find()) {
    diagnosticCode = matcher.group(0);
}
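One possible cause of the difference, offered as an assumption because the raw message is not shown: mail text usually uses CRLF line endings, and `[\r\n]{2}` is already satisfied by the single `\r\n` that ends the first line. Text pasted into regex101 typically ends up with bare `\n`, so the same pattern only stops at a real blank line there. The sketch below compares that terminator with `(?:\r?\n){2}`, which requires two complete line breaks. The class and method names and the shortened message are illustrative; the terminator is moved outside group 1, and `[-| ]` is written as `[- ]`, which is presumably what was meant.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CrlfTerminator {
    // Terminator from the question: [\r\n]{2} accepts "\r\n" as its two characters.
    static final Pattern CHAR_CLASS = Pattern.compile(
            "(?i)diagnostic[- ]code: ([\\s\\S]*?)[\\r\\n]{2}");

    // Alternative: two full line breaks, each either "\r\n" or "\n".
    static final Pattern FULL_BREAKS = Pattern.compile(
            "(?i)diagnostic[- ]code: ([\\s\\S]*?)(?:\\r?\\n){2}");

    // Returns group 1 of the first match, or null when nothing matches.
    static String extract(Pattern p, String message) {
        Matcher m = p.matcher(message);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical bounce message with CRLF line endings.
        String message = "Status: 5.1.1\r\n"
                + "Diagnostic-Code: smtp; 550-5.1.1 The email account does\r\n"
                + "not exist.\r\n"
                + "\r\n"
                + "--boundary\r\n";

        // Stops after the first line, because "\r\n" counts as two characters.
        System.out.println(extract(CHAR_CLASS, message));
        System.out.println("-----");
        // Continues up to the blank line.
        System.out.println(extract(FULL_BREAKS, message));
    }
}
```

If the message turns out to use bare `\n` after all, both patterns behave the same and the cause lies elsewhere.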
Answer 0 (score: 3)
Try this regex:
Diagnostic-Code[\s\S]*?[\r\n]{2}
Click for Demo
Don't forget to escape each \ with another \ in Java.
Explanation
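Diagnostic-Code - matches the literal text Diagnostic-Code
[\s\S]*? - matches any character (including line breaks), as few times as possible
[\r\n]{2} - matches 2 occurrences of a newline or carriage return character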
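Written as a Java string literal, every backslash in the pattern is doubled. A minimal sketch of using it, assuming the text has `\n` line endings; the class name, method name, and sample text are hypothetical, and the result is trimmed because this pattern includes the closing line breaks in the match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DiagnosticCodeDemo {
    // The answer's regex with each \ escaped as \\ for the Java compiler.
    static final Pattern DIAGNOSTIC_CODE =
            Pattern.compile("Diagnostic-Code[\\s\\S]*?[\\r\\n]{2}");

    // Returns the matched sequence without the trailing line breaks, or null.
    static String find(String text) {
        Matcher m = DIAGNOSTIC_CODE.matcher(text);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample modeled on the question, with an explicit blank line.
        String text = "Status: 5.0.0\n"
                + "Diagnostic-Code: X-Postfix; test.com\n"
                + "*this*\n"
                + "*should*\n"
                + "\n"
                + "--boundary\n";
        System.out.println(find(text));
    }
}
```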