How to exclude "[" in a RegEx

Time: 2015-08-04 09:43:17

Tags: regex coldfusion coldfusion-9

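I have a list of email headers and their values. Unfortunately, it is all one single string, and I want to exclude some patterns that are not email headers.

Here is what I have: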

Return-Path: Received: from out.ipsmtp4nec.opaltelecom.net (out.ipsmtp4nec.opaltelecom.net [62.24.202.76]) by smartermail.divtech.co.za with SMTP; Mon, 6 Jul 2015 12:59:14 +0200 X-SMTPAUTH: sailor26@tiscali.co.uk X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DSrwBOXppVPOPoVl0aAUErgmdUYIMfp3gMBgGBA4IZK4VrAYJ3V4ckhW8EKYEFTQEBAQEBAQcBAQEBQAE/HwEBIAECAoNdAQIMGzMuCgYDAQIPHw4COwoCCAEGCQESCAmICAMWCZFaoGKWHYYdhS6CTR6FCi+BFAWFXAqOLQIBhGGFJ4FfkTmHHYFvAQEIAQEBAQEBgiI+MYJLAQEB X-IPAS-Result: A2DSrwBOXppVPOPoVl0aAUErgmdUYIMfp3gMBgGBA4IZK4VrAYJ3V4ckhW8EKYEFTQEBAQEBAQcBAQEBQAE/HwEBIAECAoNdAQIMGzMuCgYDAQIPHw4COwoCCAEGCQESCAmICAMWCZFaoGKWHYYdhS6CTR6FCi+BFAWFXAqOLQIBhGGFJ4FfkTmHHYFvAQEIAQEBAQEBgiI+MYJLAQEB X-Header: TalkTalk X-IronPort-AV: E=Sophos;i=""5.15,414,1432594800""; d=""scan'208,217"";a=""693647776"" Received: from 93-86-232-227.dynamic.isp.telekom.rs (HELO smtp.tiscali.co.uk) ([93.86.232.227]) by out.ipsmtp4nec.opaltelecom.net with ESMTP; 06 Jul 2015 11:59:04 +0100 Message-ID: From: "jonjon.bracq" To: "Webtickets" , "Webtickets Highlights" , "RYA" , "www jobonyachts com ADMIN" , "RYA InBrief" , "RYA InBrief" , "Webtickets Highlights" , "Webtickets Regional Highlights" , "RYA InBrief" Subject:
=?ISO-8859-1?Q?FW=3AFrom=3Ajonjon.bracq=40yahoo.com?= Date: Thu, 26 Jun 2015 11:59:43 +0000 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_00BE_8320AA74.4FC1860E" X-Priority: 3 X-MSMail-Priority: Normal Importance: Normal X-Mailer: Microsoft Windows Live Mail 16.4.3522.110 X-MIMEOLE: Produced By Microsoft MimeOLE V16.4.3522.110 X-SmarterMail-Spam: SPF_Pass, RHSBL, UCEProtect Level 1, Bayesian Filtering, ISpamAssassin 0 [raw: 0], DK_None, DKIM_None, Custom Rules [] X-SmarterMail-TotalSpamWeight: 12

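I want to match all headers (a word followed by ":"), but not raw:, which sits inside [] brackets. That is because raw: is part of the X-SmarterMail-Spam header value (towards the end of the list). I don't want to strip "raw:" out manually, because other values like it may show up in the future.

The expression /(\D[a-z\-]*)(\:)+/ig includes "raw:".

Note: I added \D so that I can also exclude the time (11:59:43), but I can't seem to exclude "raw:". Please help.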
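To see why "raw:" slips through, the pattern can be run against a shortened copy of the sample. The sketch below uses Python's re module purely for illustration (the question is about ColdFusion, and regex flavors can differ in details); re.IGNORECASE and finditer stand in for the i and g flags, and the shortened sample string is an assumption made for brevity. Since \D accepts any non-digit, it consumes the "[" in front of raw, so "[raw:" qualifies just like a real header.

```python
import re

# Shortened sample built from pieces of the header string in the question.
sample = (
    "X-Priority: 3 X-SmarterMail-Spam: SPF_Pass, ISpamAssassin 0 "
    "[raw: 0], DK_None Date: Thu, 26 Jun 2015 11:59:43 +0000"
)

# The pattern from the question; IGNORECASE plays the role of the /i flag.
pattern = re.compile(r"(\D[a-z\-]*)(\:)+", re.IGNORECASE)

# finditer walks every non-overlapping match, like the /g flag.
matches = [m.group(0) for m in pattern.finditer(sample)]

for text in matches:
    print(repr(text))  # "[raw:" shows up among the real headers
```

The time is left alone because a digit always follows each ":" in 11:59:43, but nothing in the pattern rules out a leading bracket.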

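3 Answers:

Answer 0 (score: 1)

Here is my final code. I know there are some lines that could be removed, but I left them in because they don't add much overhead at execution time.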


<cffunction name="GetHeader" output="yes" returntype="string">
    <cfargument name="header" required="yes" type="string">
    <cfargument name="property" required="yes" type="string">
    <cfset return = "">
    <!--- locate "property:" in the header string --->
    <cfset propFinderPos = REFind(property & ":",header) >
    <cfif propFinderPos GT 0>
        <!--- the value starts right after "property:" --->
        <cfset propValueStart = propFinderPos + LEN(property) + 1 >
        <!--- find the next thing that looks like a header name --->
        <cfset propNextPos = REFind("(\D[A-Za-z\-]*)(\:)",header,propValueStart,"TRUE") >
        <cfif propNextPos.pos[1] GT 0 >
            <!--- if the match starts with "[" (e.g. "[raw:") it belongs to the value, so search again past it --->
            <cfif Mid(header,propNextPos.pos[1],1) EQ "[">
                <cfset propNextPos = REFind("(\D[A-Za-z\-]*)(\:)",header,propNextPos.pos[1]+propNextPos.len[1],"TRUE") >
            </cfif>
            <cfset propValueEnd = propNextPos.pos[1] >
        <cfelse>
            <!--- no further header: the value runs to the end of the string --->
            <cfset propValueEnd = LEN(header) >
        </cfif>
        <cfset header2 = Mid(header,1,propValueEnd)>
        <cfset return = Mid(header2, propValueStart, propValueEnd)>
    <cfelse>
        <cfset return = "~not found~" >
    </cfif>
    <cfreturn return >
</cffunction>

<cfoutput>
X-SmarterMail-Spam = #GetHeader(header,"X-SmarterMail-Spam")#
</cfoutput>

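Answer 1 (score: 0)

raw: is a syntactically valid header name, so you have to add context to single it out. Since it appears to be a rare exception, I suggest not handling it in the match itself but filtering it out in post-processing.

However, if you want to keep it in the regex, exclude the opening bracket and make sure the complete header name is matched. Beware of starting the regex with \D, because that condition is too loose (e.g. it also matches the opening bracket...):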
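A minimal sketch of the post-processing idea, written in Python only for illustration. The candidate pattern and the rule "drop any match immediately preceded by [" are assumptions made here, not something the answer spells out; `header_names` is a hypothetical helper name.

```python
import re

# Shortened sample built from pieces of the header string in the question.
sample = (
    "X-Priority: 3 X-SmarterMail-Spam: SPF_Pass, ISpamAssassin 0 "
    "[raw: 0], DK_None Date: Thu, 26 Jun 2015 11:59:43 +0000"
)

# Assumed candidate pattern: a letter, then letters/digits/hyphens, then ":".
candidate = re.compile(r"[a-z][a-z0-9\-]*:", re.IGNORECASE)


def header_names(text):
    """Return candidate header names, skipping ones that follow a '['."""
    names = []
    for m in candidate.finditer(text):
        start = m.start()
        # Post-filter: a name right after "[" is part of a header value.
        if start > 0 and text[start - 1] == "[":
            continue
        names.append(m.group(0))
    return names


print(header_names(sample))
```

Keeping the filter separate from the match means further exceptions can be added to the filter later without touching the regex.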

/([^\[a-z_0-9\-]|^)([a-z_\-][a-z0-9_\-]*:)/ig

The regex was checked against your sample input at Regex 101.
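For a quick local check, the same pattern can be tried in Python's re module (an illustration only; Regex 101 and ColdFusion use other regex engines, so treat this as a sketch rather than proof for every flavor). The shortened sample is an assumption made for brevity. Group 1 is the separator before the name, or the start of the string, and group 2 is the header name itself.

```python
import re

# Shortened sample built from pieces of the header string in the question.
sample = (
    "X-Priority: 3 X-SmarterMail-Spam: SPF_Pass, ISpamAssassin 0 "
    "[raw: 0], DK_None Date: Thu, 26 Jun 2015 11:59:43 +0000"
)

# Pattern from the answer; IGNORECASE stands in for the /i flag.
pattern = re.compile(
    r"([^\[a-z_0-9\-]|^)([a-z_\-][a-z0-9_\-]*:)",
    re.IGNORECASE,
)

# Group 2 holds the header name without the preceding separator.
names = [m.group(2) for m in pattern.finditer(sample)]
print(names)
```

Because "[" is listed in the negated class of group 1, a name that directly follows an opening bracket has no valid separator in front of it and is skipped.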

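Answer 2 (score: 0)

This is more of a general solution for parsing email headers, but just to throw out another possibility... if your header string is delimited by new lines, as in a typical email: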

...
X-Priority: 3
X-MSMail-Priority: Normal
Importance: Normal
X-SmarterMail-Spam: SPF_Pass, RHSBL, UCEProtect Level 1, Bayesian Filtering, ISpamAssassin 0 [raw: 0], DK_None, DKIM_None, Custom Rules []
...

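You could use the core javax.mail.internet.InternetHeaders class to do the parsing. Unlike your current regex, that class is designed specifically for parsing RFC822 headers (i.e. email headers). To use it, create an InputStream from the delimited string and load it into a headers object: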

// create stream from delimited string
stream = createObject("java", "java.io.StringBufferInputStream").init( yourString );
// load stream and extract all headers
mimeHeaders = createObject("java", "javax.mail.internet.InternetHeaders");
mimeHeaders.load( stream );

Once the string is loaded, you can grab whichever headers you need from the instance. For example, to retrieve the "X-SmarterMail-Spam" header:

headers = mimeHeaders.getHeader("X-SmarterMail-Spam");
if (!isNull(headers)) {
    writeDump(headers);
}

NB: Some headers may occur multiple times, so this method returns an array, or null if the header does not exist.
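The same "use a purpose-built parser instead of a regex" idea can be sketched outside the JVM as well. The example below swaps in Python's standard email package (HeaderParser), not javax.mail, purely as an illustration; the header text is a shortened, newline-delimited version of the sample and is an assumption. Like getHeader, get_all returns a list of values, or None when the header is absent.

```python
from email.parser import HeaderParser

# Newline-delimited headers, shortened from the sample in the question.
raw_headers = (
    "X-Priority: 3\n"
    "X-MSMail-Priority: Normal\n"
    "Importance: Normal\n"
    "X-SmarterMail-Spam: SPF_Pass, RHSBL, ISpamAssassin 0 [raw: 0], Custom Rules []\n"
    "X-SmarterMail-TotalSpamWeight: 12\n"
)

# Parse only the header block; no regex is involved.
message = HeaderParser().parsestr(raw_headers)

# A header may repeat, so get_all returns a list (or None if missing).
spam = message.get_all("X-SmarterMail-Spam")
missing = message.get_all("X-Does-Not-Exist")

print(spam)
print(missing)
```

The "[raw: 0]" fragment stays inside the X-SmarterMail-Spam value, because the parser only treats text at the start of a line as a header name.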