Question

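Given an email subject line, I'd like to clean it up, getting rid of "Re:", "Fwd", and other junk. So, for example, "[Fwd] Re: Jack and Jill's Wedding" should turn into "Jack and Jill's Wedding".

Someone must have done this before, so I'm hoping you can point me to a battle-tested regex or code.

Here are some examples of what needs to be cleaned up, taken from this page. The regex on that page works fairly well, but isn't quite there.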

Fwd : Re : Re: Many
Re : Re: Many
Re  : : Re: Many
Re:: Many
Re; Many
: noah - should not match anything
RE--
RE: : Presidential Ballots for Florida
[RE: (no subject)]
Request - should not match anything
this is the subject (fwd)
Re: [Fwd: ] Blonde Joke
Re: [Fwd: [Fwd: FW: Policy]]
Re: Fwd: [Fwd: FW: "Drink Plenty of Water"]
FW: FW: (fwd) FW:  Warning from XYZ...
FW: (Fwd) (Fwd) 
Fwd: [Fwd: [Fwd: Big, Bad Surf Moving]]
FW: [Fwd: Fw: drawing by a school age child in PA (fwd)]
Re: Fwd

Answer 1

Try this (replace with ''):

/([\[\(] *)?(RE|FWD?) *([-:;)\]][ :;\])-]*|$)|\]+ *$/igm

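(If you put each subject in as its own string, you don't need the m modifier; it is only there so that $ matches the end of each line, not just the end of the string, for multi-line input.)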

See it in action here.
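
For anyone working in Python rather than JavaScript, here is a minimal sketch of how the substitution could be applied one subject at a time. The helper name `clean_subject` is my own and not part of the answer. Python's `re.sub` replaces every match by default, so there is no equivalent of the `g` flag to set.

```python
import re

# Pattern from the answer, as a raw string so the backslashes reach the regex engine.
JUNK = re.compile(r"([\[\(] *)?(RE|FWD?) *([-:;)\]][ :;\])-]*|$)|\]+ *$", re.IGNORECASE)


def clean_subject(subject):  # hypothetical helper name
    """Remove every match of the pattern, then trim leftover whitespace."""
    return JUNK.sub("", subject).strip()


samples = [
    "[Fwd] Re: Jack and Jill's Wedding",
    "Fwd : Re : Re: Many",
    "Request - should not match anything",
    "this is the subject (fwd)",
    "Re: [Fwd: [Fwd: FW: Policy]]",
]
for s in samples:
    print(repr(s), "->", repr(clean_subject(s)))
```

One thing to be aware of: the pattern is not anchored to the start of the subject, so a word that ends in "re" directly before a colon or the end of the line (for example "Tell me more") would lose those two letters as well.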

Explanation of the regex:

([\[\(] *)?            # starting [ or (, followed by optional spaces
(RE|FWD?) *            # RE or FW or FWD, followed by optional spaces
([-:;)\]][ :;\])-]*|$) # only count it as a Re or FWD if it is followed by 
                       # : or - or ; or ] or ) or end of line
                       # (and after that you can have more of these symbols with
                       #  spaces in between)
|                      # OR
\]+ *$                 # match any trailing \] at end of line 
                       # (we assume the brackets () occur around a whole Re/Fwd
                       #  but the square brackets [] occur around the whole 
                       #  subject line)
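
The annotated breakdown above can be kept next to the pattern itself. As a sketch, assuming Python's `re.VERBOSE` mode: whitespace outside character classes is ignored there, so each literal space from the original pattern is written as `[ ]`. The name `VERBOSE_JUNK` is only illustrative.

```python
import re

VERBOSE_JUNK = re.compile(
    r"""
    ([\[\(][ ]*)?            # starting [ or (, followed by optional spaces
    (RE|FWD?)[ ]*            # RE or FW or FWD, followed by optional spaces
    ([-:;)\]][ :;\])-]*|$)   # then one of - : ; ) ] or end of line, plus any
                             # further run of those symbols and spaces
    |                        # OR
    \]+[ ]*$                 # trailing ] characters at end of line
    """,
    re.IGNORECASE | re.VERBOSE,
)

subject = 'Re: Fwd: [Fwd: FW: "Drink Plenty of Water"]'
print(VERBOSE_JUNK.sub("", subject).strip())
```

This should behave the same as the one-line pattern; only the layout differs.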

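Flags:

i: case-insensitive.

g: global match (matches every Re/Fwd it can find).

m: makes '$' in the regex match the end of each line of multi-line input, not just the end of the string. This only matters if you feed all the subjects into the regex at once; if you feed in one subject at a time, you can drop it, because the end of the line is then the end of the string.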
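
To see what the m flag changes in practice, here is a small sketch in Python, where the corresponding flag is `re.MULTILINE`. The two-line `batch` string is made up from the question's samples.

```python
import re

PATTERN = r"([\[\(] *)?(RE|FWD?) *([-:;)\]][ :;\])-]*|$)|\]+ *$"

# Two subjects fed to the regex as a single multi-line string.
batch = "Re: Fwd\nRe: Many"

with_m = re.sub(PATTERN, "", batch, flags=re.IGNORECASE | re.MULTILINE)
without_m = re.sub(PATTERN, "", batch, flags=re.IGNORECASE)

print(repr(with_m))     # '\nMany'
print(repr(without_m))  # 'Fwd\nMany'
```

Without the flag, the bare "Fwd" at the end of the first line survives, because `$` no longer matches in front of the newline in the middle of the string.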

Answer 2

There are many variants of the subject prefixes, depending on country/language: Wikipedia: List of email subject abbreviations

Brazil: RES === RE, German: AW === RE

Example in Python:

#!/usr/local/bin/python
# -*- coding: utf-8 -*-
import re

# Raw string, so the regex backslashes are passed through unchanged.
p = re.compile(r'([\[\(] *)?(RE?S?|FYI|RIF|I|FS|VB|RV|ENC|ODP|PD|YNT|ILT|SV|VS|VL|AW|WG|ΑΠ|ΣΧΕΤ|ΠΡΘ|תגובה|הועבר|主题|转发|FWD?) *([-:;)\]][ :;\])-]*|$)|\]+ *$', re.IGNORECASE)
# Remove every matched prefix, then trim leftover whitespace.
print(p.sub('', 'RE: Tagon8 Inc.').strip())

Example in PHP:

$subject = "主题: Tagon8 - test php";
$subject = preg_replace("/([\[\(] *)?(RE?S?|FYI|RIF|I|FS|VB|RV|ENC|ODP|PD|YNT|ILT|SV|VS|VL|AW|WG|ΑΠ|ΣΧΕΤ|ΠΡΘ|תגובה|הועבר|主题|转发|FWD?) *([-:;)\]][ :;\])-]*|$)|\]+ *$/im", '', $subject);
var_dump(trim($subject));

Terminal:

$ python test.py
Tagon8 Inc.
$ php test.php
string(17) "Tagon8 - test php"

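Note: this is mathematical.coffee's regex, with prefixes from other languages added: Chinese, Danish, Norwegian, Finnish, French, German, Greek, Hebrew, Italian, Icelandic, Swedish, Portuguese, Polish, Turkish.

I use strip/trim to remove the leftover whitespace.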
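
If the prefix list keeps growing, it may be easier to maintain as data than inside the pattern. A rough sketch of that idea in Python follows; the `PREFIXES` list is just an illustrative subset of the ones above, and `clean_subject` is a hypothetical helper name.

```python
import re

# Illustrative subset of the prefixes from the pattern above; extend as needed.
PREFIXES = ["RE", "RES", "AW", "FW", "FWD", "WG", "SV", "VS"]

# Escape each prefix in case one ever contains a regex metacharacter.
alternation = "|".join(re.escape(prefix) for prefix in PREFIXES)

JUNK = re.compile(
    r"([\[\(] *)?(" + alternation + r") *([-:;)\]][ :;\])-]*|$)|\]+ *$",
    re.IGNORECASE,
)


def clean_subject(subject):  # hypothetical helper name
    """Remove every matched prefix, then trim leftover whitespace."""
    return JUNK.sub("", subject).strip()


print(clean_subject("RES: AW: Tagon8 Inc."))
```

The surrounding pattern is unchanged from mathematical.coffee's regex; only the alternation in the middle is generated.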

Answer 3

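The following regex will match all the cases the way I would expect it to. I'm not sure whether you will agree, because not every case is explicitly documented. It can almost certainly be simplified, but it works: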

/^((\[(re|fw(d)?)\s*\]|[\[]?(re|fw(d)?))\s*[\:\;]\s*([\]]\s?)*|\(fw(d)?\)\s*)*([^\[\]]*)[\]]*/i

The last capture group of the match will hold the stripped subject.
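
Unlike the substitution approach, this one matches once from the start of the subject and reads the result out of a group. A minimal sketch in Python, with hypothetical names `STRIP_PREFIXES` and `stripped_subject`; the pattern is anchored with `^`, so it deals with leading prefixes.

```python
import re

STRIP_PREFIXES = re.compile(
    r"^((\[(re|fw(d)?)\s*\]|[\[]?(re|fw(d)?))\s*[\:\;]\s*([\]]\s?)*|\(fw(d)?\)\s*)*([^\[\]]*)[\]]*",
    re.IGNORECASE,
)


def stripped_subject(subject):  # hypothetical helper name
    # Every part of the pattern is optional, so match() always succeeds.
    match = STRIP_PREFIXES.match(subject)
    # .groups is the number of capture groups, i.e. the index of the last one.
    return match.group(STRIP_PREFIXES.groups).strip()


for s in [
    "Re: [Fwd: [Fwd: FW: Policy]]",
    "FW: FW: (fwd) FW:  Warning from XYZ...",
    "Request - should not match anything",
]:
    print(repr(s), "->", repr(stripped_subject(s)))
```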

Regex/code for removing "FWD", "RE", etc. from email subject lines
