I want to filter automated boilerplate text out of emails. These are lines like this:
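If you receive this email in error, please send it back to us immediately \r\n and permanently delete it and do not use, copy or disclose the content of the email or any attachment.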
To do this, I created a list of these sentences and filter them out like this:
def remove_redundant_text(body):
    for i in filter_lists.body_filter_list:
        body = body.replace(i, "")
    return body
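However, this does not work, because line breaks and other escape characters appear at random places in the text, such as the \r\n in the example above. How can I make .replace() ignore these?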
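To illustrate the problem: str.replace only removes exact substrings, so a single \r\n where the filter phrase has a plain space is enough to break the match. The sketch below (`normalize` is a hypothetical helper, not part of the code above) collapses every whitespace run to one space on both sides before comparing:

```python
def normalize(text):
    # Hypothetical helper: collapse every whitespace run (spaces, "\r", "\n",
    # "\xa0", ...) into a single space.
    return " ".join(text.split())


phrase = "send it back to us immediately and permanently delete it"
body = "please send it back to us immediately \r\n and permanently delete it."

print(phrase in body)                        # False: "\r\n" breaks the exact match
print(normalize(phrase) in normalize(body))  # True once both sides are normalized
```

The catch is that normalizing the body also flattens its own line breaks, which the desired output below keeps, so this is better suited to detecting a disclaimer than to removing it in place.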
Here is an example of the input and the desired output:
input = {'description': "\n\nYes have tried this along with all other combinations but nothing working – just said to contact helpdesk with issues?\n\xa0\n\xa0\n\n\n\n\n\nKirstin Box\n\n\n\n\nSales Force Effectiveness – Wholesale, Workplace, Institutions & Leisure\n\n\n\n\nE. \n\n\n\xa0\n\n\n\xa0\n\n\n\n\nM. \n\n\n\xa0\n\n\n\xa0\n\n\n\n\n\xa0\n\n\n\xa0\n\n\n\xa0\n\n\n\n\n\n\n\n\xa0\n\n\n\xa0\n\n\n\n\n\xa0\n\n\n\xa0\n\n\n\xa0\n\n\n\n\nWe work flexibly at Coca-Cola European Partners. I'm sending this message now because it suits me, but I don't expect you to read, respond or action it outside\r\n of your regular hours.\n\xa0\nCustomer HUB Phone: 0808 1 000 000\nCustomer HUB Email:\r\nconnect@ccep.com\nCustomer HUB Website:\r\nwww.cokecustomerhub.co.uk\n\xa0\nThe information in this email (including any attachments) is intended solely for the addressee(s) and is confidential. It may be read, copied and used only by the\r\n intended recipient. If you receive this email in error, please send it back to us immediately and permanently delete it and do not use, copy or disclose the content of the email or any attachment. Subject to national laws, Coca-Cola European Partners may process\r\n and monitor email content and traffic data for the purposes of security and compliance with corporate policies and applicable laws.\n\xa0\nPLEASE RESPECT THE ENVIRONMENT: Think twice before printing this e-mail.\n\n\n\n\n\xa0\n\n\xa0\n\n\nFrom: BPT Service Desk\r\n\nSent: 26 June 2019 13:15\nTo: Kirstin Box <....>\nSubject: RE: Internet Access\n\n\n\xa0\nHello, Kirstin.\n\xa0\n\xa0\nDid you try the combination\xa0\r\nbxxxxxx@cokecce.com ?\n\xa0\n\xa0\nBest Regards"}
output = {'description': "\n\nYes have tried this along with all other combinations but nothing working – just said to contact helpdesk with issues\n\nFrom: BPT Service Desk\r\n\nSent: 26 June 2019 13:15\nTo: Kirstin Box <....>\nSubject: RE: Internet Access\n\n\n\xa0\nHello, Kirstin.\n\xa0\n\xa0\nDid you try the combination\xa0\r\nbxxxxxx@cokecce.com ?\n\xa0\n\xa0\nBest Regards"}
body_filter_list = ["We work flexibly at Coca-Cola European Partners. I'm sending this message now because it suits me, but I don't expect you to read, respond or action it outside of your regular hours.",
"The information in this email (including any attachments) is intended solely for the addressee(s) and is confidential. It may be read, copied and used only by the intended recipient.",
"If you receive this email in error, please send it back to us immediately and permanently delete it and do not use, copy or disclose the content of the email or any attachment. ",
"Subject to national laws, Coca-Cola European Partners may process and monitor email\r\n content and traffic data for the purposes of security and compliance with corporate policies and applicable laws.",
"Customer HUB Phone: 0808 1 000 000\nCustomer HUB Email:\r\nconnect@ccep.com\nCustomer HUB Website:\r\nwww.cokecustomerhub.co.uk",
"The information in this email (including any attachments) is intended solely for the addressee(s) and is confidential. It may be read, copied and used only by the\r\n intended recipient. If you receive this email in error, please send it back to us immediately and permanently delete it and do not use, copy or disclose the content of the email or any attachment. Subject to national laws, Coca-Cola European Partners may process\r\n and monitor email content and traffic data for the purposes of security and compliance with corporate policies and applicable laws.",
"PLEASE RESPECT THE ENVIRONMENT: Think twice before printing this e-mail.",
"Este correo electrónico ha sido enviado en nombre del grupo de empresas de Coca-Cola European Partners.\r\nPulse en el siguiente enlace para ver esta leyenda informativa en English, Français, Nederlands, Norsk, Svenska, Deutsch, Español and Português.\n\r\nLa información contenida en este correo electrónico (incluidos los archivos adjuntos) está destinada exclusivamente a su destinatario (s) y es confidencial. Puede ser leída, copiada y utilizada solamente por su destinatario. Si recibe este mensaje por error,\r\n por favor, envíelo de nuevo, inmediatamente, al remitente, elimínelo permanentemente y no utilice, copie o divulgue el contenido del correo electrónico ni de cualquier archivo adjunto.\n\r\nSiempre de conformidad con la legislación nacional aplicable, las empresas de Coca-Cola European Partners, podrán procesar y monitorizar el contenido de correo electrónico y del tráfico de datos con fines de seguridad y cumplimiento de las políticas corporativas\r\n y de la normativa aplicable.\n\r\nPOR FAVOR RESPETE EL MEDIO AMBIENTE: reconsidere la necesidad de imprimir este correo electrónico antes de hacerlo. La protección medioambiental es responsabilidad de todos.",
"This email was sent on behalf of the Coca-Cola European Partners group of companies.",
"Click here to see our email disclaimer in English, Français, Nederlands, Norsk, Svenska, Deutsch, Español and Português.",
"The information in this email (including any attachments) is intended solely for the addressee(s) and is confidential. It may be read, copied and used only by the intended recipient. If you receive this email in error, please send it back to us immediately\r\n and permanently delete it and do not use, copy or disclose the content of the email or any attachment.\n\r\nSubject to national laws, Coca-Cola European Partners may process and monitor email content and traffic data for the purposes of security and compliance with corporate policies and applicable laws.\n\r\nPLEASE RESPECT THE ENVIRONMENT: Think twice before printing this e-mail. Environmental protection is in our hands."]
Answer (score: 2)
I tried the following code, and it works as expected.
Full code:
body = (
    "If you receive this email in error, please send it back "
    "to us immediately \r\n and permanently delete it and do not "
    "use, copy or disclose the content of the email or any attachment."
)

def remove_redundant_text(body):
    # Delete every line feed and carriage return from the text.
    for i in ["\n", "\r"]:
        body = body.replace(i, "")
    return body

print(remove_redundant_text(body))
Output:
$ python3 test.py
If you receive this email in error, please send it back to us immediately  and permanently delete it and do not use, copy or disclose the content of the email or any attachment.
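One caveat: deleting "\r" and "\n" outright turns "immediately \r\n and" into "immediately  and" (two spaces), which still would not match a filter entry written with a single space. A sketch that joins the pieces with exactly one space instead, using only str methods (`join_lines` is a hypothetical name):

```python
def join_lines(text):
    # Hypothetical helper: split on any line break, trim the whitespace around
    # each piece, drop empty pieces, and rejoin with exactly one space.
    return " ".join(line.strip() for line in text.splitlines() if line.strip())


body = (
    "If you receive this email in error, please send it back "
    "to us immediately \r\n and permanently delete it and do not "
    "use, copy or disclose the content of the email or any attachment."
)
print(join_lines(body))
```

Since several entries of body_filter_list also contain \r\n, the same function would have to be applied to each entry before calling .replace() so that both sides are comparable. Like any full normalization, it also merges the body's paragraphs.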
A more concise solution uses a regular expression: re.sub removes both characters in a single pass.
Code:
import re

body = (
    "If you receive this email in error, please send it back "
    "to us immediately \r\n and permanently delete it and do not "
    "use, copy or disclose the content of the email or any attachment."
)

# Delete every carriage return and line feed in one pass.
print(re.sub(r"[\r\n]", "", body))
Output:
$ python3 test.py
If you receive this email in error, please send it back to us immediately  and permanently delete it and do not use, copy or disclose the content of the email or any attachment.
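The regex idea can also be pointed at the question's actual goal. Rather than stripping line breaks from the body, each filter phrase can be turned into a pattern in which every gap between words accepts any run of whitespace, so the rest of the message keeps its formatting. This is a minimal sketch assuming the filter entries are plain text; `phrase_pattern` and the extra `filter_list` parameter are hypothetical additions, not part of the question's code:

```python
import re


def phrase_pattern(phrase):
    # Hypothetical helper: escape each word so characters such as "(", "." and "?"
    # match literally, and let any whitespace run (" ", "\r", "\n", "\xa0", ...)
    # stand between consecutive words.
    return r"\s+".join(re.escape(word) for word in phrase.split())


def remove_redundant_text(body, filter_list):
    # filter_list is an extra parameter assumed for this sketch; pass
    # filter_lists.body_filter_list to use the question's list.
    # Longer phrases go first so a full disclaimer is removed in one piece
    # before a shorter phrase contained in it can split it up.
    for phrase in sorted(filter_list, key=len, reverse=True):
        if phrase.strip():  # skip blank entries
            body = re.sub(phrase_pattern(phrase), "", body)
    return body


body = (
    "Did you try it?\r\n"
    "If you receive this email in error, please send it back "
    "to us immediately \r\n and permanently delete it.\r\n"
    "Best Regards"
)
filters = [
    "If you receive this email in error, please send it back "
    "to us immediately and permanently delete it."
]
print(repr(remove_redundant_text(body, filters)))
# 'Did you try it?\r\n\r\nBest Regards'
```

In Python 3, \s in a str pattern also matches Unicode whitespace such as \xa0. The blank lines left around a removed phrase remain and could be tidied in a separate pass if they matter.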