Question

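I have an Outlook .msg file that I open and need to extract some specific data from. I'm fairly new to regular expressions and am having trouble matching what I need.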

Below is the data from the file; FYI, it contains some tabs:

NEWS ID:    918273/1
TITLE:  News Platform Solution Overview (CNN) (US English Session)
ACCOUNT:    supernewsplatformacct (55712)

Your request has been completed.

Output Format   MP4

Please click on the "Download File" link below to access the download page.

Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>

I need:

918273 -from- NEWS ID: 918273/1

News Platform Solution Overview (CNN) (US English Session) -from- TITLE: News Platform Solution Overview (CNN) (US English Session)

supernewsplatformacct -from- ACCOUNT: supernewsplatformacct (55712)

http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4 -from- Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>

I am trying

[\n\r][ \t]*NEWS ID:[ \t]*([^\n\r]*)

but with no luck. Any help would be greatly appreciated!

Answer 1

(?:^|(?<=\n))[^:<\n]*[:<](.*)

You can use this with re.findall. See the demo.

https://regex101.com/r/d7RPNB/2
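As a rough sketch of how that pattern might be applied with re.findall: the sample text below is copied from the question, and the names `pattern`, `raw`, and `cleaned` are illustrative, not part of the original answer. The capture group takes everything after the first `:` or `<` on a line, so the values come back with leading whitespace and, for the link, a trailing `>`; the cleanup step is an added assumption about what is wanted.

```python
import re

msg = """NEWS ID:    918273/1
TITLE:  News Platform Solution Overview (CNN) (US English Session)
ACCOUNT:    supernewsplatformacct (55712)

Your request has been completed.

Output Format   MP4

Please click on the "Download File" link below to access the download page.

Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>"""

# Pattern from the answer: anchor at a line start, skip up to the first ':' or '<',
# then capture the rest of the line.
pattern = r'(?:^|(?<=\n))[^:<\n]*[:<](.*)'

# With a single capture group, findall returns a list of the captured strings.
raw = re.findall(pattern, msg)

# Illustrative cleanup: trim surrounding whitespace and the closing '>' of the link.
cleaned = [value.strip().rstrip('>') for value in raw]

for value in cleaned:
    print(value)
```

Note that the cleaned values still include the `/1` suffix of the news ID and the `(55712)` after the account name, so a further step is needed to get exactly the strings listed in the question.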

Answer 2

msg = """NEWS ID:    918273/1
TITLE:  News Platform Solution Overview (CNN) (US English Session)
ACCOUNT:    supernewsplatformacct (55712)

Your request has been completed.

Output Format   MP4

Please click on the "Download File" link below to access the download page.

Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>"""
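import re

# First alternative captures the value after "LABEL:";
# the second captures the URL between "<" and ">".
regex = re.compile(r'[^:]+:\s+(.*)$|[^<]+<([^>]+)>')

matches = []
for line in msg.split('\n'):
    m = regex.match(line)  # match each line once instead of three times
    if m:
        matches.append(m.group(1) or m.group(2))
print(matches)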
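Both answers return the whole value after each label, so the news ID still carries `/1` and the account still carries `(55712)`. One possible way to get exactly the four strings listed in the question is one targeted pattern per field, searched with re.MULTILINE so that `^` matches at every line start. This also sidesteps a likely cause of the original attempt failing: `[\n\r]` requires a line break before `NEWS ID:`, which cannot match if that label sits on the first line, as in the sample shown. This is a minimal sketch, assuming each label starts its own line; the names `patterns` and `fields` are hypothetical.

```python
import re

msg = """NEWS ID:    918273/1
TITLE:  News Platform Solution Overview (CNN) (US English Session)
ACCOUNT:    supernewsplatformacct (55712)

Your request has been completed.

Output Format   MP4

Please click on the "Download File" link below to access the download page.

Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>"""

# One pattern per field; [ \t]* skips spaces or tabs without crossing a line break.
patterns = {
    "news_id": r"^NEWS ID:[ \t]*(\d+)",           # digits before the "/1"
    "title": r"^TITLE:[ \t]*(.+?)[ \t]*$",        # rest of the line, trimmed
    "account": r"^ACCOUNT:[ \t]*(\S+)",           # first token, without "(55712)"
    "url": r"^Download File[ \t]*<([^>]+)>",      # text between "<" and ">"
}

fields = {}
for name, pattern in patterns.items():
    m = re.search(pattern, msg, re.MULTILINE)
    fields[name] = m.group(1) if m else None  # None when a label is missing

for name, value in fields.items():
    print(name, value)
```

A field that is absent from the message ends up as None rather than raising an error, which may be convenient when processing many messages.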

REGEX - (using Python 3.5) - Finding strings in a file
