REGEX - (using Python 3.5) - Finding strings in a file

Time: 2016-12-09 20:10:31

Tags: python regex

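I have a .msg Outlook file that I open and need to extract some specific data from. I'm still fairly new to regular expressions and I'm having trouble finding what I need.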

Below is the data in the file (FYI, it contains what appear to be tab characters):

NEWS ID:    918273/1
TITLE:  News Platform Solution Overview (CNN) (US English Session)
ACCOUNT:    supernewsplatformacct (55712)

Your request has been completed.

Output Format   MP4

Please click on the "Download File" link below to access the download page.

Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>

I need:

918273 -from- NEWS ID: 918273/1

News Platform Solution Overview (CNN) (US English Session) -from- TITLE: News Platform Solution Overview (CNN) (US English Session)

supernewsplatformacct -from- ACCOUNT: supernewsplatformacct (55712)

http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4 -from- Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>

I'm trying

[\n\r][ \t]*NEWS ID:[ \t]*([^\n\r]*)

but with no luck. Any help would be greatly appreciated!
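
One possible reason the attempt above finds nothing on this sample: the pattern begins with [\n\r], so it requires a line break before NEWS ID, yet in the sample NEWS ID is the very first line. The sketch below assumes the message body has already been read into a string (the variable names text, original and anchored are illustrative, not from the question) and compares the original pattern with a variant anchored by ^ under re.MULTILINE.

```python
import re

# Illustrative sample: the first two lines of the data shown above.
text = (
    "NEWS ID:    918273/1\n"
    "TITLE:  News Platform Solution Overview (CNN) (US English Session)"
)

# Pattern from the question: needs a newline or carriage return before "NEWS ID".
original = r'[\n\r][ \t]*NEWS ID:[ \t]*([^\n\r]*)'
# Variant: ^ with re.MULTILINE matches at the start of the text and after each newline.
anchored = r'^[ \t]*NEWS ID:[ \t]*([^\n\r]*)'

first_try = re.search(original, text)
second_try = re.search(anchored, text, flags=re.MULTILINE)

print(first_try)            # None: there is no line break before "NEWS ID"
print(second_try.group(1))  # 918273/1
```

If the real file has other content before NEWS ID, the original pattern may match after all; the anchored variant should cover both cases.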

2 answers:

Answer 0 (score: 2)

(?:^|(?<=\n))[^:<\n]*[:<](.*)

You can use this with re.findall. See the demo.

https://regex101.com/r/d7RPNB/2
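
As a rough sketch of how this pattern might be applied with re.findall, the snippet below runs it on the sample text from the question. The variable names (msg, pattern, raw, cleaned) and the clean-up step are assumptions added for illustration; the pattern itself captures everything after the first ":" or "<" on a line, so the leading whitespace and the trailing ">" of the URL have to be stripped afterwards.

```python
import re

# Sample text copied from the question; the real input would come from the .msg file.
msg = (
    "NEWS ID:    918273/1\n"
    "TITLE:  News Platform Solution Overview (CNN) (US English Session)\n"
    "ACCOUNT:    supernewsplatformacct (55712)\n"
    "\n"
    "Your request has been completed.\n"
    "\n"
    "Output Format   MP4\n"
    "\n"
    "Please click on the \"Download File\" link below to access the download page.\n"
    "\n"
    "Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>"
)

pattern = r'(?:^|(?<=\n))[^:<\n]*[:<](.*)'

# With a single capture group, findall returns a list of the captured strings.
raw = re.findall(pattern, msg)

# Assumed clean-up: trim surrounding whitespace and the closing ">" of the URL.
cleaned = [value.strip().rstrip('>') for value in raw]
print(cleaned)
```

Note that the values still carry the "/1" and "(55712)" suffixes, so further trimming is needed to get exactly the fields listed in the question.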

Answer 1 (score: 0)

msg = """NEWS ID:    918273/1
TITLE:  News Platform Solution Overview (CNN) (US English Session)
ACCOUNT:    supernewsplatformacct (55712)

Your request has been completed.

Output Format   MP4

Please click on the "Download File" link below to access the download page.

Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>"""
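import re

# First alternative captures the text after "LABEL:"; the second captures the URL inside <...>.
regex = re.compile(r'[^:]+:\s+(.*)$|[^<]+<([^>]+)>')

matches = []
for line in msg.splitlines():
    m = regex.match(line)  # match once per line instead of three times
    if m:
        matches.append(m.group(1) or m.group(2))

# Note: the values keep suffixes such as "/1" and "(55712)".
print(matches)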
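
To get exactly the four values listed in the question (without the "/1" and "(55712)" suffixes), one option is a separate pattern per field. This is only a sketch under the assumption that each label starts its own line as in the sample; FIELD_PATTERNS and extract_fields are hypothetical names introduced here, not part of either answer.

```python
import re

# Sample text copied from the question; the real input would come from the .msg file.
msg = (
    "NEWS ID:    918273/1\n"
    "TITLE:  News Platform Solution Overview (CNN) (US English Session)\n"
    "ACCOUNT:    supernewsplatformacct (55712)\n"
    "\n"
    "Your request has been completed.\n"
    "\n"
    "Output Format   MP4\n"
    "\n"
    "Please click on the \"Download File\" link below to access the download page.\n"
    "\n"
    "Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>"
)

# Hypothetical per-field patterns; [ \t]* skips spaces or tabs after the label.
FIELD_PATTERNS = {
    "news_id": r'^NEWS ID:[ \t]*(\d+)',         # digits before the "/"
    "title": r'^TITLE:[ \t]*(.+?)\s*$',         # rest of the line, trimmed
    "account": r'^ACCOUNT:[ \t]*(\S+)',         # first token, before "(55712)"
    "url": r'^Download File[ \t]*<([^>]+)>',    # text between "<" and ">"
}


def extract_fields(text):
    """Return a dict of field name -> captured value (None if not found)."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        # re.MULTILINE lets ^ and $ match at every line boundary.
        match = re.search(pattern, text, flags=re.MULTILINE)
        result[name] = match.group(1) if match else None
    return result


fields = extract_fields(msg)
print(fields["news_id"])
print(fields["title"])
print(fields["account"])
print(fields["url"])
```

Keeping one pattern per field makes it easier to adjust a single rule if the message layout changes, at the cost of scanning the text once per field.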