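I have an Outlook .msg file that I open, and I need to extract some specific data from it. I'm still fairly new to regular expressions and am having a hard time getting what I need.
Below is the data in the file; it contains a few labels that appear to be informational (FYI) fields: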
NEWS ID: 918273/1
TITLE: News Platform Solution Overview (CNN) (US English Session)
ACCOUNT: supernewsplatformacct (55712)
Your request has been completed.
Output Format MP4
Please click on the "Download File" link below to access the download page.
Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>
I need:
918273
-from- NEWS ID: 918273/1
News Platform Solution Overview (CNN) (US English Session)
-from- TITLE: News Platform Solution Overview (CNN) (US English Session)
supernewsplatformacct
-from- ACCOUNT: supernewsplatformacct (55712)
http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4
-from- Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>
I am trying:
[\n\r][ \t]*NEWS ID:[ \t]*([^\n\r]*)
but with no luck. Any help would be greatly appreciated!
Answer 0: (score: 2)
Answer 1: (score: 0)
msg = """NEWS ID: 918273/1
TITLE: News Platform Solution Overview (CNN) (US English Session)
ACCOUNT: supernewsplatformacct (55712)
Your request has been completed.
Output Format MP4
Please click on the "Download File" link below to access the download page.
Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>"""
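import re

# First alternative captures the text after "LABEL: "; the second captures a URL inside <...>.
regex = re.compile(r'[^:]+:\s+(.*)$|[^<]+<([^>]+)>')
# Match each line only once and keep whichever group matched; lines without a match are skipped.
# Note: the ID keeps its "/1" suffix and the account keeps its "(55712)" suffix.
matches = [m.group(1) or m.group(2) for m in map(regex.match, msg.split('\n')) if m]
print(matches)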
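If only the four exact values from the question are wanted (the ID without "/1", the account without "(55712)"), one pattern per field may be easier to reason about. The following is a minimal sketch, not a definitive solution: it assumes the message body has already been read into a string, and the names PATTERNS and extract_fields and the dictionary keys are made up for illustration. It also points to one likely reason the pattern in the question found nothing: the leading [\n\r] demands a line break before "NEWS ID:", which does not exist when that label sits on the first line of the text. Anchoring with ^ under re.MULTILINE avoids that.

```python
import re

msg = """NEWS ID: 918273/1
TITLE: News Platform Solution Overview (CNN) (US English Session)
ACCOUNT: supernewsplatformacct (55712)
Your request has been completed.
Output Format MP4
Please click on the "Download File" link below to access the download page.
Download File <http://news.downloadwebsitefake.com/newsid/file1294757493292848575.mp4>"""

# Hypothetical field names. Each pattern is anchored with ^, which under
# re.MULTILINE matches at the start of every line, including the first one.
PATTERNS = {
    # Digits only, so the "/1" suffix is left out.
    "news_id": r"^[ \t]*NEWS ID:[ \t]*(\d+)",
    # Rest of the line, without trailing spaces or a stray carriage return.
    "title": r"^[ \t]*TITLE:[ \t]*([^\r\n]*?)[ \t]*\r?$",
    # First run of non-whitespace, so "(55712)" is left out.
    "account": r"^[ \t]*ACCOUNT:[ \t]*(\S+)",
    # Whatever sits between the angle brackets.
    "url": r"^[ \t]*Download File[ \t]*<([^>\r\n]+)>",
}


def extract_fields(text):
    """Return a dict mapping each field name to its captured value, or None."""
    result = {}
    for name, pattern in PATTERNS.items():
        m = re.search(pattern, text, re.MULTILINE)
        result[name] = m.group(1) if m else None
    return result


fields = extract_fields(msg)
print(fields)
```

A field whose label is absent from the text comes back as None rather than raising an error, so the caller can decide how to handle incomplete messages.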