I'm trying to parse a string like the one shown below.
<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" obj_device="479999" locations="69,31,">
<Objective Type="Availability">
<Goal>99.99</Goal>
<Actual>100.00</Actual>
<Compliant>Yes</Compliant>
<Errors>0</Errors>
<Checks>2880</Checks>
</Objective>
<Objective Type="Uptime">
<Goal/>
<Actual/>
<Compliant/>
<Errors>0</Errors>
<Checks>0</Checks>
</Objective>
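I want to use a regex to find the position of 'Description' and then take the string between the quotes, so I want 'Get Metadata'. Then I want to find the position of 'From' and get the string between the quotes, so I need '2019-01-16 00:00'. Finally, I want to find the position of 'Thru' and get the string between the quotes, so I need '2019-01-16 23:59'. How can I do this with 3 separate regex commands and parse the results into 3 separate strings? TIA.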
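Answer 0 (score: 1)
This regex should give you the Description value. The others are similar, but for From and Thru the character class has to allow '-' and ':' as well (for example [\w\s:-]+), because [\w\s] does not match the date characters: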
'Description="([\w\s]+)" From'
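A short sketch of how that regex can be used, together with the widened class the date attributes need. It assumes a shortened copy of the <Report> tag stored in a variable named text (a hypothetical name, not from the question):

```python
import re

# Shortened copy of the question's <Report> tag (assumption: attribute order as posted)
text = ('<Report Type="Final Report" Description="Get Metadata" '
        'From="2019-01-16 00:00" Thru="2019-01-16 23:59" obj_device="479999">')

# [\w\s]+ is enough for the description...
desc = re.search(r'Description="([\w\s]+)" From', text).group(1)

# ...but the dates contain '-' and ':', so the class is widened to [\w\s:-]+
frm = re.search(r'From="([\w\s:-]+)" Thru', text).group(1)
thru = re.search(r'Thru="([\w\s:-]+)" obj', text).group(1)

# With the narrow class the date attribute does not match at all
print(re.search(r'From="([\w\s]+)" Thru', text))  # None

print(desc)  # Get Metadata
print(frm)   # 2019-01-16 00:00
print(thru)  # 2019-01-16 23:59
```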
Answer 1 (score: 1)
You can use a single regex pattern:
import re

# `string` holds the text from the question
pattern = re.compile(r'Description="(.*?)" From="(.*?)" Thru="(.*?)" obj')
for desc, frm, thru in pattern.findall(string):
    print(desc)
    print(frm)
    print(thru)
# output
# Get Metadata
# 2019-01-16 00:00
# 2019-01-16 23:59
Alternatively, you can do the same thing with three separate patterns:
pattern_desc = re.compile('Description="(.*)" From')
pattern_frm = re.compile('From="(.*)" Thru')
pattern_thru = re.compile('Thru="(.*)" obj')
re.findall(pattern_desc, string)
# output: ['Get Metadata']
re.findall(pattern_frm, string)
# output: ['2019-01-16 00:00']
re.findall(pattern_thru, string)
# output: ['2019-01-16 23:59']
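One caveat with the greedy (.*) groups above: they work here because each attribute appears only once, but if the input ever held two <Report> tags on the same line, .* would run to the last possible match. The sketch below uses a hypothetical two-tag line (not from the question) to compare greedy (.*), lazy (.*?) and negated-class ([^"]*) captures:

```python
import re

# Hypothetical input: two <Report> tags on ONE line (not the question's data)
line = ('<Report Description="Get Metadata" From="2019-01-16 00:00" '
        'Thru="2019-01-16 23:59" obj_device="1"> '
        '<Report Description="Other" From="2019-01-17 00:00" '
        'Thru="2019-01-17 23:59" obj_device="2">')

greedy = re.findall(r'Description="(.*)" From', line)    # runs to the LAST '" From'
lazy = re.findall(r'Description="(.*?)" From', line)     # stops at the FIRST '" From'
negated = re.findall(r'Description="([^"]*)"', line)     # cannot cross a quote at all

print(len(greedy))  # 1 (one oversized match spanning both tags)
print(lazy)         # ['Get Metadata', 'Other']
print(negated)      # ['Get Metadata', 'Other']
```

The negated class [^"]* has the extra advantage of not depending on which attribute follows, so the pattern keeps working if the attribute order changes.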
Answer 2 (score: 1)
I put together a working example that uses a single regex to get the data you're looking for.
import re
long_string = '''
<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" obj_device="479999" locations="69,31,">
<Objective Type="Availability">
<Goal>99.99</Goal>
<Actual>100.00</Actual>
<Compliant>Yes</Compliant>
<Errors>0</Errors>
<Checks>2880</Checks>
</Objective>
<Objective Type="Uptime">
<Goal/>
<Actual/>
<Compliant/>
<Errors>0</Errors>
<Checks>0</Checks>
</Objective>
'''
match = re.search(r'Description="(.+?)" From="(.+?)" Thru="(.+?)"', long_string)
if match:
    print(match.group(1))
    print(match.group(2))
    print(match.group(3))
It gives the following output:
Get Metadata
2019-01-16 00:00
2019-01-16 23:59
Hope this helps.
Answer 3 (score: 1)
You need three regexes to capture those values:
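A possible variant of the same idea, assuming you prefer looking values up by name rather than by group number: named groups plus match.groupdict(). The group names description, from_ and thru are illustrative choices, and the input is a shortened copy of the tag:

```python
import re

# Shortened copy of the question's <Report> tag (assumption: attributes in this order)
long_string = ('<Report Type="Final Report" Description="Get Metadata" '
               'From="2019-01-16 00:00" Thru="2019-01-16 23:59" obj_device="479999">')

# Named groups (?P<name>...) let you read each value by name instead of position
match = re.search(
    r'Description="(?P<description>[^"]*)" '
    r'From="(?P<from_>[^"]*)" '
    r'Thru="(?P<thru>[^"]*)"',
    long_string,
)
if match:
    values = match.groupdict()
    print(values['description'])  # Get Metadata
    print(values['from_'])        # 2019-01-16 00:00
    print(values['thru'])         # 2019-01-16 23:59
```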
Description="([^"]*)"
From="([^"]*)"
Thru="([^"]*)"
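You can build the pattern dynamically in a function and reuse it to look up the value of any attribute. Try this Python demo:
import re
def getValue(text, key):
    # \b stops a key such as "Name" from matching inside "SiteName";
    # re.escape makes sure the key is matched literally
    m = re.search(r'\b' + re.escape(key) + r'="([^"]*)"', text)
    if m:
        return m.group(1)
    return None  # key not found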
s = '''<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" obj_device="479999" locations="69,31,">
<Objective Type="Availability">
<Goal>99.99</Goal>
<Actual>100.00</Actual>
<Compliant>Yes</Compliant>
<Errors>0</Errors>
<Checks>2880</Checks>
</Objective>
<Objective Type="Uptime">
<Goal/>
<Actual/>
<Compliant/>
<Errors>0</Errors>
<Checks>0</Checks>
</Objective>'''
print('Description: ' + getValue(s,'Description'))
print('From: ' + getValue(s,'From'))
print('Thru: ' + getValue(s,'Thru'))
Prints:
Description: Get Metadata
From: 2019-01-16 00:00
Thru: 2019-01-16 23:59
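If you need more than a handful of attributes, one option is to collect every key="value" pair of the <Report> tag into a dict in one pass. This is a sketch assuming attribute values never contain a double quote; report_attributes is a hypothetical helper name. Restricting the search to the <Report ...> start tag keeps Type="Availability" from the first <Objective> from overwriting Type="Final Report":

```python
import re

# Matches every key="value" pair; assumes values never contain a double quote
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

def report_attributes(text):
    """Return the attributes of the first <Report ...> start tag as a dict (hypothetical helper)."""
    tag = re.search(r'<Report\b[^>]*>', text)
    if tag is None:
        return {}
    return dict(ATTR_RE.findall(tag.group(0)))

s = ('<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" '
     'Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" '
     'obj_device="479999" locations="69,31,">\n<Objective Type="Availability">')

attrs = report_attributes(s)
print(attrs['Description'])  # Get Metadata
print(attrs['From'])         # 2019-01-16 00:00
print(attrs['Thru'])         # 2019-01-16 23:59
print(attrs.get('Missing'))  # None
```

Using .get() on the dict returns None for a missing attribute, whereas concatenating a None returned by getValue with a string raises a TypeError.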
Answer 4 (score: 0)
In plain Python, without regex, it would look something like this:
xml = '<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" obj_device="479999" locations="69,31,"><Objective Type="Availability"><Goal>99.99</Goal><Actual>100.00</Actual><Compliant>Yes</Compliant><Errors>0</Errors><Checks>2880</Checks></Objective><Objective Type="Uptime"><Goal/><Actual/><Compliant/><Errors>0</Errors><Checks>0</Checks></Objective>'
report = xml.split('>')[0]
description = report.split("Description=\"")[1].split("\" From=\"")[0]
from_ = report.split("From=\"")[1].split("\" Thru=\"")[0]
thru = report.split("Thru=\"")[1].split("\" obj_device=\"")[0]
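Since the data is XML, a parser is another option. The snippet as posted has no closing </Report>, so parsing the whole string with xml.etree.ElementTree would fail; the sketch below assumes only the opening tag matters, closes it as an empty element, and reads its attributes. Like the split-based code above, it assumes no attribute value contains a '>' character:

```python
import xml.etree.ElementTree as ET

xml = ('<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" '
       'Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" '
       'obj_device="479999" locations="69,31,"><Objective Type="Availability">'
       '<Goal>99.99</Goal></Objective>')

# The snippet is not well-formed on its own (no closing </Report>), so take
# just the start tag and turn it into a self-closing element before parsing.
start_tag = xml.split('>', 1)[0] + '/>'
report = ET.fromstring(start_tag)

# Element.get() returns the attribute value, or None if it is missing
description = report.get('Description')
from_ = report.get('From')
thru = report.get('Thru')
print(description, from_, thru, sep='\n')
```

If your real input is a complete, well-formed document, ET.fromstring(xml).attrib should give the same values directly without the start-tag trick.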