Using regex to find a specific string and the text between quotes

Time: 2019-01-17 17:06:32

Tags: python regex python-3.x

I'm trying to parse a string like the one shown below.

<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" obj_device="479999" locations="69,31,">
  <Objective Type="Availability">
    <Goal>99.99</Goal>
    <Actual>100.00</Actual>
    <Compliant>Yes</Compliant>
    <Errors>0</Errors>
    <Checks>2880</Checks>
    </Objective>
  <Objective Type="Uptime">
    <Goal/>
    <Actual/>
    <Compliant/>
    <Errors>0</Errors>
    <Checks>0</Checks>
  </Objective>

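I want to use a regex to find the position of 'Description' and then get the string between the quotes, so I want 'Get Metadata'. Then I want to find the position of 'From' and get the string between the quotes, so I need '2019-01-16 00:00'. Finally, I want to find the position of 'Thru' and get the string between the quotes, so I need '2019-01-16 23:59'. How can I use 3 separate regex commands to parse this into 3 separate strings? TIA.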

5 answers:

Answer 0 (score: 1)

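This regex should give you the Description value. The patterns for From and Thru are similar, but their character class also has to allow `-` and `:`, which `\w` and `\s` do not cover: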

'Description="([\w\s]+)" From'
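A quick sketch checking the pattern against the question's `<Report>` tag. It shows that `[\w\s]+` works for Description but not for the date fields, and that widening the class to `[\w\s:-]+` (a trailing `-` inside a class is literal) handles From and Thru:

```python
import re

# The opening <Report> tag from the question
report = ('<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" '
          'Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" '
          'obj_device="479999" locations="69,31,">')

# [\w\s]+ is enough for Description, whose value is only letters and a space
description = re.search(r'Description="([\w\s]+)" From', report)

# The same class fails for From: "-" and ":" are neither \w nor \s
from_naive = re.search(r'From="([\w\s]+)" Thru', report)

# Adding ":" and "-" to the class lets the date/time values match
from_fixed = re.search(r'From="([\w\s:-]+)" Thru', report)
thru_fixed = re.search(r'Thru="([\w\s:-]+)" obj', report)

print(description.group(1))  # Get Metadata
print(from_naive)            # None
print(from_fixed.group(1))   # 2019-01-16 00:00
print(thru_fixed.group(1))   # 2019-01-16 23:59
```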

Answer 1 (score: 1)

  1. You can use a single regex pattern

    import re

    # `string` holds the text shown in the question
    pattern = re.compile(r'Description="(.*)" From="(.*)" Thru="(.*)" obj')

    # with several groups, findall returns one (desc, from, thru) tuple per match
    for desc, frm, thru in pattern.findall(string):
        print(desc)
        print(frm)
        print(thru)

    # output
    # Get Metadata
    # 2019-01-16 00:00
    # 2019-01-16 23:59
    
  2. Or you can do the same thing with a separate pattern for each value

    pattern_desc = re.compile('Description="(.*)" From')
    pattern_frm = re.compile('From="(.*)" Thru')
    pattern_thru = re.compile('Thru="(.*)" obj')
    
    re.findall(pattern_desc, string) 
    # output: ['Get Metadata']
    
    re.findall(pattern_frm, string)
    # output: ['2019-01-16 00:00']
    
    re.findall(pattern_thru, string)
    # output: ['2019-01-16 23:59'] 
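The `(.*)` groups above are greedy. They work on the question's data only because each attribute appears once per line. A minimal sketch, using a made-up string with two `<Report>` tags on one line (hypothetical input, not from the question), of how a greedy group overshoots compared with a lazy `(.*?)` or a negated class `([^"]*)`:

```python
import re

# Hypothetical input: two <Report> tags on ONE line, invented for illustration
text = ('<Report Description="A" From="1" Thru="2" obj_device="x">'
        '<Report Description="B" From="3" Thru="4" obj_device="y">')

# Greedy: .* runs to the end of the line, then backs off only to the LAST '" From'
greedy = re.findall(r'Description="(.*)" From', text)

# Lazy: .*? stops at the FIRST '" From' after each Description
lazy = re.findall(r'Description="(.*?)" From', text)

# Negated class: [^"]* can never cross a closing quote
negated = re.findall(r'Description="([^"]*)"', text)

print(greedy)   # ['A" From="1" Thru="2" obj_device="x"><Report Description="B']
print(lazy)     # ['A', 'B']
print(negated)  # ['A', 'B']
```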
    

Answer 2 (score: 1)

I put together a working example that uses a single regex to get the data you're looking for.

import re

long_string = '''
<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" obj_device="479999" locations="69,31,">
  <Objective Type="Availability">
    <Goal>99.99</Goal>
    <Actual>100.00</Actual>
    <Compliant>Yes</Compliant>
    <Errors>0</Errors>
    <Checks>2880</Checks>
    </Objective>
  <Objective Type="Uptime">
    <Goal/>
    <Actual/>
    <Compliant/>
    <Errors>0</Errors>
    <Checks>0</Checks>
  </Objective>
'''

match = re.search(r'Description="(.+?)" From="(.+?)" Thru="(.+?)"', long_string)

if match:
    print(match.group(1))
    print(match.group(2))
    print(match.group(3))

It gives the following output:

Get Metadata
2019-01-16 00:00
2019-01-16 23:59
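A variation on the same single-regex approach, sketched with named groups so each value can be looked up by name instead of by group index. The group names `description`, `from_` and `thru` are arbitrary choices for this example:

```python
import re

# The opening <Report> tag from the question
report = ('<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" '
          'Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" '
          'obj_device="479999" locations="69,31,">')

# (?P<name>...) defines a named group; [^"]* captures up to the closing quote
pattern = re.compile(
    r'Description="(?P<description>[^"]*)"\s+'
    r'From="(?P<from_>[^"]*)"\s+'
    r'Thru="(?P<thru>[^"]*)"'
)

match = pattern.search(report)
if match:
    # groupdict() maps each group name to its captured text
    values = match.groupdict()
    print(values['description'])
    print(values['from_'])
    print(values['thru'])
```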

Hope this helps.

Answer 3 (score: 1)

You need three regexes to capture the values above:

Description="([^"]*)"
From="([^"]*)"
Thru="([^"]*)"

You can build the pattern dynamically in a function and reuse it to look up the value of any attribute. Try this Python demo:

import re

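def getValue(text, key):
    # \b keeps e.g. key 'Name' from matching inside 'SiteName';
    # [^"]* captures everything up to the closing quote
    m = re.search(r'\b' + re.escape(key) + r'="([^"]*)"', text)
    if m:
        return m.group(1)
    return None  # attribute not found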

s = '''<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" obj_device="479999" locations="69,31,">
  <Objective Type="Availability">
    <Goal>99.99</Goal>
    <Actual>100.00</Actual>
    <Compliant>Yes</Compliant>
    <Errors>0</Errors>
    <Checks>2880</Checks>
    </Objective>
  <Objective Type="Uptime">
    <Goal/>
    <Actual/>
    <Compliant/>
    <Errors>0</Errors>
    <Checks>0</Checks>
  </Objective>'''

print('Description: ' + getValue(s,'Description'))
print('From: ' + getValue(s,'From'))
print('Thru: ' + getValue(s,'Thru'))

It prints:

Description: Get Metadata
From: 2019-01-16 00:00
Thru: 2019-01-16 23:59
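If you need more than a few attributes, a sketch that collects every `key="value"` pair of the opening tag into a dict in one pass, assuming (as in the question) that attribute values never contain a double quote. The tag is isolated first so that attributes of nested elements, such as `Type="Availability"` on `<Objective>`, cannot overwrite the Report's own `Type`:

```python
import re

# Shortened copy of the question's text (only the first Objective kept)
text = '''<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" obj_device="479999" locations="69,31,">
  <Objective Type="Availability">
    <Goal>99.99</Goal>
  </Objective>'''

# Grab only the opening <Report ...> tag
tag = re.search(r'<Report\b[^>]*>', text).group(0)

# With two groups, findall yields (name, value) tuples, which dict() accepts
attrs = dict(re.findall(r'(\w+)="([^"]*)"', tag))

print(attrs['Description'])
print(attrs['From'])
print(attrs['Thru'])
```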

Answer 4 (score: 0)

In plain Python (without regex), it would look like this:

xml = '<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" obj_device="479999" locations="69,31,"><Objective Type="Availability"><Goal>99.99</Goal><Actual>100.00</Actual><Compliant>Yes</Compliant><Errors>0</Errors><Checks>2880</Checks></Objective><Objective Type="Uptime"><Goal/><Actual/><Compliant/><Errors>0</Errors><Checks>0</Checks></Objective>'
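report = xml.split('>')[0]  # keep only the opening <Report ...> tag
# take the text after each attribute's opening quote, cut off at the next attribute
description = report.split('Description="')[1].split('" From="')[0]
from_ = report.split('From="')[1].split('" Thru="')[0]
thru = report.split('Thru="')[1].split('" obj_device="')[0]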
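Since the input is XML, an XML parser is another option that avoids both regex and string splitting. This is a swapped-in technique (the standard library's `xml.etree.ElementTree`), sketched under the assumption that the complete document is well-formed and ends with `</Report>`; the snippet in the question stops before that closing tag, so it is appended here:

```python
import xml.etree.ElementTree as ET

# The single-line XML from the answer above
xml = ('<Report Type="Final Report" SiteName="Get Dataset" Name="Get Metadata" '
       'Description="Get Metadata" From="2019-01-16 00:00" Thru="2019-01-16 23:59" '
       'obj_device="479999" locations="69,31,">'
       '<Objective Type="Availability"><Goal>99.99</Goal><Actual>100.00</Actual>'
       '<Compliant>Yes</Compliant><Errors>0</Errors><Checks>2880</Checks></Objective>'
       '<Objective Type="Uptime"><Goal/><Actual/><Compliant/><Errors>0</Errors>'
       '<Checks>0</Checks></Objective>')

# Assumption: the real document closes the root element; the snippet does not
root = ET.fromstring(xml + '</Report>')

# Attributes of the root <Report> element are read with Element.get()
description = root.get('Description')
from_ = root.get('From')
thru = root.get('Thru')

print(description)
print(from_)
print(thru)
```

Unlike the regex versions, this does not depend on the order or spacing of the attributes inside the tag.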