I am analyzing a log file using regular expressions. Sample record:
<teststep timestamp="12040.310594" level="0" type="user" ident="1.2" result="pass">Signal STATUS_GET_VALUE response time Ok,\nSignal response time: 0.000000 [ms] \nSignal response time limit set: 100.000000 [ms]</teststep>
I need to extract the timestamp and the signal response time.
My approach to this problem:
import re

with open('report.xml') as f:
    for line in f:
        if 'Signal response time: ' in line:
            # Find the attribute and the message, then pull the number out of each
            timeStampL = re.findall(r'timestamp="\d*\.\d*"', line)
            responseTimeL = re.findall(r'Signal response time: \d*\.\d*',
                                       line, re.IGNORECASE)
            timeStamp = float(re.findall(r'\d+\.\d+', timeStampL[0])[0])
            responseTime = float(re.findall(r'\d+\.\d+', responseTimeL[0])[0])
I'm sure this is not the shortest or best way to get this data. Can you suggest a better approach?
Answer 0: (score: 2)
We can use BeautifulSoup to extract the attribute value and the element text, as shown below.
Since the text value Signal STATUS_GET_VALUE response time Ok,\nSignal response time: 0.000000 [ms] \nSignal response time limit set: 100.000000 [ms] contains \n separators, you can split() your data on them and keep only 0.000000 [ms].
Code:
from bs4 import BeautifulSoup
html_code = '<teststep timestamp="12040.310594" level="0" type="user" ident="1.2" result="pass">Signal STATUS_GET_VALUE response time Ok,\nSignal response time: 0.000000 [ms] \nSignal response time limit set: 100.000000 [ms]</teststep>'
soup = BeautifulSoup(html_code, "html.parser")
for test in soup.find_all('teststep'):
    print(test.get('timestamp'))
    print(test.text.split("\n")[1].split(":")[1].strip())
Output:
12040.310594
0.000000 [ms]
P.S.: You can remove the [ms] from 0.000000 [ms] by changing this:
test.text.split("\n")[1].split(":")[1].strip()
to this:
test.text.split("\n")[1].split(":")[1].strip().replace(" [ms]", "")
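Indexing the split result with [1] relies on the response time always being the second line of the element text. A minimal sketch of a slightly more defensive variant follows, assuming the text contains real newline characters as in the example above. The helper name parse_response_time is hypothetical and not part of BeautifulSoup; it works on plain strings only.

```python
def parse_response_time(text):
    """Return the value after 'Signal response time:' as a float, or None."""
    prefix = "Signal response time:"
    for line in text.split("\n"):
        line = line.strip()
        # The "limit set" line does not match: it has no colon right after "time"
        if line.startswith(prefix):
            # Drop the prefix and the unit, keep only the number
            value = line[len(prefix):].replace("[ms]", "").strip()
            return float(value)
    return None


text = ("Signal STATUS_GET_VALUE response time Ok,\n"
        "Signal response time: 0.000000 [ms] \n"
        "Signal response time limit set: 100.000000 [ms]")
print(parse_response_time(text))
```

Inside the loop above it could be called as parse_response_time(test.text), which looks the line up by its label instead of its position.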
Answer 1: (score: 1)
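Another way to get the desired result is to use an XML/HTML parser like BeautifulSoup to locate the element, read the timestamp attribute (in BeautifulSoup you can treat an element like a dictionary when reading attributes), and use a regular expression to extract the "signal response time":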
In [1]: import re
In [2]: from bs4 import BeautifulSoup
In [3]: data = """<teststep timestamp="12040.310594" level="0" type="user" ident="1.2" result="pass">Signal STATUS_GET_VALUE response time Ok,\nSignal response time: 0.000000 [ms] \nSignal response time limit set: 100.000000 [ms]</teststep>"""
In [4]: soup = BeautifulSoup(data, "html.parser")
In [5]: pattern = re.compile(r"Signal response time: ([0-9.]+)")
In [6]: elm = soup.find("teststep", string=pattern)  # "string" is the newer name for the "text" argument (bs4 >= 4.4.0)
In [7]: print(elm["timestamp"], pattern.search(elm.get_text()).group(1))
12040.310594 0.000000
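If installing BeautifulSoup is not an option, the same parser-plus-regex idea can be sketched with the standard library. This swaps in xml.etree.ElementTree for BeautifulSoup and assumes the report is well-formed XML. The function name extract_all and the <report> wrapper element are hypothetical, used only for illustration; the real root element of report.xml is not shown in the question.

```python
import re
import xml.etree.ElementTree as ET

RESPONSE_TIME = re.compile(r"Signal response time: ([0-9.]+)")


def extract_all(xml_text):
    """Return a list of (timestamp, response_time) float pairs."""
    root = ET.fromstring(xml_text)
    results = []
    # iter() also yields the root itself when its tag matches
    for elm in root.iter("teststep"):
        match = RESPONSE_TIME.search(elm.text or "")
        timestamp = elm.get("timestamp")  # attribute lookup, None if absent
        if match is None or timestamp is None:
            continue  # skip steps without the data we need
        results.append((float(timestamp), float(match.group(1))))
    return results


# The <report> wrapper is a hypothetical root element for illustration
data = """<report><teststep timestamp="12040.310594" level="0" type="user" ident="1.2" result="pass">Signal STATUS_GET_VALUE response time Ok,\nSignal response time: 0.000000 [ms] \nSignal response time limit set: 100.000000 [ms]</teststep></report>"""
print(extract_all(data))
```

Converting to float inside the function keeps the caller free of string handling. Steps that lack either the attribute or the message are skipped rather than raising.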