I'm trying to parse the output of Python's doctest module and store it in an HTML file.
My output looks like this:
**********************************************************************
File "example.py", line 16, in __main__.factorial
Failed example:
[factorial(n) for n in range(6)]
Expected:
[0, 1, 2, 6, 24, 120]
Got:
[1, 1, 2, 6, 24, 120]
**********************************************************************
File "example.py", line 20, in __main__.factorial
Failed example:
factorial(30)
Expected:
25252859812191058636308480000000L
Got:
265252859812191058636308480000000L
**********************************************************************
1 items had failures:
2 of 8 in __main__.factorial
***Test Failed*** 2 failures.
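Each failure is preceded by a line of asterisks, which separates one test failure from the next.
What I want to do is pull out the file name and method of each failure, along with the expected and actual results, and then use that to build an HTML document (or store it in a text file and do a second round of parsing).
How can I do this using only Python, or some combination of UNIX shell utilities?
Edit: I've put together the following shell pipeline, which matches each block the way I want, but I'm not sure how to redirect each sed match into its own file.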
python example.py | sed -n '/.*/,/^\**$/p' > `mktemp error.XXX`
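Since the report is already produced by Python, one way to get one file per block is to do the splitting in Python as well. This is a sketch that assumes the default doctest report layout (blocks separated by a line consisting only of asterisks, each block starting with a `File "..."` header); `split_to_files` and its `prefix`/`directory` parameters are hypothetical names, and `tempfile.mkstemp` stands in for `mktemp error.XXX`.

```python
import re
import tempfile

# A line made up only of asterisks separates the blocks in a doctest report
SEPARATOR_RE = re.compile(r'^\*+$', re.MULTILINE)

def split_to_files(report, prefix="error.", directory="."):
    """Write each failure block of `report` to its own new file.

    Returns the created paths in report order. Chunks that do not start
    with a 'File "...' header (leading text, the final summary) are skipped.
    """
    paths = []
    for chunk in SEPARATOR_RE.split(report):
        block = chunk.strip("\n")
        if not block.startswith('File "'):
            continue
        # Like `mktemp error.XXX`: a uniquely named file, created atomically
        fd, path = tempfile.mkstemp(prefix=prefix, dir=directory)
        with open(fd, "w") as fh:  # open() accepts the raw file descriptor
            fh.write(block + "\n")
        paths.append(path)
    return paths
```

Adding a small driver such as `import sys` followed by `print(*split_to_files(sys.stdin.read()), sep="\n")` at the bottom of the script would let it replace the `sed` stage in the same `python example.py | ...` pipeline.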
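Answer 0 (score: 4)
You could write a Python program to pick this apart, but a better approach may be to make doctest produce the report you want in the first place. From the documentation for doctest.DocTestRunner: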
... the display output
can be also customized by subclassing DocTestRunner, and
overriding the methods `report_start`, `report_success`,
`report_unexpected_exception`, and `report_failure`.
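A minimal sketch of that approach, assuming Python 3. `CollectingRunner`, `failures_to_html` and the dictionary keys are hypothetical names; only `DocTestRunner.report_failure(out, test, example, got)`, `DocTestFinder.find`, and the `DocTest`/`Example` attributes used (`filename`, `lineno`, `name`, `source`, `want`) come from the doctest module. The line-number arithmetic mirrors the `File ..., line N` header doctest prints itself.

```python
import doctest
import html

class CollectingRunner(doctest.DocTestRunner):
    """DocTestRunner that records failures instead of printing them."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.failures_found = []

    def report_failure(self, out, test, example, got):
        # Same arithmetic doctest uses for its own "line N" in the header
        if test.lineno is not None and example.lineno is not None:
            line = test.lineno + example.lineno + 1
        else:
            line = None
        self.failures_found.append({
            "file": test.filename,
            "line": line,
            "name": test.name,  # e.g. __main__.factorial
            "example": example.source.strip(),
            "expected": example.want.strip(),
            "got": got.strip(),
        })

COLUMNS = ("file", "line", "name", "example", "expected", "got")

def failures_to_html(failures):
    """Render the collected failures as a bare, HTML-escaped table."""
    head = "".join(f"<th>{c}</th>" for c in COLUMNS)
    rows = []
    for f in failures:
        cells = "".join(
            f"<td><pre>{html.escape(str(f[c]))}</pre></td>" for c in COLUMNS)
        rows.append(f"<tr>{cells}</tr>")
    return "<table>\n<tr>" + head + "</tr>\n" + "\n".join(rows) + "\n</table>\n"
```

To use it on a module, create `runner = CollectingRunner()`, call `runner.run(test)` for each `test` in `doctest.DocTestFinder().find(module)`, then write `failures_to_html(runner.failures_found)` to a file. Overriding only `report_failure` leaves unexpected exceptions reported as usual through `report_unexpected_exception`.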
Answer 1 (score: 1)
Here's a quick and dirty script that parses the output into tuples containing the relevant information:
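import sys
import re

# A line consisting only of asterisks separates the failure blocks
stars_re = re.compile(r'^[*]+$', re.MULTILINE)
# Header of each block: File "<file>", line <n>, in <test name>
file_line_re = re.compile(r'^File "(.*?)", line (\d+), in (.*)$')

doctest_output = sys.stdin.read()
# Drop the text before the first separator and the summary after the last one
chunks = stars_re.split(doctest_output)[1:-1]

for chunk in chunks:
    chunk_lines = chunk.strip().splitlines()
    m = file_line_re.match(chunk_lines[0])
    filename, line, module = m.groups()
    # Assumes the example, expected and got sections are one line each
    failed_example = chunk_lines[2].strip()
    expected = chunk_lines[4].strip()
    got = chunk_lines[6].strip()
    print((filename, line, module, failed_example, expected, got))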
Answer 2 (score: 1)
I wrote a quick parser in pyparsing to do this.
from pyparsing import *
str = """
**********************************************************************
File "example.py", line 16, in __main__.factorial
Failed example:
[factorial(n) for n in range(6)]
Expected:
[0, 1, 2, 6, 24, 120]
Got:
[1, 1, 2, 6, 24, 120]
**********************************************************************
File "example.py", line 20, in __main__.factorial
Failed example:
factorial(30)
Expected:
25252859812191058636308480000000L
Got:
265252859812191058636308480000000L
**********************************************************************
"""
quote = Literal('"').suppress()
comma = Literal(',').suppress()
in_ = Keyword('in').suppress()
block = OneOrMore("**").suppress() + \
Keyword("File").suppress() + \
quote + Word(alphanums + ".") + quote + \
comma + Keyword("line").suppress() + Word(nums) + comma + \
in_ + Word(alphanums + "._") + \
LineStart() + restOfLine.suppress() + \
LineStart() + restOfLine + \
LineStart() + restOfLine.suppress() + \
LineStart() + restOfLine + \
LineStart() + restOfLine.suppress() + \
LineStart() + restOfLine
all = OneOrMore(Group(block))
result = all.parseString(str)
for section in result:
    print(section)
This gives:
['example.py', '16', '__main__.factorial', ' [factorial(n) for n in range(6)]', ' [0, 1, 2, 6, 24, 120]', ' [1, 1, 2, 6, 24, 120]']
['example.py', '20', '__main__.factorial', ' factorial(30)', ' 25252859812191058636308480000000L', ' 265252859812191058636308480000000L']
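To get from those rows to the HTML file the question asks for, each group can be written out as a table row. A sketch, assuming each section is a sequence of six strings in the order shown above (a pyparsing `Group` result or a plain list/tuple); `rows_to_html` is a hypothetical helper, and it strips the leading space left on the last three fields.

```python
import html

HEADERS = ("File", "Line", "Test", "Failed example", "Expected", "Got")

def rows_to_html(rows, title="doctest failures"):
    """Build a small standalone HTML page from six-field failure rows."""
    out = [
        "<!DOCTYPE html>",
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>',
        '<body><table border="1">',
        "<tr>" + "".join(f"<th>{h}</th>" for h in HEADERS) + "</tr>",
    ]
    for row in rows:
        # Escape each value so output such as "<Foo object>" stays literal
        cells = "".join(f"<td>{html.escape(str(v).strip())}</td>" for v in row)
        out.append(f"<tr>{cells}</tr>")
    out.append("</table></body></html>")
    return "\n".join(out) + "\n"
```

For example, `with open("report.html", "w") as fh: fh.write(rows_to_html(result))` would write one row per failure.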
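Answer 3 (score: 0)
This is probably one of the least elegant Python scripts I've ever written, but it should give you the skeleton for doing what you want without resorting to UNIX utilities or a separate script to generate the HTML. It's untested, but it should only need minor tweaks.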
import os
import html

# Create a list of all files in the current directory
dirList = os.listdir('.')
# Make sure the output directory exists
os.makedirs('processed', exist_ok=True)

# Ignore anything that isn't a .txt file.
#
# Read in text, then split it into a list.
for thisFile in dirList:
    if thisFile.endswith(".txt"):
        with open(thisFile, 'r') as infile:
            rawText = infile.read()
        yourList = rawText.split('\n')

        # Strings
        compiledText = ''
        htmlText = ''

        for i in yourList:
            # Clunky way of seeing whether or not the current line
            # should be included in compiledText
            if i.startswith("*****"):
                compiledText += "\n\n--- New Report ---\n"
            if i.startswith("File"):
                compiledText += i + '\n'
            if i.startswith("Fail"):
                compiledText += i + '\n'
            if i.startswith("Expe"):
                compiledText += i + '\n'
            if i.startswith("Got"):
                compiledText += i + '\n'
            if i.startswith(" "):
                compiledText += i + '\n'

        # Insert your HTML template below; the text is escaped and wrapped
        # in <pre> so indentation and characters like < > & survive
        htmlText = ('<html>\n <body>\n<pre>' + html.escape(compiledText)
                    + '</pre>\n </body>\n</html>')

        # Write out to file
        with open(os.path.join('processed', thisFile + '.html'), 'w') as outfile:
            outfile.write(htmlText)