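I have a test report file in HTML format generated by Nose. I want to extract some parts of the text from it in Python. I will be sending the extracted text in the body of an email.
I have the following sample: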
<!DOCTYPE html>
<html>
<head>
<title>Unit Test Report</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<style>
body {
font-family: Calibri, "Trebuchet MS", sans-serif;
}
* {
word-break: break-all;
}
table, td, th, .dataid {
border: 1px solid #aaa;
border-collapse: collapse;
background: #fff;
}
section {
background: rgba(0, 0, 0, 0.05);
margin: 2ex;
padding: 1ex;
border: 1px solid #999;
border-radius: 5px;
}
h1 {
font-size: 130%;
}
h2 {
font-size: 120%;
}
h3 {
font-size: 100%;
}
h4 {
font-size: 85%;
}
h1, h2, h3, h4, a[href] {
cursor: pointer;
color: #0074d9;
text-decoration: none;
}
h3 strong, a.failed {
color: #ff4136;
}
.failed {
color: #ff4136;
}
a.success {
color: #3d9970;
}
pre {
font-family: 'Consolas', 'Deja Vu Sans Mono',
'Bitstream Vera Sans Mono', 'Monaco',
'Courier New', monospace;
}
.test-details,
.traceback {
display: none;
}
section:target .test-details {
display: block;
}
</style>
</head>
<body>
<h1>Overview</h1>
<section>
<table>
<tr>
<th>Class</th>
<th class="failed">Fail</th>
<th class="failed">Error</th>
<th>Skip</th>
<th>Success</th>
<th>Total</th>
</tr>
<tr>
<td>Regression_TestCase.RegressionProject_TestCase2.RegressionProject_TestCase2</td>
<td class="failed">1</td>
<td class="failed">9</td>
<td>0</td>
<td>219</td>
<td>229</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td class="failed">1</td>
<td class="failed">9</td>
<td>0</td>
<td>219</td>
<td>229</td>
</tr>
</table>
</section>
<h1>Failure details</h1>
<section>
<h2>Regression_TestCase.RegressionProject_TestCase2.RegressionProject_TestCase2 (1 failures, 9 errors)</h2>
<div>
<section id="Regression_TestCase.RegressionProject_TestCase2.RegressionProject_TestCase2:test_00010_import_user_invalid_credentials">
<h3>test_00010_import_user_invalid_credentials: <strong>selenium.common.exceptions.NoSuchElementException</strong></h3>
<div class="test-details">
<h4>Traceback</h4>
<pre class="traceback">Traceback (most recent call last):
File "C:\Python27\lib\unittest\case.py", line 329, in run
testMethod()
File "C:\test_runners\selenium_regression_test_5_1_1\ClearCore - Regression Test\Regression_TestCase\RegressionProject_TestCase2.py", line 221, in test_00010_import_user_invalid_credentials
Globals.login_password_invalid)
File "C:\test_runners\selenium_regression_test_5_1_1\ClearCore - Regression Test\Pages\security.py", line 51, in enter_invalid_userid_and_password
self.enter_user_id(userid)
File "C:\test_runners\selenium_regression_test_5_1_1\ClearCore - Regression Test\Pages\security.py", line 32, in enter_user_id
user_id_element = self.get_element(*MainPageLocators.security_user_id_textfield_xpath)
File "C:\test_runners\selenium_regression_test_5_1_1\ClearCore - Regression Test\Pages\base.py", line 40, in get_element
element = self.driver.find_element(by=how, value=what)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 712, in find_element
{'using': by, 'value': value})['value']
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 201, in execute
self.error_handler.check_response(response)
File "C:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 194, in check_response
raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: Message: Unable to find element with xpath == //span[@class="gwt-InlineLabel marginbelow myinlineblock" and contains(text(), "User ID (including domain)")]/following-sibling::input
-------------------- >> begin captured stdout << ---------------------
*** Test import_invalid_user_credentials ***
05_12_1616_49_42
//span[@class="gwt-InlineLabel marginbelow myinlineblock" and contains(text(), "User ID (including domain)")]/following-sibling::input
Element not found
Message: Unable to find element with xpath == //span[@class="gwt-InlineLabel marginbelow myinlineblock" and contains(text(), "User ID (including domain)")]/following-sibling::input
05_12_1616_51_54
--------------------- >> end captured stdout << ----------------------
----
# There is more html below. I have not included everything. It will be too long otherwise.
If I open the file in a browser, it is displayed as follows. This is the text I want to extract from the HTML file:
Class Fail Error Skip Success Total
Regression_TestCase 1 9 0 219 229
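How can I do this? It would be good to keep it in a tabular format. Thanks, Riaz
Answer 0 (score: 1)
Your sample HTML contains tags that are never closed and closing tags that have no matching opening tag. I assume you posted only an excerpt, and that the part of the file you want to extract looks like this: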
<body>
<h1>Overview</h1>
<section>
<table>
<tr>
<th>Class</th>
<th class="failed">Fail</th>
<th class="failed">Error</th>
<th>Skip</th>
<th>Success</th>
<th>Total</th>
</tr>
<tr>
<td>Regression_TestCase</td>
<td class="failed">1</td>
<td class="failed">9</td>
<td>0</td>
<td>219</td>
<td>229</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td class="failed">1</td>
<td class="failed">9</td>
<td>0</td>
<td>219</td>
<td>229</td>
</tr>
</table>
</section>
</body>
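You can use the ElementTree module to parse the code as XML. Edit: changed the method used to find the table rows to XPath, and made it so that the "Total" row is not written out (its first cell holds only a nested <strong> element, so the cell's own text is empty).
Edit 2: I now use regular expressions to extract all tables in the file. Use this with care, because it is a very fragile solution. If there is an opening table tag without a closing table tag, it will extract all text after the opening tag and then crash, because the resulting string will not be well-formed XML.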
import csv
import re
import xml.etree.ElementTree as ET

# Extract well-formed tables
start = re.compile(r"<table>", re.IGNORECASE)
end = re.compile(r"</table>", re.IGNORECASE)
html_code = ""
table = False  # truthy while we are between <table> and </table>
with open('sample2.xml') as xmlfile:
    for line in xmlfile:
        if not table:
            table = start.search(line)
            if table:
                html_code += line
        else:
            if end.search(line):
                # Keep the line only up to and including </table>
                html_code += line[0:end.search(line).end()]
                table = False
            else:
                html_code += line
print(html_code)

# Parse the html code into an Etree Element object.
# The wrapper element keeps the string well formed if the file has several tables.
root = ET.fromstring("<tables>" + html_code + "</tables>")
elements = root.findall(".//tr")
print(elements)

# Written for Python 2, where csv files are opened in binary mode;
# on Python 3 use open('output.csv', 'w', newline='') instead.
with open('output.csv', 'wb') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',', quotechar='"')
    for tablerow in elements:
        # Only write the row to the file if there is text inside the first column
        if list(tablerow)[0].text:
            row = [col.text for col in list(tablerow)]
            csvwriter.writerow(row)
            print(row)
If you open "output.csv" in Excel, you will have your table. If you use this method, be aware of the security warning in the documentation (linked in zezollo's comment).
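Alternatively, you could pull the cells out with regular expressions alone, but I am too tired to write another solution. Maybe tomorrow, or someone else may provide an alternative.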
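For readers curious what a regular-expression-only approach might look like, here is a rough Python 3 sketch. It is not the answer author's code. It assumes every row and cell has both its opening and closing tag and that tables are not nested, and it is as fragile as any regex over HTML. The helper names rows_from_html and format_table are hypothetical; format_table renders the rows as aligned plain text, which may suit an email body.

```python
import html
import re

# Non-greedy matches; \b keeps <tr> from matching tags such as <track>.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[dh]\b[^>]*>(.*?)</t[dh]>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def rows_from_html(html_text):
    """Return table rows as lists of cell strings, with inner tags stripped."""
    rows = []
    for row_match in ROW_RE.finditer(html_text):
        cells = [html.unescape(TAG_RE.sub("", cell)).strip()
                 for cell in CELL_RE.findall(row_match.group(1))]
        if cells:
            rows.append(cells)
    return rows


def format_table(rows):
    """Render rows as left-aligned plain-text columns."""
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    padded = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(n_cols)]
    lines = ["  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
             for row in padded]
    return "\n".join(lines)


# Shortened, hypothetical sample based on the report's overview table.
sample = """
<table>
<tr><th>Class</th><th class="failed">Fail</th><th>Total</th></tr>
<tr><td>Regression_TestCase</td><td class="failed">1</td><td>229</td></tr>
<tr><td><strong>Total</strong></td><td class="failed">1</td><td>229</td></tr>
</table>
"""
print(format_table(rows_from_html(sample)))
```

The resulting string can be placed directly in a plain-text email, where a monospaced font keeps the columns aligned.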