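I have a test report file in HTML format generated by Nose. I want to extract some parts of its text in Python, and then send that text in the body of an email.
I have the following sample: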
<table>
<tr>
<th>Class</th>
<th class="failed">Fail</th>
<th class="failed">Error</th>
<th>Skip</th>
<th>Success</th>
<th>Total</th>
</tr>
<tr>
<td>Regression_TestCase</td>
<td class="failed">1</td>
<td class="failed">9</td>
<td>0</td>
<td>219</td>
<td>229</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td class="failed">1</td>
<td class="failed">9</td>
<td>0</td>
<td>219</td>
<td>229</td>
</tr>
</table>
If I open the file in a browser, the text is laid out as follows. This is the text I want to extract from the HTML file:
Class Fail Error Skip Success Total
Regression_TestCase 1 9 0 219 229
Using BeautifulSoup4 in Python 2.7, I managed to extract the following:
[<th>Class</th>, <th class="failed">Fail</th>, <th class="failed">Error</th>, <th>Skip</th>, <th>Success</th>, <th>Total</th>]
[<td>Regression_TestCase.RegressionProject_TestCase2.RegressionProject_TestCase2</td>, <td class="failed">1</td>, <td class="failed">9</td>, <td>0</td>, <td>219</td>, <td>229</td>, <td><strong>Total</strong></td>, <td class="failed">1</td>, <td class="failed">9</td>, <td>0</td>, <td>219</td>, <td>229</td>]
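My code is as follows:
from bs4 import BeautifulSoup

def extract_pass_summary_from_selenium_report():
    # Read the Nose HTML report and parse it with Python's built-in HTML parser.
    with open(r"C:\test_runners\selenium_regression_test_5_1_1\ClearCore 501 - Regression Test\TestReport\SeleniumTestReport.html", 'r') as report_file:
        html_report = report_file.read()
    soup = BeautifulSoup(html_report, "html.parser")
    # Each call returns a flat list of tags, not their text.
    print(soup.find_all('th'))
    print(soup.find_all('td'))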
How can I extract just the text and keep the following format?
Class Fail Error Skip Success Total
Regression_TestCase 1 9 0 219 229
Thanks, Riaz
Answer 0 (score: 3)
You can solve this with BeautifulSoup alone, but I would use pandas and let pandas.read_html() parse the HTML table into a convenient DataFrame:
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
import pandas as pd
data = """
<table>
<tr>
<th>Class</th>
<th class="failed">Fail</th>
<th class="failed">Error</th>
<th>Skip</th>
<th>Success</th>
<th>Total</th>
</tr>
<tr>
<td>Regression_TestCase</td>
<td class="failed">1</td>
<td class="failed">9</td>
<td>0</td>
<td>219</td>
<td>229</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td class="failed">1</td>
<td class="failed">9</td>
<td>0</td>
<td>219</td>
<td>229</td>
</tr>
</table>"""
df = pd.read_html(StringIO(data))
print(df)
This prints:
[ 0 1 2 3 4 5
0 Class Fail Error Skip Success Total
1 Regression_TestCase 1 9 0 219 229
2 Total 1 9 0 219 229]
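If the goal is the plain-text layout from the question rather than a list of DataFrames, one possible follow-up is sketched below. It assumes Python 3 (hence io.StringIO) and an HTML parser backend that pandas can use, such as lxml. The helper name table_to_text is hypothetical and not part of pandas. Passing header=0 asks read_html() to treat the first table row as column names, and to_string(index=False) renders the frame without the index column.

```python
from io import StringIO

import pandas as pd

html = """
<table>
<tr><th>Class</th><th class="failed">Fail</th><th class="failed">Error</th>
<th>Skip</th><th>Success</th><th>Total</th></tr>
<tr><td>Regression_TestCase</td><td class="failed">1</td>
<td class="failed">9</td><td>0</td><td>219</td><td>229</td></tr>
<tr><td><strong>Total</strong></td><td class="failed">1</td>
<td class="failed">9</td><td>0</td><td>219</td><td>229</td></tr>
</table>"""


def table_to_text(html):
    """Hypothetical helper: return (DataFrame, plain-text rendering)."""
    # read_html returns a list of DataFrames, one per <table>; take the first.
    df = pd.read_html(StringIO(html), header=0)[0]
    # Drop the summary row so only the per-class rows remain.
    df = df[df["Class"] != "Total"]
    return df, df.to_string(index=False)


df, text = table_to_text(html)
print(text)
```

The returned string can be placed directly into a plain-text email body. The column alignment is whatever to_string() produces, so it may differ slightly from the single-space layout shown in the question.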
Answer 1 (score: 0)
Add a function that returns the text of each tag:
def html_to_text(elements):
    # Collect the text of every tag without modifying the list passed in.
    records = []
    for element in elements:
        records.append(element.text)
    return records
Then call the function in your code:
ths = soup.find_all('th')
ths = html_to_text(ths)
print(ths)
tds = html_to_text(soup.find_all('td'))
print(tds)
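Because find_all('th') and find_all('td') each return one flat list, the calls above lose the row boundaries the question asks to keep. A minimal sketch that walks the table row by row instead is shown below. It assumes the bs4 package is installed, and the helper name table_rows_as_text is hypothetical. Each tr becomes one line, with the cell texts joined by single spaces.

```python
from bs4 import BeautifulSoup

html = """
<table>
<tr><th>Class</th><th class="failed">Fail</th><th class="failed">Error</th>
<th>Skip</th><th>Success</th><th>Total</th></tr>
<tr><td>Regression_TestCase</td><td class="failed">1</td>
<td class="failed">9</td><td>0</td><td>219</td><td>229</td></tr>
<tr><td><strong>Total</strong></td><td class="failed">1</td>
<td class="failed">9</td><td>0</td><td>219</td><td>229</td></tr>
</table>"""


def table_rows_as_text(html):
    """Hypothetical helper: return one space-separated string per table row."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for row in soup.find_all("tr"):
        # Header rows use <th> and data rows use <td>, so match both.
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        lines.append(" ".join(cells))
    return lines


lines = table_rows_as_text(html)
# Skip the summary row, which starts with "Total".
wanted = [line for line in lines if not line.startswith("Total ")]
print("\n".join(wanted))
```

For the sample table this prints the header line followed by the Regression_TestCase line, which matches the layout requested in the question.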