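I need a good way to find the names of all the test cases, and the result of each test case, in an HTML file. I'm new to BeautifulSoup and could use some good advice.
First I did this: read the data with BeautifulSoup, prettify it, and write the result to a file: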
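from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are never read as escapes
with open(r"C:\DEV\debugkod\data.html", "rb") as html_file:
    soup = BeautifulSoup(html_file, "html.parser")

# prettify() returns text; encode it and write the bytes in binary mode
with open('myfile', 'wb') as f:
    f.write(soup.prettify().encode('utf-8'))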
When I look at the relevant part of the prettified output in the file, it looks like this (the file contains 100 test cases and their results):
<a name="1005">
</a>
<div class="Sequence">
<div class="Header">
<table class="Title">
<tr>
<td>
IAA REQPROD 55 InvPwrDownMode - Shut down communication (Sequence)
</td>
<td class="ResultStateIcon">
<img src="Resources/Passed.png"/>
</td>
</tr>
</table>
<table class="DynamicAttributes">
<colgroup>
<col width="20">
<col width="30">
<col width="20">
<col width="30">
</col>
</col>
</col>
</col>
</colgroup>
<tr>
<th>
Start time:
</th>
<td>
2014/09/23 09-24-31
</td>
<th>
Stop time:
</th>
<td>
2014/09/23 09-27-25
</td>
</tr>
<tr>
<th>
Execution duration:
</th>
<td>
173.461 sec.
</td>
<th>
Name:
</th>
<td>
IAA REQPROD 55 InvPwrDownMode - Shut down communication
</td>
</tr>
<tr>
<th>
Library link:
</th>
<td>
</td>
<th>
Creation date:
</th>
<td>
2013/4/11, 8-55-57
</td>
</tr>
<tr>
<th>
Modification date:
</th>
<td>
2014/9/23, 9-27-25
</td>
<th>
Author:
</th>
<td>
cnnntd
</td>
</tr>
<tr>
<th>
Hierarchy:
</th>
<td>
IAA. IAA REQPROD 55 InvPwrDownMode - Shut down communication
</td>
<td>
</td>
<td>
</td>
</tr>
</table>
<table class="StaticAttributes">
<colgroup>
<col width="20">
<col width="80">
</col>
</col>
</colgroup>
<tr>
<th>
Description:
</th>
<td>
</td>
</tr>
<tr>
<th>
Result state:
</th>
<td>
Passed
</td>
</tr>
</table>
</div>
<div class="BlockReport">
<a name="1007">
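In this file I now want to get the information for "Name:" and "Result state:". In the prettified output I can see the labels "Name:" and "Result state:", so I hope it is possible to use them to find the test case name and the test result. The printout should look like this: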
Name = IAA REQPROD 55 InvPwrDownMode - Shut down communication
Result = Passed
etc.
Does anyone know how to do this with BeautifulSoup?
Answer 0 (score: 0)
Using the HTML from your second Pastebin link, the following code:
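from __future__ import print_function  # lets print() below run on Python 2 and 3

from bs4 import BeautifulSoup

with open("beautifulsoup2.html", "rb") as html_file:
    soup = BeautifulSoup(html_file, "html.parser")

# Test case names: the first <td> in every <table class="Title">
names = []
for table in soup.find_all('table', attrs={'class': 'Title'}):
    td = table.find('td')
    # Drop non-ASCII characters (see the note below the output)
    names.append(td.text.encode("ascii", "ignore").decode("ascii").strip())

# Results: the second <td> in every element with class="StaticAttributes"
results = []
for table in soup.find_all(attrs={'class': 'StaticAttributes'}):
    tds = table.find_all('td')
    results.append(tds[1].text.strip())

# Pair names and results by position
for name, result in zip(names, results):
    print("Name = {}".format(name))
    print("Result = {}".format(result))
    print()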
gives this output:
Name = IEM(Project)
Result = PassedFailedUndefinedError
Name = IEM REQPROD 132765 InvPwrDownMode - Shut down communication SN1(Sequence)
Result = Passed
Name = IEM REQPROD 86434 InvPwrDownMode - Time from shut down to sleep SN2(Sequence)
Result = PassedUndefined
Name = IEM Test(Sequence)
Result = Failed
Name = IEM REQPROD 86434 InvPwrDownMode - Time from shut down to sleep(Sequence)
Result = Error
I added encode("ascii", "ignore") because otherwise I got a UnicodeDecodeError from non-ASCII characters in the names. See this answer for how such characters can end up in your HTML.
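Because the code above zips two separate lists, a name and a result only line up if every Title table has a matching StaticAttributes table, which may explain rows like "PassedFailedUndefinedError". Below is a minimal sketch of the label-based lookup the question asks about, assuming (unverified against the real report) that each test case is a <div class="Sequence"> whose direct child <div class="Header"> holds the "Name:" and "Result state:" rows, as in the excerpt in the question. It reads both values from the same header, and it normalizes whitespace with unicodedata.normalize("NFKC", ...) rather than dropping non-ASCII characters. The helper names (clean, value_for_label, extract_results) and the SAMPLE markup, including the made-up "Nested step" sequence, are hypothetical illustrations.

```python
import unicodedata

from bs4 import BeautifulSoup


def clean(text):
    # NFKC turns compatibility characters such as the non-breaking space
    # (U+00A0) into plain ones; split/join then collapses runs of whitespace.
    return " ".join(unicodedata.normalize("NFKC", text).split())


def value_for_label(container, label):
    """Text of the <td> that follows the <th> whose text equals `label`."""
    for th in container.find_all("th"):
        if clean(th.get_text()) == label:
            td = th.find_next_sibling("td")
            return clean(td.get_text()) if td is not None else None
    return None


def extract_results(html):
    """Return (name, result) pairs, one per <div class="Sequence">."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for seq in soup.find_all("div", class_="Sequence"):
        # Only look in this sequence's own header, so labels that belong to
        # nested sequences (inside BlockReport) are not picked up here.
        header = seq.find("div", class_="Header", recursive=False)
        if header is None:
            continue
        name = value_for_label(header, "Name:")
        result = value_for_label(header, "Result state:")
        if name is not None and result is not None:
            pairs.append((name, result))
    return pairs


# Hypothetical, trimmed-down report in the same layout as the question's excerpt
SAMPLE = """
<div class="Sequence">
  <div class="Header">
    <table class="DynamicAttributes">
      <tr><th>Name:</th><td>IAA REQPROD 55 InvPwrDownMode - Shut down&nbsp;communication</td></tr>
    </table>
    <table class="StaticAttributes">
      <tr><th>Result state:</th><td> Passed </td></tr>
    </table>
  </div>
  <div class="BlockReport">
    <div class="Sequence">
      <div class="Header">
        <table class="DynamicAttributes"><tr><th>Name:</th><td>Nested step</td></tr></table>
        <table class="StaticAttributes"><tr><th>Result state:</th><td>Failed</td></tr></table>
      </div>
    </div>
  </div>
</div>
"""

for name, result in extract_results(SAMPLE):
    print("Name = {}".format(name))
    print("Result = {}".format(result))
    print()
```

For the real report, an open file can be passed instead of the sample string, e.g. extract_results(open("data.html", "rb")), since BeautifulSoup accepts file objects. Whether the Project-level block and the nesting of Sequence divs in the actual HTML match this assumption would need to be checked first.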