Finding test cases and results with BeautifulSoup

Time: 2014-09-23 10:37:58

Tags: python beautifulsoup

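I need a good way to find the names of all test cases, and the result of each test case, in an HTML file. I'm new to BeautifulSoup and would appreciate some good advice.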

First I did this: I used BeautifulSoup to read the data, prettified it, and wrote the result to a file:

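from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are not read as escapes.
# Name a parser explicitly; "html.parser" ships with Python.
with open(r"C:\DEV\debugkod\data.html") as src:
    soup = BeautifulSoup(src, "html.parser")

# prettify() returns a str; the file object encodes it as UTF-8 on write
with open('myfile', 'w', encoding='utf-8') as f:
    f.write(soup.prettify())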

When I inspect part of the prettified output in the file, it looks like this (the file contains 100 test cases and their results):

<a name="1005">
  </a>
  <div class="Sequence">
   <div class="Header">
    <table class="Title">
     <tr>
      <td>
       IAA REQPROD 55 InvPwrDownMode - Shut down communication (Sequence)
      </td>
      <td class="ResultStateIcon">
       <img src="Resources/Passed.png"/>
      </td>
     </tr>
    </table>
    <table class="DynamicAttributes">
     <colgroup>
      <col width="20">
       <col width="30">
        <col width="20">
         <col width="30">
         </col>
        </col>
       </col>
      </col>
     </colgroup>
     <tr>
      <th>
       Start time:
      </th>
      <td>
       2014/09/23 09-24-31
      </td>
      <th>
       Stop time:
      </th>
      <td>
       2014/09/23 09-27-25
      </td>
     </tr>
     <tr>
      <th>
       Execution duration:
      </th>
      <td>
       173.461 sec.
      </td>
      <th>
       Name:
      </th>
      <td>
       IAA REQPROD 55 InvPwrDownMode - Shut down communication
      </td>
     </tr>
     <tr>
      <th>
       Library link:
      </th>
      <td>
      </td>
      <th>
       Creation date:
      </th>
      <td>
       2013/4/11, 8-55-57
      </td>
     </tr>
     <tr>
      <th>
       Modification date:
      </th>
      <td>
       2014/9/23, 9-27-25
      </td>
      <th>
       Author:
      </th>
      <td>
       cnnntd
      </td>
     </tr>
     <tr>
      <th>
       Hierarchy:
      </th>
      <td>
       IAA.  IAA REQPROD 55 InvPwrDownMode - Shut down communication
      </td>
      <td>
      </td>
      <td>
      </td>
     </tr>
    </table>
    <table class="StaticAttributes">
     <colgroup>
      <col width="20">
       <col width="80">
       </col>
      </col>
     </colgroup>
     <tr>
      <th>
       Description:
      </th>
      <td>
      </td>
     </tr>
     <tr>
      <th>
       Result state:
      </th>
      <td>
       Passed
      </td>
     </tr>
    </table>
   </div>
   <div class="BlockReport">
    <a name="1007">

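In this file I now want to find the information for "Name:" and "Result state:". Looking at the prettified output, I can see the labels "Name:" and "Result state:". I hope it's possible to use them to find the test case name and the test result, so the printed output should look like this: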

 Name = IAA REQPROD 55 InvPwrDownMode - Shut down communication 
 Result = Passed
 etc

Does anyone know how to do this with BeautifulSoup?

1 answer:

Answer 0 (score: 0)

Using the HTML from your second Pastebin link, the following code:

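from bs4 import BeautifulSoup

with open("beautifulsoup2.html") as fh:
    soup = BeautifulSoup(fh, "html.parser")

# The first <td> of each "Title" table holds the test case title
names = []
for table in soup.find_all('table', attrs={'class': 'Title'}):
    td = table.find('td')
    # Drop non-ASCII characters, then decode back to str
    names.append(td.text.encode("ascii", "ignore").decode("ascii").strip())

# The second <td> of each "StaticAttributes" table holds the result state
results = []
for table in soup.find_all(attrs={'class': 'StaticAttributes'}):
    tds = table.find_all('td')
    results.append(tds[1].text.strip())

for name, result in zip(names, results):
    print("Name = {}".format(name))
    print("Result = {}".format(result))
    print()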
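To try the answer's approach without the Pastebin file, here is a minimal, self-contained sketch that runs the same two lookups on a trimmed copy of the sample HTML from the question. It assumes Python 3 with bs4 and the built-in "html.parser"; the SAMPLE string and the extract() helper are hypothetical, made up for illustration. Index 1 is used in the StaticAttributes table because the first td there belongs to the empty "Description:" row.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed fixture modeled on the sample HTML in the question.
SAMPLE = """
<div class="Sequence">
  <div class="Header">
    <table class="Title">
      <tr>
        <td>IAA REQPROD 55 InvPwrDownMode - Shut down communication (Sequence)</td>
        <td class="ResultStateIcon"><img src="Resources/Passed.png"/></td>
      </tr>
    </table>
    <table class="StaticAttributes">
      <tr><th>Description:</th><td></td></tr>
      <tr><th>Result state:</th><td>Passed</td></tr>
    </table>
  </div>
</div>
"""


def extract(html):
    """Return (name, result) pairs using the answer's table-class lookups."""
    soup = BeautifulSoup(html, "html.parser")
    # First <td> of each "Title" table: the test case title.
    names = [t.find("td").text.strip()
             for t in soup.find_all("table", class_="Title")]
    # Second <td> of each "StaticAttributes" table: the result state.
    results = [t.find_all("td")[1].text.strip()
               for t in soup.find_all("table", class_="StaticAttributes")]
    # zip() pairs the two lists purely by position.
    return list(zip(names, results))


for name, result in extract(SAMPLE):
    print("Name = {}".format(name))
    print("Result = {}".format(result))
    print()
```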

gives this result:

Name = IEM(Project)
Result = PassedFailedUndefinedError

Name = IEM REQPROD 132765 InvPwrDownMode - Shut down communication SN1(Sequence)
Result = Passed

Name = IEM REQPROD 86434 InvPwrDownMode - Time from shut down to sleep SN2(Sequence)
Result = PassedUndefined

Name = IEM Test(Sequence)
Result = Failed

Name = IEM REQPROD 86434 InvPwrDownMode - Time from shut down to sleep(Sequence)
Result = Error

I added encode("ascii", "ignore") because otherwise I got a UnicodeDecodeError. For how these characters may have ended up in your HTML, see this answer.
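The question also asks whether the "Name:" and "Result state:" labels themselves can be used. A minimal sketch of that idea, assuming Python 3 with bs4 and that each test case sits in its own div with class "Sequence" whose first "Header" div holds its attributes, as in the sample above: find the th whose text equals the label and read the td that follows it. Walking each Sequence separately keeps every name paired with its own result, whereas zipping two independent lists misaligns if one block lacks a table. The helpers field() and test_cases() and the second test case in SAMPLE are hypothetical, made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fixture: the first case follows the question's sample,
# the second case is invented to show per-Sequence pairing.
SAMPLE = """
<div class="Sequence">
  <div class="Header">
    <table class="DynamicAttributes">
      <tr><th>Execution duration:</th><td>173.461 sec.</td>
          <th>Name:</th><td>IAA REQPROD 55 InvPwrDownMode - Shut down communication</td></tr>
    </table>
    <table class="StaticAttributes">
      <tr><th>Description:</th><td></td></tr>
      <tr><th>Result state:</th><td>Passed</td></tr>
    </table>
  </div>
</div>
<div class="Sequence">
  <div class="Header">
    <table class="DynamicAttributes">
      <tr><th>Name:</th><td>Hypothetical second case</td></tr>
    </table>
    <table class="StaticAttributes">
      <tr><th>Result state:</th><td>Failed</td></tr>
    </table>
  </div>
</div>
"""


def field(container, label):
    """Return the text of the <td> right after the <th> whose text is `label`."""
    for th in container.find_all("th"):
        if th.get_text(strip=True) == label:
            td = th.find_next_sibling("td")
            return td.get_text(strip=True) if td is not None else None
    return None


def test_cases(html):
    """Yield (name, result) for each div.Sequence, read from its own Header."""
    soup = BeautifulSoup(html, "html.parser")
    for seq in soup.find_all("div", class_="Sequence"):
        header = seq.find("div", class_="Header")
        if header is None:
            continue
        yield field(header, "Name:"), field(header, "Result state:")


for name, result in test_cases(SAMPLE):
    print("Name = {}".format(name))
    print("Result = {}".format(result))
```

To run it on the real report, pass the contents of the HTML file to test_cases() instead of SAMPLE.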