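BeautifulSoup: extracting data from certain columns of a table returns too much data

Time: 2016-08-13 18:11:41

Tags: python-2.7 beautifulsoup

I am trying to extract some data from my Selenium test report HTML file, but I am getting too much data from the table's rows and columns. The data I want comes from every row that has a cell containing an element with the class "testcase", followed by an element with the class "popup_link" whose text is either pass or fail. E.g.: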

<td class='none'><div class='testcase'>test_000001_login_valid_user</div></td>
<a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.1')" >
    pass</a>

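I want the text "test_000001_login_valid_user" and the text "pass".

My report contains many test cases, so I want to iterate over the rows and get each test case name together with its pass or fail text.

My HTML snippet is: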

    <table id='result_table'>
<colgroup>
<col align='left' />
<col align='right' />
<col align='right' />
<col align='right' />
<col align='right' />
<col align='right' />
</colgroup>
<tr id='header_row'>
    <td>Test Group/Test case</td>
    <td>Count</td>
    <td>Pass</td>
    <td>Fail</td>
    <td>Error</td>
    <td>View</td>
</tr>

<tr class='passClass'>
    <td>Regression_TestCase.RegressionProjectEdit_TestCase.RegressionProject_TestCase_Project_Edit</td>
    <td>75</td>
    <td>75</td>
    <td>0</td>
    <td>0</td>
    <td><a href="javascript:showClassDetail('c1',75)">Detail</a></td>
</tr>

<tr id='pt1.1' class='hiddenRow'>
    <td class='none'><div class='testcase'>test_000001_login_valid_user</div></td>
    <td colspan='5' align='center'>

    <!--css div popup start-->
    <a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.1')" >
        pass</a>

    <div id='div_pt1.1' class="popup_window">
        <div style='text-align: right; color:red;cursor:pointer'>
        <a onfocus='this.blur();' onclick="document.getElementById('div_pt1.1').style.display = 'none' " >
           [x]</a>
        </div>
        <pre>

pt1.1: *** test_login_valid_user ***
test login with a valid user - Passed


        </pre>
    </div>
    <!--css div popup end-->

    </td>
</tr>

<tr id='pt1.2' class='hiddenRow'>
    <td class='none'><div class='testcase'>test_000002_select_a_project</div></td>
    <td colspan='5' align='center'>

    <!--css div popup start-->
    <a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.2')" >
        pass</a>

    <div id='div_pt1.2' class="popup_window">
        <div style='text-align: right; color:red;cursor:pointer'>
        <a onfocus='this.blur();' onclick="document.getElementById('div_pt1.2').style.display = 'none' " >
           [x]</a>
        </div>
        <pre>

pt1.2: *** test_login_valid_user ***
test login with a valid user - Passed
*** test_select_a_project ***
08_12_1612_08_03
Selenium_Regression_Edit_Project_Test


        </pre>
    </div>
    <!--css div popup end-->

    </td>
</tr>

<tr id='pt1.3' class='hiddenRow'>
    <td class='none'><div class='testcase'>test_000003_verify_Lademo_CRM_DataPreview_is_present</div></td>
    <td colspan='5' align='center'>

    <!--css div popup start-->
    <a class="popup_link" onfocus='this.blur();' href="javascript:showTestDetail('div_pt1.3')" >
        pass</a>

    <div id='div_pt1.3' class="popup_window">
        <div style='text-align: right; color:red;cursor:pointer'>
        <a onfocus='this.blur();' onclick="document.getElementById('div_pt1.3').style.display = 'none' " >
           [x]</a>
        </div>
        <pre>

pt1.3: *** test_login_valid_user ***
test login with a valid user - Passed
*** test_select_a_project ***
08_12_1612_08_03
Selenium_Regression_Edit_Project_Test
*** Test verify_Lademo_CRM_DataPreview_is_present ***
aSelenium_LADEMO_CRM_DONOTCHANGE
File
498


        </pre>
    </div>
    <!--css div popup end-->

    </td>
</tr>

My code is:

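from bs4 import BeautifulSoup

# html holds the contents of the report file
soup = BeautifulSoup(html, "html.parser")

table = soup.select_one("#result_table")

# Joins the text of every cell, including the popup detail text
for row in table.select("tr.hiddenRow"):
    print(" ".join([td.text for td in row.find_all("td")]))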

How can I achieve this?

Thanks, Riaz

1 Answer:

Answer 0: (score: 1)

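Check each row and extract the text only if both elements exist:

from bs4 import BeautifulSoup

# html holds the contents of the report file
soup = BeautifulSoup(html, "html.parser")

for row in soup.select("#result_table tr"):
    # Header and summary rows contain neither element, so they are skipped
    div, a = row.select_one("div.testcase"), row.select_one("a.popup_link")
    if div and a:
        print(div.text.strip(), a.text.strip())

Which gives you:

(u'test_000001_login_valid_user', u'pass')
(u'test_000002_select_a_project', u'pass')
(u'test_000003_verify_Lademo_CRM_DataPreview_is_present', u'pass')

Of course, if the two always appear together, this could be simplified.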
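
One way such a simplification might look is sketched below. This is not the answerer's original code: it assumes that every row holding a `div.testcase` also holds an `a.popup_link`, and the `html` string is a trimmed-down, hypothetical stand-in for the real report. The name `results` is introduced here purely for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the report shown in the question.
html = """
<table id='result_table'>
<tr id='header_row'><td>Test Group/Test case</td><td>View</td></tr>
<tr id='pt1.1' class='hiddenRow'>
    <td class='none'><div class='testcase'>test_000001_login_valid_user</div></td>
    <td colspan='5' align='center'>
    <a class="popup_link" href="javascript:showTestDetail('div_pt1.1')">
        pass</a>
    <div id='div_pt1.1' class="popup_window"><pre>pt1.1: details</pre></div>
    </td>
</tr>
<tr id='pt1.2' class='hiddenRow'>
    <td class='none'><div class='testcase'>test_000002_select_a_project</div></td>
    <td colspan='5' align='center'>
    <a class="popup_link" href="javascript:showTestDetail('div_pt1.2')">
        pass</a>
    <div id='div_pt1.2' class="popup_window"><pre>pt1.2: details</pre></div>
    </td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
# Start from the test case names instead of testing every row.
for div in soup.select("#result_table div.testcase"):
    # Assumption: the enclosing row always contains a popup link.
    link = div.find_parent("tr").select_one("a.popup_link")
    results.append((div.get_text(strip=True), link.get_text(strip=True)))

for name, status in results:
    print("%s %s" % (name, status))
```

If a row ever lacked the link, `select_one` would return `None` and the loop would raise an `AttributeError`, so the explicit `if div and a` check remains the safer choice when the report layout is not guaranteed.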