How to select a specific table with BeautifulSoup and print its data

Time: 2016-10-10 07:57:21

Tags: python beautifulsoup

I have the following code to fetch results from a site called ssllabs.com:

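from bs4 import BeautifulSoup
import requests

# Fetch the SSL Labs report page for the host under test
req = requests.get("https://www.ssllabs.com/ssltest/analyze.html?d=drtest.test.sentinelcloud.com")
data = req.text
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(data, "html.parser")
report_tables = soup.find_all('table', class_='reportTable')
print(report_tables)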

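This returns the following tables:

[Screenshot: my data is in the table indicated with the arrow]

My data is in the table I have pointed out. Inside, that table is structured like this: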

<table class="reportTable">
   \n
   <thead>
      \n
      <tr>
         \n
         <td class="tableHead" colspan="3">Cipher Suites (SSL 3+ suites in server-preferred order; deprecated and SSL 2 suites at the end)</td>
         \n
      </tr>
      \n
   </thead>
   \n
   <tbody>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256\n                                        (<code>0xc02f</code>)\n                                                            \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">128</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384\n                                        (<code>0xc030</code>)\n                                                            \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">256</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_DHE_RSA_WITH_AES_128_GCM_SHA256\n                                        (<code>0x9e</code>)\n                                 \xa0\n                                    <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">128</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_DHE_RSA_WITH_AES_256_GCM_SHA384\n                                        (<code>0x9f</code>)\n                                 \xa0\n                                    <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">256</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256\n                                        (<code>0xc027</code>)\n                                                            \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">128</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA\n                                        (<code>0xc013</code>)\n                                                            \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">128</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384\n                                        (<code>0xc028</code>)\n                                                            \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">256</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA\n                                        (<code>0xc014</code>)\n                                                            \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">256</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_DHE_RSA_WITH_AES_128_CBC_SHA256\n                                        (<code>0x67</code>)\n                                 \xa0\n                                    <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">128</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_DHE_RSA_WITH_AES_128_CBC_SHA\n                                        (<code>0x33</code>)\n                                 \xa0\n                                    <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">128</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_DHE_RSA_WITH_AES_256_CBC_SHA256\n                                        (<code>0x6b</code>)\n                                 \xa0\n                                    <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">256</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_DHE_RSA_WITH_AES_256_CBC_SHA\n                                        (<code>0x39</code>)\n                                 \xa0\n                                    <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">256</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA\n                                        (<code>0xc012</code>)\n                                                            \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">112</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_RSA_WITH_AES_128_GCM_SHA256\n                                        (<code>0x9c</code>)\n                                                                \n                    \n                </td>
         \n
         <td class="tableRight">128</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_RSA_WITH_AES_256_GCM_SHA384\n                                        (<code>0x9d</code>)\n                                                                \n                    \n                </td>
         \n
         <td class="tableRight">256</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_RSA_WITH_AES_128_CBC_SHA256\n                                        (<code>0x3c</code>)\n                                                                \n                    \n                </td>
         \n
         <td class="tableRight">128</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_RSA_WITH_AES_256_CBC_SHA256\n                                        (<code>0x3d</code>)\n                                                                \n                    \n                </td>
         \n
         <td class="tableRight">256</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_RSA_WITH_AES_128_CBC_SHA\n                                        (<code>0x2f</code>)\n                                                                \n                    \n                </td>
         \n
         <td class="tableRight">128</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_RSA_WITH_AES_256_CBC_SHA\n                                        (<code>0x35</code>)\n                                                                \n                    \n                </td>
         \n
         <td class="tableRight">256</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA\n                                        (<code>0x88</code>)\n                                 \xa0\n                                    <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">256</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_RSA_WITH_CAMELLIA_256_CBC_SHA\n                                        (<code>0x84</code>)\n                                                                \n                    \n                </td>
         \n
         <td class="tableRight">256</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA\n                                        (<code>0x45</code>)\n                                 \xa0\n                                    <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
         \n
         <td class="tableRight">128</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_RSA_WITH_CAMELLIA_128_CBC_SHA\n                                        (<code>0x41</code>)\n                                                                \n                    \n                </td>
         \n
         <td class="tableRight">128</td>
         \n
      </tr>
      \n
      <tr class="tableRow">
         \n
         <td class="tableLeft">\n                                            TLS_RSA_WITH_3DES_EDE_CBC_SHA\n                                        (<code>0xa</code>)\n                                                                \n                    \n                </td>
         \n
         <td class="tableRight">112</td>
         \n
      </tr>
      \n
   </tbody>
   \n
</table>

I need to go inside the 'tbody', extract all the tableLeft values, and put them in a list. My questions:

1. How do I select that particular reportTable (at line 493 in the picture)?
2. How do I extract the values (TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, ...) and put them in a list?

1 answer:

Answer 0 (score: 2)

Expanding slightly on @furas's comment, since report_tables[4] assumes the table is always the 5th one:

from bs4 import BeautifulSoup
import requests

req = requests.get("https://www.ssllabs.com/ssltest/analyze.html?d=drtest.test.sentinelcloud.com")
data = req.text
soup = BeautifulSoup(data, "html.parser")

for found_table in soup.find_all('table', class_='reportTable'):
    # Pick the table by its heading text rather than by its position
    if 'Cipher Suites' in found_table.get_text():
        values = found_table.find_all('td', class_='tableLeft')
        entries = []
        for row in values:
            # get_text() drops the <code>/<span> markup and keeps the text
            entries.append(row.get_text())
        print(entries)

Checking for 'Cipher Suites' (though you could use a more complete title if needed) should help you get the right table more consistently.
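As a rough sketch of matching on a more complete title, the check can be limited to the table's header cell (`td.tableHead` in the markup from the question) instead of the whole table text. The snippet runs against a small inline sample rather than the live page. The helper name `find_table_by_heading`, the trimmed sample markup, and the "Protocols" table in it are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down sample in the shape of the page's markup.
SAMPLE_HTML = """
<table class="reportTable">
  <thead><tr><td class="tableHead" colspan="3">Protocols</td></tr></thead>
  <tbody>
    <tr class="tableRow"><td class="tableLeft">TLS 1.2</td><td class="tableRight">Yes</td></tr>
  </tbody>
</table>
<table class="reportTable">
  <thead><tr><td class="tableHead" colspan="3">Cipher Suites (SSL 3+ suites in server-preferred order)</td></tr></thead>
  <tbody>
    <tr class="tableRow"><td class="tableLeft">
        TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (<code>0xc02f</code>)</td><td class="tableRight">128</td></tr>
    <tr class="tableRow"><td class="tableLeft">
        TLS_RSA_WITH_3DES_EDE_CBC_SHA (<code>0xa</code>)</td><td class="tableRight">112</td></tr>
  </tbody>
</table>
"""


def find_table_by_heading(soup, heading_prefix):
    """Return the first reportTable whose header cell starts with heading_prefix, else None."""
    for table in soup.find_all("table", class_="reportTable"):
        head = table.find("td", class_="tableHead")
        if head is not None and head.get_text(strip=True).startswith(heading_prefix):
            return table
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
cipher_table = find_table_by_heading(soup, "Cipher Suites")
left_cells = cipher_table.find_all("td", class_="tableLeft")
print(len(left_cells))  # number of cipher suite rows found in the sample
```

Looking only at the header cell avoids a false match when some other table happens to mention "Cipher Suites" in one of its rows, and returning None makes the "table not found" case explicit for the caller.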

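You could simply use values as your output, but using get_text() helps strip out some of the HTML you probably don't need. entries will contain the values you need, though you may want to look at functions such as strip to clean the whitespace out of the results.

This produces the result: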

[u'\n                                            TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256\n                                        (0xc02f)\n                                                            \xa0  ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n                                            TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384\n                                        (0xc030)\n                                                            \xa0  ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n                                            TLS_DHE_RSA_WITH_AES_128_GCM_SHA256\n                                        (0x9e)\n                                 \xa0\n                                    \nDH 2048 bits \xa0 FS\n', u'\n                                            TLS_DHE_RSA_WITH_AES_256_GCM_SHA384\n                                        (0x9f)\n                                 \xa0\n                                    \nDH 2048 bits \xa0 FS\n', u'\n                                            TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256\n                                        (0xc027)\n                                                            \xa0  ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n                                            TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA\n                                        (0xc013)\n                                                            \xa0  ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n                                            TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384\n                                        (0xc028)\n                                                            \xa0  ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n                                            TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA\n                                        (0xc014)\n                                                            \xa0  ECDH secp256r1 (eq. 
3072 bits RSA) \xa0 FS\n', u'\n                                            TLS_DHE_RSA_WITH_AES_128_CBC_SHA256\n                                        (0x67)\n                                 \xa0\n                                    \nDH 2048 bits \xa0 FS\n', u'\n                                            TLS_DHE_RSA_WITH_AES_128_CBC_SHA\n                                        (0x33)\n                                 \xa0\n                                    \nDH 2048 bits \xa0 FS\n', u'\n                                            TLS_DHE_RSA_WITH_AES_256_CBC_SHA256\n                                        (0x6b)\n                                 \xa0\n                                    \nDH 2048 bits \xa0 FS\n', u'\n                                            TLS_DHE_RSA_WITH_AES_256_CBC_SHA\n                                        (0x39)\n                                 \xa0\n                                    \nDH 2048 bits \xa0 FS\n', u'\n                                            TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA\n                                        (0xc012)\n                                                            \xa0  ECDH secp256r1 (eq. 
3072 bits RSA) \xa0 FS\n', u'\n                                            TLS_RSA_WITH_AES_128_GCM_SHA256\n                                        (0x9c)\n                                                                \n                    \n                ', u'\n                                            TLS_RSA_WITH_AES_256_GCM_SHA384\n                                        (0x9d)\n                                                                \n                    \n                ', u'\n                                            TLS_RSA_WITH_AES_128_CBC_SHA256\n                                        (0x3c)\n                                                                \n                    \n                ', u'\n                                            TLS_RSA_WITH_AES_256_CBC_SHA256\n                                        (0x3d)\n                                                                \n                    \n                ', u'\n                                            TLS_RSA_WITH_AES_128_CBC_SHA\n                                        (0x2f)\n                                                                \n                    \n                ', u'\n                                            TLS_RSA_WITH_AES_256_CBC_SHA\n                                        (0x35)\n                                                                \n                    \n                ', u'\n                                            TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA\n                                        (0x88)\n                                 \xa0\n                                    \nDH 2048 bits \xa0 FS\n', u'\n                                            TLS_RSA_WITH_CAMELLIA_256_CBC_SHA\n                                        (0x84)\n                                                                \n                    \n                ', u'\n                                            TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA\n 
                                       (0x45)\n                                 \xa0\n                                    \nDH 2048 bits \xa0 FS\n', u'\n                                            TLS_RSA_WITH_CAMELLIA_128_CBC_SHA\n                                        (0x41)\n                                                                \n                    \n                ', u'\n                                            TLS_RSA_WITH_3DES_EDE_CBC_SHA\n                                        (0xa)\n                                                                \n                    \n                ']
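To illustrate the whitespace cleanup mentioned above, here is a small sketch that works on two strings in the shape of the output shown (indentation shortened) instead of the live page. `clean` is a hypothetical helper name. It assumes Python 3, where `str.split()` with no arguments splits on any run of Unicode whitespace, including the non-breaking space `\xa0`.

```python
# Two raw entries in the shape returned by get_text(), with the long
# indentation from the page shortened for readability.
raw_entries = [
    "\n    TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256\n    (0xc02f)\n"
    "    \xa0  ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n",
    "\n    TLS_RSA_WITH_3DES_EDE_CBC_SHA\n    (0xa)\n    \n    \n    ",
]


def clean(text):
    """Collapse every run of whitespace (non-breaking spaces too) into one space."""
    return " ".join(text.split())


# strip() only trims the two ends; newlines inside the string survive.
stripped = [entry.strip() for entry in raw_entries]

# split() + join() also normalises the whitespace in the middle.
cleaned = [clean(entry) for entry in raw_entries]

for line in cleaned:
    print(line)
```

Plain `strip()` is enough when only the leading and trailing padding matters. The split-and-join form is the better fit here because each cell also contains newlines and padding between the suite name, the code, and the key exchange details.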

Edit: Expanding on this per @PadraicCunningham's comment, we can strip the whitespace and return just the first value, like so:

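for found_table in soup.find_all('table', class_='reportTable'):
    if 'Cipher Suites' in found_table.get_text():
        # split() breaks on any whitespace; the first token is the suite name
        vals = [td.text.split()[0] for td in found_table.select("td.tableLeft")]
        print(vals)
        break  # stop after the first matching table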
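If the hex code is wanted alongside the name, the same `split()` idea can be taken one token further. The sketch below is only an illustration under assumptions: it works on cell text in the shape shown in the output earlier, and `parse_entry` is a hypothetical helper, not part of BeautifulSoup. It relies on the suite name being the first whitespace-separated token and the parenthesised code being the second.

```python
def parse_entry(text):
    """Split one tableLeft cell's text into (suite_name, numeric_code)."""
    tokens = text.split()          # splits on any whitespace run
    name = tokens[0]               # e.g. "TLS_RSA_WITH_3DES_EDE_CBC_SHA"
    code = tokens[1].strip("()")   # "(0xa)" -> "0xa"
    return name, int(code, 16)     # int() accepts the "0x" prefix with base 16


# Cell text in the shape produced by get_text() (indentation shortened).
samples = [
    "\n    TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256\n    (0xc02f)\n"
    "    \xa0  ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n",
    "\n    TLS_RSA_WITH_3DES_EDE_CBC_SHA\n    (0xa)\n    \n    \n    ",
]

suites = [parse_entry(text) for text in samples]
for name, code in suites:
    print(name, hex(code))
```

This depends on the page keeping the name and code as the first two tokens of each cell; if that layout changes, `tokens[1]` may raise an IndexError or parse the wrong value, so a real script would want a guard around it.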