I have the following code to fetch results from a site called ssllabs.com:
from bs4 import BeautifulSoup
import requests
req = requests.get("https://www.ssllabs.com/ssltest/analyze.html?d=drtest.test.sentinelcloud.com")
data = req.text
soup = BeautifulSoup(data)
report_tables = soup.find_all('table', class_='reportTable')
print(report_tables)
This returns the following tables:
The data I need is in the table I pointed out. Internally, that table is structured like this:
<table class="reportTable">
\n
<thead>
\n
<tr>
\n
<td class="tableHead" colspan="3">Cipher Suites (SSL 3+ suites in server-preferred order; deprecated and SSL 2 suites at the end)</td>
\n
</tr>
\n
</thead>
\n
<tbody>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256\n (<code>0xc02f</code>)\n \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
\n
<td class="tableRight">128</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384\n (<code>0xc030</code>)\n \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
\n
<td class="tableRight">256</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_DHE_RSA_WITH_AES_128_GCM_SHA256\n (<code>0x9e</code>)\n \xa0\n <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
\n
<td class="tableRight">128</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_DHE_RSA_WITH_AES_256_GCM_SHA384\n (<code>0x9f</code>)\n \xa0\n <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
\n
<td class="tableRight">256</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256\n (<code>0xc027</code>)\n \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
\n
<td class="tableRight">128</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA\n (<code>0xc013</code>)\n \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
\n
<td class="tableRight">128</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384\n (<code>0xc028</code>)\n \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
\n
<td class="tableRight">256</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA\n (<code>0xc014</code>)\n \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
\n
<td class="tableRight">256</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_DHE_RSA_WITH_AES_128_CBC_SHA256\n (<code>0x67</code>)\n \xa0\n <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
\n
<td class="tableRight">128</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_DHE_RSA_WITH_AES_128_CBC_SHA\n (<code>0x33</code>)\n \xa0\n <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
\n
<td class="tableRight">128</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_DHE_RSA_WITH_AES_256_CBC_SHA256\n (<code>0x6b</code>)\n \xa0\n <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
\n
<td class="tableRight">256</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_DHE_RSA_WITH_AES_256_CBC_SHA\n (<code>0x39</code>)\n \xa0\n <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
\n
<td class="tableRight">256</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA\n (<code>0xc012</code>)\n \xa0 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS</span>\n</td>
\n
<td class="tableRight">112</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_RSA_WITH_AES_128_GCM_SHA256\n (<code>0x9c</code>)\n \n \n </td>
\n
<td class="tableRight">128</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_RSA_WITH_AES_256_GCM_SHA384\n (<code>0x9d</code>)\n \n \n </td>
\n
<td class="tableRight">256</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_RSA_WITH_AES_128_CBC_SHA256\n (<code>0x3c</code>)\n \n \n </td>
\n
<td class="tableRight">128</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_RSA_WITH_AES_256_CBC_SHA256\n (<code>0x3d</code>)\n \n \n </td>
\n
<td class="tableRight">256</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_RSA_WITH_AES_128_CBC_SHA\n (<code>0x2f</code>)\n \n \n </td>
\n
<td class="tableRight">128</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_RSA_WITH_AES_256_CBC_SHA\n (<code>0x35</code>)\n \n \n </td>
\n
<td class="tableRight">256</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA\n (<code>0x88</code>)\n \xa0\n <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
\n
<td class="tableRight">256</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_RSA_WITH_CAMELLIA_256_CBC_SHA\n (<code>0x84</code>)\n \n \n </td>
\n
<td class="tableRight">256</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA\n (<code>0x45</code>)\n \xa0\n <span class="greySmall">\n<span title="p: 256, g: 1, Ys: 256">DH 2048 bits</span> \xa0 FS</span>\n</td>
\n
<td class="tableRight">128</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_RSA_WITH_CAMELLIA_128_CBC_SHA\n (<code>0x41</code>)\n \n \n </td>
\n
<td class="tableRight">128</td>
\n
</tr>
\n
<tr class="tableRow">
\n
<td class="tableLeft">\n TLS_RSA_WITH_3DES_EDE_CBC_SHA\n (<code>0xa</code>)\n \n \n </td>
\n
<td class="tableRight">112</td>
\n
</tr>
\n
</tbody>
\n
</table>
I need to go into the inner 'tbody', extract all the tableLeft values, and put them in a list. My questions:
1. How to select that particular reportTable at line 493 in the picture.
2. How to extract the values (TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, ...) and put them in a list.
Answer 0 (score: 2)
Expanding slightly on @furas' comment, since report_tables[4] assumes the table you want is always the fifth one:
from bs4 import BeautifulSoup
import requests
req = requests.get("https://www.ssllabs.com/ssltest/analyze.html?d=drtest.test.sentinelcloud.com")
data = req.text
soup = BeautifulSoup(data)
for found_table in soup.find_all('table', class_='reportTable'):
    # Pick the table by its content instead of by its position on the page.
    if 'Cipher Suites' in found_table.get_text():
        values = found_table.find_all('td', class_='tableLeft')
        entries = []
        for row in values:
            # get_text() drops the tags and keeps only the text of the cell.
            entries.append(row.get_text())
        print(entries)
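Checking for 'Cipher Suites' (you could match on a more complete heading if needed) should help you get the right table more consistently.
You could simply use values as the output, but calling get_text() strips out some of the HTML you probably don't need. entries will then contain the values you need, though you may want to look at functions such as strip to clean the whitespace out of the results.
This produces: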
[u'\n TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256\n (0xc02f)\n \xa0 ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384\n (0xc030)\n \xa0 ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n TLS_DHE_RSA_WITH_AES_128_GCM_SHA256\n (0x9e)\n \xa0\n \nDH 2048 bits \xa0 FS\n', u'\n TLS_DHE_RSA_WITH_AES_256_GCM_SHA384\n (0x9f)\n \xa0\n \nDH 2048 bits \xa0 FS\n', u'\n TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256\n (0xc027)\n \xa0 ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA\n (0xc013)\n \xa0 ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384\n (0xc028)\n \xa0 ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA\n (0xc014)\n \xa0 ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n TLS_DHE_RSA_WITH_AES_128_CBC_SHA256\n (0x67)\n \xa0\n \nDH 2048 bits \xa0 FS\n', u'\n TLS_DHE_RSA_WITH_AES_128_CBC_SHA\n (0x33)\n \xa0\n \nDH 2048 bits \xa0 FS\n', u'\n TLS_DHE_RSA_WITH_AES_256_CBC_SHA256\n (0x6b)\n \xa0\n \nDH 2048 bits \xa0 FS\n', u'\n TLS_DHE_RSA_WITH_AES_256_CBC_SHA\n (0x39)\n \xa0\n \nDH 2048 bits \xa0 FS\n', u'\n TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA\n (0xc012)\n \xa0 ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n', u'\n TLS_RSA_WITH_AES_128_GCM_SHA256\n (0x9c)\n \n \n ', u'\n TLS_RSA_WITH_AES_256_GCM_SHA384\n (0x9d)\n \n \n ', u'\n TLS_RSA_WITH_AES_128_CBC_SHA256\n (0x3c)\n \n \n ', u'\n TLS_RSA_WITH_AES_256_CBC_SHA256\n (0x3d)\n \n \n ', u'\n TLS_RSA_WITH_AES_128_CBC_SHA\n (0x2f)\n \n \n ', u'\n TLS_RSA_WITH_AES_256_CBC_SHA\n (0x35)\n \n \n ', u'\n TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA\n (0x88)\n \xa0\n \nDH 2048 bits \xa0 FS\n', u'\n TLS_RSA_WITH_CAMELLIA_256_CBC_SHA\n (0x84)\n \n \n ', u'\n TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA\n (0x45)\n \xa0\n \nDH 2048 bits \xa0 FS\n', u'\n TLS_RSA_WITH_CAMELLIA_128_CBC_SHA\n (0x41)\n \n \n ', u'\n TLS_RSA_WITH_3DES_EDE_CBC_SHA\n (0xa)\n \n \n ']
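The whitespace cleanup mentioned above can be sketched on two of the entries from that output. This is only an illustration: clean_entry is a hypothetical helper name, not something from the original code, and it relies on split() with no arguments treating the non-breaking space (\xa0) as whitespace, which holds for unicode strings.

```python
# Two raw entries copied from the output above; the u'' prefix keeps them
# unicode on Python 2 and is accepted on Python 3 as well.
raw_entries = [
    u'\n TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256\n (0xc02f)\n \xa0 ECDH secp256r1 (eq. 3072 bits RSA) \xa0 FS\n',
    u'\n TLS_RSA_WITH_AES_128_GCM_SHA256\n (0x9c)\n \n \n ',
]


def clean_entry(text):
    """Collapse every run of whitespace into a single space (hypothetical helper)."""
    # split() with no arguments splits on any whitespace, including newlines
    # and non-breaking spaces, and discards the empty pieces.
    return u' '.join(text.split())


cleaned = [clean_entry(entry) for entry in raw_entries]
print(cleaned)
```

Each cleaned entry keeps the suite name, the hex code, and any key-exchange note on one line, so it can be split further if only part of it is needed.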
Edit: Expanding on this per @PadraicCunningham's comment, we can strip the whitespace and keep only the first value of each cell, like so:
for found_table in soup.find_all('table', class_='reportTable'):
    if 'Cipher Suites' in found_table.get_text():
        # split() breaks on any whitespace; the first token is the suite name.
        vals = [td.text.split()[0] for td in found_table.select("td.tableLeft")]
        print(vals)
        break
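The 'Cipher Suites' in found_table.get_text() check matches any table whose text mentions that phrase anywhere. A slightly stricter variant, sketched here under some assumptions, compares only the header cell (td.tableHead in the markup from the question). The HTML below is a shortened inline copy of that markup so the example runs without a network request; the "Protocols" table and the helper name find_table_by_heading are made up for illustration, and the parser is set to html.parser explicitly.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the real page. The first table is an invented
# placeholder; the second follows the structure shown in the question.
SAMPLE_HTML = u"""
<table class="reportTable">
<thead><tr><td class="tableHead" colspan="3">Protocols</td></tr></thead>
<tbody><tr class="tableRow"><td class="tableLeft">TLS 1.2</td><td class="tableRight">Yes</td></tr></tbody>
</table>
<table class="reportTable">
<thead><tr><td class="tableHead" colspan="3">Cipher Suites (SSL 3+ suites in server-preferred order; deprecated and SSL 2 suites at the end)</td></tr></thead>
<tbody>
<tr class="tableRow"><td class="tableLeft">
 TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
 (<code>0xc02f</code>)
 <span class="greySmall"> ECDH secp256r1 (eq. 3072 bits RSA) FS</span>
</td><td class="tableRight">128</td></tr>
<tr class="tableRow"><td class="tableLeft">
 TLS_RSA_WITH_AES_128_GCM_SHA256
 (<code>0x9c</code>)
</td><td class="tableRight">128</td></tr>
</tbody>
</table>
"""


def find_table_by_heading(soup, heading_prefix):
    """Return the first reportTable whose header cell starts with heading_prefix, else None."""
    for table in soup.find_all('table', class_='reportTable'):
        head = table.find('td', class_='tableHead')
        if head is not None and head.get_text().strip().startswith(heading_prefix):
            return table
    return None


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
cipher_table = find_table_by_heading(soup, 'Cipher Suites')

suites = []
if cipher_table is not None:
    # The first whitespace-separated token of each left-hand cell is the suite name.
    suites = [td.get_text().split()[0]
              for td in cipher_table.find_all('td', class_='tableLeft')]
print(suites)
```

Returning None when no heading matches leaves it to the caller to decide what to do if the page layout changes, rather than failing on an undefined variable.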