Question

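I'm trying to use Beautiful Soup to parse HTML and extract data from a website. I'm currently trying to get the table data from the following web page:

I want to get the data from the table. First, I saved the page as an HTML file on my computer (this part works fine; I checked that I got all the information), but when I try to parse it with the following code: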

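from bs4 import BeautifulSoup

# fh is an open file handle for the saved HTML page
soup = BeautifulSoup(fh, 'html.parser')
table = soup.find_all('table')  # all <table> elements in the page
cols = table[0].find_all('tr')  # rows of the first table
cells = cols[1].find_all('td')  # cells of the second row; this is the lookup that fails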

I don't get any results (specifically, it crashes, saying there is no element at index 1). Any idea where this might come from?

Thanks

Answer 1

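OK, it turns out this was a problem in the HTML file itself: in the first row, the cell tags are opened with th but closed with td. I don't know much about HTML, but replacing the th with td, so that the opening and closing tags match, solved the problem.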

<tr class="listeEtablenTete">
<th title="Rubrique IC">Rubri. IC</td>
<th title="Alin&eacute;a">Ali.&nbsp;</td>
<th title="Date d'autorisation">Date auto.</td>
<th >Etat d'activit&eacute;</td>
<th title="R&eacute;gime">R&eacute;g.</td>
<th >Activit&eacute;</td>
<th >Volume</td>
<th >Unit&eacute;</td>
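
For anyone who hits the same thing, here is a minimal sketch of how the mismatch could be detected and repaired in the saved file before handing it to Beautiful Soup. It uses only the standard library. The names `CellTagChecker`, `find_cell_mismatches` and `fix_th_closers` are made up for this example, the sample string is a shortened stand-in for the real page, and the regular expression assumes header cells contain plain text with no nested tags.

```python
import re
from html.parser import HTMLParser


class CellTagChecker(HTMLParser):
    """Hypothetical helper: records cells opened as th/td but closed as the other."""

    def __init__(self):
        super().__init__()
        self.open_cell = None   # 'th' or 'td' currently open, if any
        self.mismatches = []    # tuples of (line, opened_as, closed_as)

    def handle_starttag(self, tag, attrs):
        if tag in ('th', 'td'):
            self.open_cell = tag

    def handle_endtag(self, tag):
        if tag in ('th', 'td'):
            if self.open_cell is not None and self.open_cell != tag:
                line, _ = self.getpos()
                self.mismatches.append((line, self.open_cell, tag))
            self.open_cell = None


def find_cell_mismatches(html):
    """Return a list of (line, opened_as, closed_as) for mismatched cells."""
    checker = CellTagChecker()
    checker.feed(html)
    checker.close()
    return checker.mismatches


def fix_th_closers(html):
    """Rewrite '<th ...>text</td>' as '<th ...>text</th>'.

    Assumption: the cell text contains no nested tags.
    """
    pattern = re.compile(r'(<th\b[^>]*>)([^<]*)</td\s*>', re.IGNORECASE)
    return pattern.sub(r'\1\2</th>', html)


# Shortened, made-up sample shaped like the header row above.
broken = (
    '<tr class="listeEtablenTete">\n'
    '<th title="Rubrique IC">Rubri. IC</td>\n'
    '<th >Volume</td>\n'
    '</tr>\n'
    '<tr><td>x</td><td>y</td></tr>\n'
)

print(find_cell_mismatches(broken))   # lists the th cells closed with </td>
fixed = fix_th_closers(broken)
print(find_cell_mismatches(fixed))    # empty once the closers match
```

The repaired string can then be passed to `BeautifulSoup(fixed, 'html.parser')` exactly as in the question. Closing the header cells with `</th>` keeps them as header cells, which is a slightly different fix from the one described above (switching to td), but either way the opening and closing tags end up matching. Another thing that may be worth trying is a more lenient parser such as `'lxml'` or `'html5lib'`, though I have not checked how either handles this particular page.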

Thanks!

Beautiful Soup is missing some HTML table tags