I have an HTML table structure that looks like this:
<table>
<tbody>
<tr>
<td>
<ul>
</ul>
</td>
<td>
<table>
<tbody>
<tr></tr>
<tr></tr>
<tr></tr>
</tbody>
</table>
<table> <!-- the table structure I am interested in -->
<tbody>
<tr>
<td class="dte"></td>
<td class="id"></td>
<td class="desc"></td>
</tr>
<tr>
<td class="dte"></td>
<td class="id"></td>
<td class="desc"></td>
</tr>
<tr>
<td class="dte"></td>
<td class="id"></td>
<td class="desc"></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
Using Python and BeautifulSoup, I have managed to print the output to the screen, like this:
[b'16 March', b'987654', b'Something happens on this date']
[b'23 March', b'321987', b'Something happens on this date']
[b'26 March', b'123456', b'Something happens on this date']
using the following code (which I hacked together from various posts on this site):
# `soup` is a BeautifulSoup object built from the page
for mytable in soup.find('body').find_all('table'):
    # print(len(mytable))
    for trs in mytable.find_all('tr'):
        # match <td> cells whose class is any of 'dte', 'id' or 'desc'
        tds = trs.find_all('td', class_='dte id desc'.split())
        if tds:  # non-empty list: this row has matching cells
            row = [elem.text.strip().encode('utf-8') for elem in tds]
            print(row)
        else:
            continue  # no matching cells in this row, move on to the next one
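
In case it helps anyone reproduce this, here is a minimal self-contained sketch of the same idea. The markup is a cut-down, hypothetical version of the structure above, with cell text taken from my printed output. The choice of `html.parser`, the `rows` list, and `recursive=False` (so that the outer row does not pick up cells from the nested tables) are my assumptions and not part of the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, cut down from the structure shown above
html = """
<html><body>
<table><tbody><tr>
  <td><ul></ul></td>
  <td>
    <table><tbody><tr></tr></tbody></table>
    <table><tbody>
      <tr><td class="dte">16 March</td><td class="id">987654</td><td class="desc">Something happens on this date</td></tr>
      <tr><td class="dte">23 March</td><td class="id">321987</td><td class="desc">Something happens on this date</td></tr>
    </tbody></table>
  </td>
</tr></tbody></table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    # recursive=False looks only at this row's direct <td> children,
    # so cells that belong to nested tables are not collected here
    tds = tr.find_all("td", class_=["dte", "id", "desc"], recursive=False)
    if tds:
        # get_text(strip=True) returns str, so no b'...' prefix is printed
        rows.append([td.get_text(strip=True) for td in tds])

for row in rows:
    print(row)
```

Collecting the rows in a list instead of printing them straight away keeps the data available for later steps, and skipping `.encode('utf-8')` leaves the values as ordinary strings.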
Two questions: