From <td> and <a href=""> in the same loop

Time: 2018-03-06 18:21:32

Tags: python web-scraping beautifulsoup

I'm using Beautiful Soup to scrape some data and I'm running into the following problem:

for tr in soup.select("tr[class^='rg']"):
    row = [ td.text.strip() for td in tr('td')[1:-1] ]
    if row:
        print(','.join(row))

I've selected the rows I want data from by class, but a couple of the fields in each row are a elements nested inside the td cells, and in one case there's tooltip text I'd like to grab. Ideally I'd use the same loop to extract that other text as well.

<tr class="rgRow">
    <td><a href="webpage.aspx?">Text</a></td>
    <td>
        <a href="#" tooltip="Tooltip text"><img border="0" src="images/note.png"/></a>
    </td>
    <td>Some Text</td>
    <td>Some More Text</td>
</tr>

1 answer:

Answer 0: (score: 0)

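You can try replacing

row = [ td.text.strip() for td in tr('td')[1:-1] ]

with

row = [td.text.strip() or td.a['tooltip'] for td in tr('td')[1:-1]]

This takes each td's own text and, when that is empty, falls back to the tooltip attribute of its child a element.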
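
One caveat: td.a['tooltip'] raises an error if an empty cell has no a child (td.a is None) or if the a has no tooltip attribute. A minimal sketch of a more defensive variant follows, assuming the markup looks like the row in the question. The names HTML, cell_value and extract_rows are hypothetical helpers introduced for illustration, and the sample closes the href quote so it parses predictably.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the row in the question (assumption: the real
# page has the same shape; the href quote is closed here).
HTML = """
<table>
<tr class="rgRow">
    <td><a href="webpage.aspx?">Text</a></td>
    <td>
        <a href="#" tooltip="Tooltip text"><img border="0" src="images/note.png"/></a>
    </td>
    <td>Some Text</td>
    <td>Some More Text</td>
</tr>
</table>
"""


def cell_value(td):
    """Return the cell's text, else the tooltip of a nested <a>, else ''."""
    text = td.get_text(strip=True)
    if text:
        return text
    # Only match an <a> that actually carries a tooltip attribute.
    link = td.find("a", attrs={"tooltip": True})
    return link["tooltip"] if link else ""


def extract_rows(html):
    """Collect one list of cell values per matching row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("tr[class^='rg']"):
        # Same slice as in the question: skip the first and last cell.
        row = [cell_value(td) for td in tr("td")[1:-1]]
        if row:
            rows.append(row)
    return rows


for row in extract_rows(HTML):
    print(",".join(row))
```

Cells with neither text nor a tooltip come back as empty strings, so the column positions in the comma-separated output stay aligned.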
