I am trying to parse the following HTML:
<div class="content">
<h3>
Kontaktuppgifter</h3>
<table>
<tr>
<th>
Postadress:
</th>
<td>
Platteb....
<br/>44497 SVE....
</td>
</tr>
<tr>
<th>
Telefon:
</th>
<td>
01-.......
</td>
</tr>
</table>
I want to grab td 1, td 2 and td 3, but td 3 is not always present.
This is what I have so far:
import requests
from bs4 import BeautifulSoup

def ParsePage(threadName, page_url):
    r = requests.get(page_url)
    print("\n--------------------\n")
    print("Parsing page: " + r.url)
    data = r.text
    soup = BeautifulSoup(data)
    # Find every <div class="content"> and print the <td> cells inside it
    divs = soup.findAll('div', { "class" : "content" })
    for tag in divs:
        divds = tag.findAll('td')
        print(divds)
For some reason, this just prints the whole div.
Answer 0 (score: 1)
You must have a typo somewhere; the code works for me:
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html)
div = soup.findAll("div", {"class": "content"})
for tag in div:
    print(tag.findAll("td"))
#printed:
[<td>
Platteb....
<br/>44497 SVE....
</td>, <td>
01-.......
</td>]
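Since the question mentions that td 3 is not always there, here is a minimal sketch of one way to cope with that: collect the text of whatever `<td>` cells exist and pad the result with `None`. The helper name `extract_cells`, the `expected` parameter and the `SAMPLE_HTML` string are made up for illustration (they are not from the original post), and the explicit `"html.parser"` argument is my assumption about which parser to use.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the values are the same placeholders.
SAMPLE_HTML = """
<div class="content">
<h3>Kontaktuppgifter</h3>
<table>
<tr><th>Postadress:</th><td>Platteb....<br/>44497 SVE....</td></tr>
<tr><th>Telefon:</th><td>01-.......</td></tr>
</table>
</div>
"""


def extract_cells(html, expected=3):
    """Return the text of each <td> inside div.content, padded with None.

    Hypothetical helper: `expected` is the number of cells the caller
    wants to unpack, whether or not the page contains that many.
    """
    soup = BeautifulSoup(html, "html.parser")
    cells = []
    for div in soup.find_all("div", class_="content"):
        for td in div.find_all("td"):
            # A separator keeps the text on either side of <br/> apart.
            cells.append(td.get_text(" ", strip=True))
    # Pad with None so a missing third cell does not break unpacking.
    cells += [None] * (expected - len(cells))
    return cells[:expected]


td1, td2, td3 = extract_cells(SAMPLE_HTML)
print(td1)
print(td2)
print(td3)
```

With the sample above only two cells exist, so `td3` comes back as `None` and you can test for it with `if td3 is not None:` before using it.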