Reading an element from HTML

Posted: 2013-03-19 20:02:13

Tags: python html parsing python-3.x html-table

I have the following HTML:

<tr style='background:#DDDDDD;'>
    <td><b>ASD</b></td>
    <td colspan='3'>1231</td>
</tr>

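This element is not repeated on the page, so it is unique. I want to get the contents of the cell (1231) into a variable. I tried using html.parser, but it didn't work.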
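For reference, the standard library's html.parser can do this without installing anything. The minimal sketch below subclasses html.parser.HTMLParser and returns the text of the <td> that follows the cell whose text is "ASD". The class name CellAfterLabel and its attributes are hypothetical, chosen for this illustration, and the sketch assumes the table is not nested inside another table.

```python
from html.parser import HTMLParser


class CellAfterLabel(HTMLParser):
    """Hypothetical helper: collects the text of the <td> that follows
    the <td> whose text equals `label`."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.in_td = False          # True while inside a <td>...</td>
        self.current = []           # text chunks of the <td> being read
        self.found_label = False    # True once the label cell has been seen
        self.value = None           # text of the cell right after the label

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.current = []

    def handle_endtag(self, tag):
        if tag != "td" or not self.in_td:
            return
        self.in_td = False
        text = "".join(self.current).strip()
        if self.value is None:
            if self.found_label:
                self.value = text          # first cell after the label
            elif text == self.label:
                self.found_label = True    # label cell found, take the next one

    def handle_data(self, data):
        # text inside nested tags such as <b> is still collected
        if self.in_td:
            self.current.append(data)


html = """<tr style='background:#DDDDDD;'>
    <td><b>ASD</b></td>
    <td colspan='3'>1231</td>
</tr>"""

parser = CellAfterLabel("ASD")
parser.feed(html)
parser.close()
print(parser.value)  # 1231
```

HTMLParser is event-driven: it does not build a tree, so the "find the label, then take the next cell" logic has to be tracked with flags, as above. If the label is not found, value stays None.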

2 answers:

Answer 0 (score: 0)

Take a look at BeautifulSoup; it's great for this:

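from bs4 import BeautifulSoup  # BeautifulSoup 4: pip install beautifulsoup4

soup = BeautifulSoup(html, "html.parser")  # feed your HTML page (a string) to BeautifulSoup

# find the text node "ASD" (it sits inside <b> in the first <td>)
pleaseFind = soup.find(string="ASD")

# the next <td> after that text node is the cell holding the value
whatINeed = pleaseFind.find_next('td')

print(whatINeed.text)  # 1231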

Answer 1 (score: 0)

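You can use urllib2 (you don't need to install anything new, at least with the Windows version of Python): http://docs.python.org/2/howto/urllib2.html. In Python 3 the same module is called urllib.request.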

Example:

import urllib.request  # Python 3 name of urllib2

response = urllib.request.urlopen('your URL')
# read() returns bytes in Python 3; decoding as UTF-8 is an assumption about the page
html = response.read().decode('utf-8')
# html is a string containing everything on your page

# find the substring "<td colspan='3'>" and take everything
# between the end of that tag and the next "</td>"
start_tag = "<td colspan='3'>"
pos = html.find(start_tag)
start = pos + len(start_tag)
print(html[start:html.find("</td>", start)])  # 1231
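One weakness of the plain string search is that str.find returns -1 when the tag is missing, and slicing with that value silently returns the wrong text instead of failing. A minimal sketch of a safer version, assuming the markup matches the search string exactly; the function name text_between is hypothetical:

```python
def text_between(html, start_tag, end_tag="</td>"):
    """Hypothetical helper: return the text between start_tag and the
    next end_tag, or None if either one is missing."""
    pos = html.find(start_tag)
    if pos == -1:  # str.find returns -1 when the substring is absent
        return None
    start = pos + len(start_tag)
    end = html.find(end_tag, start)
    if end == -1:
        return None
    return html[start:end]


html = """<tr style='background:#DDDDDD;'>
    <td><b>ASD</b></td>
    <td colspan='3'>1231</td>
</tr>"""

print(text_between(html, "<td colspan='3'>"))  # 1231
print(text_between(html, '<td colspan="3">'))  # None: the quoting must match exactly
```

The second call shows the trade-off: any change in attribute quoting, spacing, or order breaks the match, which is why a real parser is usually more robust than substring search.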