I have this code:
import urllib
from BeautifulSoup import BeautifulSoup  # assuming BeautifulSoup 3 on Python 2

url = 'http://www.topsoftzone.com/program/12721/Windows_Phone_7.html'
pageurl = urllib.urlopen(url)
soup = BeautifulSoup(pageurl)
print soup.find('table',{'class':'download_tab'}).find('td',{'width':'55%'}).find('strong').text
I should get this output: 09/29/2011 (Submitted: 09/08/2011)
But the code prints: Updated:
Answer 0 (score: 2)
My guess is that you missed the tr table row between table and td.
In any case, consider using lxml with XPath:
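from lxml import etree
# etree.parse accepts the URL directly; HTMLParser tolerates non-XML markup
tree = etree.parse(url, etree.HTMLParser())
# collect the text nodes that are direct children of the 55%-wide cell
l = tree.xpath('//table[@class="download_tab"]/tr/td[@width="55%"]/text()')
# the date is the second text node in that cell
print l[1]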
09/29/2011 (Submitted: 09/08/2011)
Edit: as requested, a version without lxml:
soup = BeautifulSoup(pageurl)
l = soup.find('table',{'class':'download_tab'}).find('tr').find('td',{'width':'55%'}).findAll(text=True)
print l[2]
09/29/2011 (Submitted: 09/08/2011)
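To see why the original call printed only the label, here is a minimal sketch. It assumes the cell looks roughly like the hypothetical snippet below (inferred from the outputs above, not copied from the real page), and it swaps in the standard library's xml.etree.ElementTree so it runs without lxml or BeautifulSoup. The date sits after the strong element, not inside it, so it shows up as the element's tail:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page's markup (an assumption,
# not the real page source).
SNIPPET = """
<table class="download_tab">
  <tr>
    <td width="55%">
      <strong>Updated:</strong> 09/29/2011 (Submitted: 09/08/2011)
    </td>
  </tr>
</table>
"""

root = ET.fromstring(SNIPPET.strip())  # root is the <table> element
strong = root.find("./tr/td[@width='55%']/strong")

label = strong.text               # text inside <strong>
date_text = strong.tail.strip()   # text that follows </strong> within the cell

print(label)
print(date_text)
```

lxml elements expose the same .text and .tail attributes, so the same idea should carry over to the lxml version if the real markup matches this assumption.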
Answer 1 (score: 1)
You would need more error checking, but this works:
import lxml.html
import urllib
import sys
link = "http://www.topsoftzone.com/program/12721/Windows_Phone_7.html"
page = urllib.urlopen(link).read()
doc = lxml.html.document_fromstring(page)
doc.make_links_absolute(link)
found_text = doc.xpath(u".//table[@class='download_tab']/tr/td[@width='55%']/text()")
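try:
    # the date is the second text node in the cell
    print found_text[1].strip()
except IndexError:
    # fewer text nodes than expected, e.g. the page layout changed
    print "Not found"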
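As one possible take on the "more error checking" mentioned above, the lookup could be wrapped in a small helper. This is only a sketch: pick_text and its default parameter are hypothetical names, not part of lxml, and it covers only a missing or empty text node, not network or parsing failures.

```python
def pick_text(nodes, index, default="Not found"):
    """Return nodes[index] with surrounding whitespace removed.

    Falls back to `default` when the list is too short or the
    selected text node is empty after stripping.
    """
    try:
        value = nodes[index]
    except IndexError:
        return default
    value = value.strip()
    return value if value else default


# Example with a list shaped like an XPath text() result (sample data).
sample = ["\n      ", " 09/29/2011 (Submitted: 09/08/2011)\n    "]
found = pick_text(sample, 1)
missing = pick_text([], 1)
blank = pick_text(["   "], 0)
print(found)
```

With the code above, this would be called as pick_text(found_text, 1).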