Okay, so this code works:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY SITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())
title = soup.find('p', {'class': 'deal-title should-truncate'}).getText()
print "Title: " + str(title)
But the code above only gives me the first result. I want to iterate over the whole site and handle every match. To do that, I tried an outer loop that finds each occurrence of a figure tag (since this paragraph tag always sits inside a figure tag), so I only have to look at the content inside each figure. However, when I try the following:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY WEBSITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())
deals = [figure for figure in soup.findAll('figure')]
for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()
    print "Title: " + str(title)
I get this error:
Traceback (most recent call last):
  File "C:\Python27\blah.py", line 11, in <module>
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()
AttributeError: 'NoneType' object has no attribute 'getText'
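That AttributeError means find() returned None for at least one figure, i.e. that <figure> contains no <p class="deal-title should-truncate">. A minimal diagnostic sketch (assuming the same soup object as above) to see which figures are missing the paragraph:

# Hypothetical check: print the index of every figure without the target paragraph.
for index, figure in enumerate(soup.findAll('figure')):
    if figure.find('p', {'class': 'deal-title should-truncate'}) is None:
        print "Figure %d has no deal-title paragraph" % index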
So now I am trying:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY SITE SITS HERE)
soup = BeautifulSoup(htmlfile.read())
deals = soup.findAll('figure')
for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if title is None:
        title = "NONE"
    else:
        title = title.getText()
    print "Title: " + str(title)
Now the error is:
Traceback (most recent call last):
  File "C:\Python27\blah.py", line 16, in <module>
    print "Title: " + str(title)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 12: ordinal not in range(128)
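This error comes from str(): in Python 2, calling str() on a unicode string implicitly encodes it with the ASCII codec, which fails on characters such as u'\u2013' (an en dash). One possible workaround, sketched under the assumption that the console accepts UTF-8, is to encode explicitly before printing:

for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if title is None:
        title = u"NONE"
    else:
        title = title.getText()
    # Encode the unicode title explicitly instead of relying on str(),
    # so non-ASCII characters like the en dash survive printing.
    print "Title: " + title.encode('utf-8')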
Answer 0 (score: 0)
Final answer, with special thanks to BlackJack:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY SITE SITS HERE)
soup = BeautifulSoup(htmlfile.read())
deals = soup.findAll('figure')
for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if title is None:
        title = "NONE"
    else:
        title = title.getText()
    print "Title: " + title