How to use find() on list elements in Python / BeautifulSoup - I get a NoneType error

Posted: 2014-07-18 20:37:23

Tags: python for-loop web-scraping beautifulsoup list-comprehension

OK, so this code works:

from bs4 import BeautifulSoup
import urllib
import re

htmlfile = urllib.urlopen(MY SITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())

title = soup.find('p', {'class': 'deal-title should-truncate'}).getText()  
print "Title: " + str(title)

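But the code above only gives me the first result. I want to go through the whole page and get every occurrence. To do that, I tried using a list comprehension to find every occurrence of a figure tag (this paragraph tag always sits inside figure tags), so that I only look at what is inside each figure. However, when I try the following: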

from bs4 import BeautifulSoup
import urllib
import re

htmlfile = urllib.urlopen(MY WEBSITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())

deals = [figure for figure in soup.findAll('figure')]

for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()  
    print "Title: " + str(title)

I get this error:

Traceback (most recent call last):
  File "C:\Python27\blah.py", line 11, in <module>
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()
AttributeError: 'NoneType' object has no attribute 'getText'
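The `AttributeError` means that for at least one `<figure>` on the page, `i.find(...)` found no matching `<p>` and returned `None`; calling `.getText()` on `None` then fails. The minimal sketch below reproduces this with a small hypothetical HTML snippet (the real page is not shown in the question) and skips such figures with an `is None` check. It assumes Python 3, bs4, and the built-in `html.parser`; `find_all()` and `get_text()` are bs4's PEP 8 names for `findAll()` and `getText()`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <figure> holds an image, not a deal title.
html = """
<figure><p class="deal-title should-truncate">Deal A</p></figure>
<figure><img src="banner.png"></figure>
"""

soup = BeautifulSoup(html, "html.parser")

titles = []
for figure in soup.find_all("figure"):
    p = figure.find("p", {"class": "deal-title should-truncate"})
    # For the second <figure>, find() returns None; calling p.get_text()
    # there would raise AttributeError: 'NoneType' object has no attribute ...
    if p is None:
        continue  # skip figures that carry no deal title
    titles.append(p.get_text())

print(titles)  # ['Deal A']
```

Using `continue` drops untitled figures entirely; recording a placeholder such as `"NONE"` instead keeps one output line per figure.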

Now I'm trying:

from bs4 import BeautifulSoup
import urllib
import re

htmlfile = urllib.urlopen(MY SITE SITS HERE)
soup = BeautifulSoup(htmlfile.read())

deals = soup.findAll('figure')

for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if (title == None):
        title = "NONE"
    else:
        title = title.getText()
    print "Title: " + str(title)

Now the error is:

Traceback (most recent call last):
  File "C:\Python27\blah.py", line 16, in <module>
    print "Title: " + str(title)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 12: ordinal not in range(128)
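In Python 2, `str()` on a `unicode` value implicitly encodes it with the default ASCII codec, and `u'\u2013'` (EN DASH) has no ASCII representation, hence the error. The stdlib-only sketch below is written for Python 3, where that encode step has to be spelled out explicitly, and shows both the failure and a working alternative; the title string is a hypothetical stand-in for real page text.

```python
# Hypothetical title containing U+2013 EN DASH, the character from the traceback.
title = "Deal \u2013 50% off"

# Python 2's str(title) effectively did title.encode("ascii"); spelled out
# explicitly in Python 3, it fails the same way.
failed_at = None
try:
    title.encode("ascii")
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the first unencodable character

print("ascii failed at position", failed_at)  # ascii failed at position 5

# A codec that covers the character (here UTF-8) encodes it without error.
encoded = title.encode("utf-8")
print(encoded)  # b'Deal \xe2\x80\x93 50% off'
```

In the Python 2 code this means either dropping `str()` and keeping the value as `unicode`, or encoding explicitly, e.g. `title.encode('utf-8')`, before concatenating with a byte string.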

1 Answer

Answer 0 (score: 0)

Final answer, with special thanks to BlackJack:

from bs4 import BeautifulSoup
import urllib
import re

htmlfile = urllib.urlopen(MY SITE SITS HERE)
soup = BeautifulSoup(htmlfile.read())

deals = soup.findAll('figure')

for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if (title == None):
        title = "NONE"
    else:
        title = title.getText()
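    # No str() call: title is already a unicode string here, so there is no
    # implicit ASCII encoding step to fail on u'\u2013'.
    print "Title: " + title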
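The answer targets Python 2. On Python 3, `urllib.urlopen` lives at `urllib.request.urlopen`, `print` is a function, and every `str` is Unicode, so the `str()` conversion that triggered the error goes away. A minimal sketch of the same loop under those assumptions, parsing an inline hypothetical HTML string instead of fetching the site (the question does not name the URL):

```python
from bs4 import BeautifulSoup

# Hypothetical page content; in practice this would come from
# urllib.request.urlopen(url).read() for the (unnamed) site.
html = """
<figure><p class="deal-title should-truncate">Deal A \u2013 today only</p></figure>
<figure><img src="banner.png"></figure>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for figure in soup.find_all("figure"):
    title = figure.find("p", {"class": "deal-title should-truncate"})
    # Fall back to "NONE" when the figure has no title, as the answer does.
    text = title.get_text() if title is not None else "NONE"
    results.append("Title: " + text)

for line in results:
    print(line)
```

Whether the en dash prints cleanly still depends on the encoding of the output stream (console, pipe, or file), not on BeautifulSoup.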