Okay, so this code works:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY SITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())
title = soup.find('p', {'class': 'deal-title should-truncate'}).getText()
print "Title: " + str(title)
But the code above only gives me the first result. I want to iterate over the whole site and handle every match. To do that, I tried an outer loop that finds each occurrence of a figure tag (since this paragraph tag always sits inside a figure tag), so I only have to look at the content inside each figure. However, when I try the following:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY WEBSITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())
deals = [figure for figure in soup.findAll('figure')]
for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()
    print "Title: " + str(title)
I get this error:
Traceback (most recent call last):
  File "C:\Python27\blah.py", line 11, in <module>
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()
AttributeError: 'NoneType' object has no attribute 'getText'
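That AttributeError means find() returned None for at least one figure, i.e. that <figure> contains no <p class="deal-title should-truncate">. A minimal diagnostic sketch (assuming the same soup object as above) to see which figures are missing the paragraph:

# Hypothetical check: print the index of every figure without the target paragraph.
for index, figure in enumerate(soup.findAll('figure')):
    if figure.find('p', {'class': 'deal-title should-truncate'}) is None:
        print "Figure %d has no deal-title paragraph" % index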
So now I am trying:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY SITE SITS HERE)
soup = BeautifulSoup(htmlfile.read())
deals = soup.findAll('figure')
for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if title is None:
        title = "NONE"
    else:
        title = title.getText()
    print "Title: " + str(title)
Now the error is:
Traceback (most recent call last):
  File "C:\Python27\blah.py", line 16, in <module>
    print "Title: " + str(title)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 12: ordinal not in range(128)
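This error comes from str(): in Python 2, calling str() on a unicode string implicitly encodes it with the ASCII codec, which fails on characters such as u'\u2013' (an en dash). One possible workaround, sketched under the assumption that the console accepts UTF-8, is to encode explicitly before printing:

for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if title is None:
        title = u"NONE"
    else:
        title = title.getText()
    # Encode the unicode title explicitly instead of relying on str(),
    # so non-ASCII characters like the en dash survive printing.
    print "Title: " + title.encode('utf-8')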
Answer 0 (score: 0)
Final answer, with special thanks to BlackJack:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY SITE SITS HERE)
soup = BeautifulSoup(htmlfile.read())
deals = soup.findAll('figure')
for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if title is None:
        title = "NONE"
    else:
        title = title.getText()
    print "Title: " + title