Question

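I have to parse this web page: http://india.gov.in/topics/health-family-welfare/health

I'm supposed to get the headings from it. Here is my code, but it doesn't print any headings at all. What's wrong?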

#!/usr/bin/env python
import urllib2
from mechanize import Browser
from BeautifulSoup import BeautifulSoup


import sys
import csv

mech = Browser()
url = "http://www.india.gov.in/topics/health-family-welfare/health"
page = mech.open(url)

html = page.read()
soup = BeautifulSoup(html)
div=soup.find("div",{"class":"view-content"})
lists = div.findAll('ol')
for ol in lists:
    lis = ol.findAll('li')
    print lis

print

Here is the error I get:

  File "srap_req.py", line 17, in <module>
    lists = div.findAll('ol')
AttributeError: 'NoneType' object has no attribute 'findAll'

Answer 1

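div is None because the .find() call on the previous line did not find a matching <div class="view-content">. The page you loaded has no such div. If you are parsing multiple web pages, you have to account for .find() calls that don't find the element you're looking for and return None instead.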
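A minimal sketch of that defensive pattern, assuming you can test against a saved HTML string. It swaps BeautifulSoup for Python 3's standard-library html.parser so it runs without extra packages; the names `ViewContentListParser` and `extract_items` are hypothetical, and the parser only approximates the `div.view-content ol li` selector for reasonably well-formed HTML. The point is the explicit check: when the page has no `<div class="view-content">`, the helper returns None (mirroring `.find()`), and the caller handles that instead of crashing with AttributeError.

```python
from html.parser import HTMLParser


class ViewContentListParser(HTMLParser):
    """Hypothetical helper: collect the text of <li> elements inside
    <div class="view-content"> ... <ol>, roughly like the CSS selector
    'div.view-content ol li'. Assumes reasonably well-formed nesting."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags as (tag, [classes])
        self.items = []          # collected <li> texts
        self.found_div = False   # did we ever see div.view-content?
        self._current = None     # text buffer for the <li> being read

    def _inside(self, tag, cls=None):
        # True if an open ancestor matches tag (and class, if given).
        return any(t == tag and (cls is None or cls in c) for t, c in self.stack)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "view-content" in classes:
            self.found_div = True
        self.stack.append((tag, classes))
        if tag == "li" and self._inside("div", "view-content") and self._inside("ol"):
            self._current = []

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates unclosed void tags like <br>).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        if tag == "li" and self._current is not None:
            self.items.append("".join(self._current).strip())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


def extract_items(html):
    """Return the list item texts, or None if div.view-content is missing."""
    parser = ViewContentListParser()
    parser.feed(html)
    parser.close()
    if not parser.found_div:
        return None  # like soup.find(...) returning None: the caller must check
    return parser.items


page = '<div class="other"><ol><li>x</li></ol></div>'
items = extract_items(page)
if items is None:
    print('No <div class="view-content"> on this page; skipping it.')
else:
    for text in items:
        print(text)
```

Checking for None (or guarding with `if div:` in the BeautifulSoup version) turns a crash into a page you can log and skip.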

I strongly recommend switching to BeautifulSoup 4 here. BeautifulSoup 3 hasn't had a new bug-fix release in over 2 years, and BeautifulSoup 4 lets you use CSS queries to get what you want, which is easier:

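from bs4 import BeautifulSoup
from mechanize import Browser

mech = Browser()
url = "http://www.india.gov.in/topics/health-family-welfare/health"
page = mech.open(url)

# Naming the parser explicitly avoids BeautifulSoup 4's "no parser specified" warning
soup = BeautifulSoup(page, "html.parser")
# Every <li> inside an <ol> inside <div class="view-content">.
# select() returns an empty list (not None) when nothing matches, so no AttributeError.
for item in soup.select('div.view-content ol li'):
    print(item)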

If you're using mechanize, you may also want to look at RoboBrowser, a more modern reimplementation built on requests and BeautifulSoup:

from robobrowser import RoboBrowser

browser = RoboBrowser()
url = "http://www.india.gov.in/topics/health-family-welfare/health"
browser.open(url)

for item in browser.select('div.view-content ol li'):
    print(item)
