Why does one web-scraping lookup return correctly while the others don't?

Time: 2017-08-17 18:20:48

Tags: python-2.7 web-scraping

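I know this should be simple, which is why it is so frustrating. I have searched for similar questions, and I do understand what "AttributeError: 'NoneType' object has no attribute 'find'" means.

I'm writing a web-scraping script that pulls the company title, name, and so on from this website. The confusing part is that two of the lookups, the title and the primary contact, work perfectly, while the lookups for company name, email, and phone number all fail, even though all of those fields are present on the page and have the same structure as the lookups that work. Can anyone point out what is different about them?

My code is below. The first two lookups work, but the third returns None. I'm new to Python, and any help would be appreciated.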

# import libraries
import urllib2  
from bs4 import BeautifulSoup
import csv  
from datetime import datetime 

data = []
for i in range(0, 4):
    quote_page = 'http://www.homeopathycenter.org/professional-and-organizational-directory' + '?field_geofield_distance%5Bdistance%5D=50&field_geofield_distance%5Bunit%5D=3959&field_geofield_distance%5Borigin%5D=&field_professional_category_tid=All&combine=&field_address_locality=&field_address_administrative_area=&field_address_country=All&field_consultations_by_phone_value=All&field_consultations_online_value=All&field_animal_consultations_value=All&page={}'.format(i)

    page = urllib2.urlopen(quote_page) 

    # parse the html using beautiful soup and store in variable `soup`
    soup = BeautifulSoup(page, 'html.parser')

    #identify full contact box
    box_array = ['views-row views-row-1 views-row-odd views-row-first',
                'views-row views-row-2 views-row-even',
                'views-row views-row-3 views-row-odd',
                'views-row views-row-4 views-row-even',
                'views-row views-row-5 views-row-odd',
                'views-row views-row-6 views-row-even',
                'views-row views-row-7 views-row-odd',
                'views-row views-row-8 views-row-even',
                'views-row views-row-9 views-row-odd',
                'views-row views-row-10 views-row-even views-row-last']

    #loop the boxes
    for box in box_array:
        full_box = soup.find('div', {'class': box})

        #get title from box
        title_box = full_box.find('div', {'class': 'views-field views-field-title'})
        title_content = title_box.find('span', {'class': 'field-content'})
        title = title_content
        title = title.text.strip()  # strip() removes leading and trailing whitespace

        #get contact name from box, this works
        contact_box = full_box.find('div', {'class': 'views-field views-field-field-primary-contact'})
        contact_content = contact_box.find('div', {'class': 'field-content'})
        name = contact_content  # this works
        name = name.text.strip()

        #get company name from box, this returns none
        company_box = full_box.find('div', {'class': 'views-field views-field-field-company-name'})
        company_content = company_box.find('div', {'class': 'field-content'})
        company = company_content.text.strip()
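
One way to test the assumption that every row really contains a company-name field is to list the rows that lack it. The sketch below is only an illustration: the sample markup is hypothetical (its second row deliberately omits the company name), and `rows_missing_field` is a helper name invented here, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not copied from the real site.
# The second row deliberately has no company-name field.
SAMPLE_HTML = """
<div class="views-row views-row-1 views-row-odd views-row-first">
  <div class="views-field views-field-title"><span class="field-content">Alpha Clinic</span></div>
  <div class="views-field views-field-field-company-name"><div class="field-content">Alpha LLC</div></div>
</div>
<div class="views-row views-row-2 views-row-even views-row-last">
  <div class="views-field views-field-title"><span class="field-content">Beta Practice</span></div>
</div>
"""


def rows_missing_field(soup, field_class):
    """Return the titles of the rows that contain no div with field_class."""
    missing = []
    # 'div.views-row' matches every row regardless of its other classes.
    for row in soup.select('div.views-row'):
        if row.select_one('div.' + field_class) is None:
            title = row.select_one('div.views-field-title')
            missing.append(title.get_text(strip=True) if title else None)
    return missing


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print(rows_missing_field(soup, 'views-field-field-company-name'))
```

If the same check against the live page reports any rows, the field is not present everywhere, which would explain a `None` from `find()` partway through the loop.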

Edit: the error message is as follows:

Traceback (most recent call last):
  File "F:\Program Files (x86)\Python\sc3.py", line 46, in <module>
    company_content = company_box.find('div', {'class': 'field-content'})
AttributeError: 'NoneType' object has no attribute 'find'
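
The traceback shows that `company_box` itself is `None`, meaning the first `find()` found no matching div in that row. A minimal sketch of a guarded lookup follows. It assumes the cause is a row that simply lacks the field, which the question does not confirm; `field_text` and the sample row are hypothetical names and data introduced only for illustration.

```python
from bs4 import BeautifulSoup


def field_text(box, field_class, tag='div'):
    """Return the stripped text of one field inside box, or None if absent."""
    field_box = box.find('div', {'class': field_class})
    if field_box is None:
        # The wrapper div is missing from this row, so there is nothing to read.
        return None
    content = field_box.find(tag, {'class': 'field-content'})
    if content is None:
        return None
    return content.text.strip()


# Hypothetical row that has a title but no company-name field.
ROW_HTML = """
<div class="views-row views-row-2 views-row-even">
  <div class="views-field views-field-title"><span class="field-content"> Beta Practice </span></div>
</div>
"""

box = BeautifulSoup(ROW_HTML, 'html.parser').find('div', {'class': 'views-row'})
title = field_text(box, 'views-field views-field-title', tag='span')
company = field_text(box, 'views-field views-field-field-company-name')
print(title)
print(company)
```

With a helper like this, a missing field yields `None` for that one value and the loop can continue with the next row instead of stopping on the exception.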

0 answers:

No answers yet