I am learning to build spiders and have been trying to track down this little bug. Any help would be appreciated. Thanks.
When I run my spider, I get an error stating:
KeyError: 'SoapguildItem does not support field: url'
Here is the code I have been working on:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from soapguild.items import SoapguildItem


class SoapySpider(CrawlSpider):
    name = 'soapy'
    allowed_domains = ['soapguild.org']
    start_urls = ['http://www.soapguild.org/']

    rules = (
        Rule(LinkExtractor(), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        href = SoapguildItem()
        href['url'] = response.url
        # Email
        email = response.xpath("//div/div[1]/p[2]/a[1]/@href").extract()
        email = email.replace("mailto:", "")
        #email = email.replace("(at)". "@")
        #location
        location = response.xpath("//div/div[1]/p[1]/text()[2]").extract()
        #contact
        contact = response.xpath("//div/div[1]/p[2]/text()[1]").extract()
        contact = contact.replace("Contact: ", "")
        #website
        website = response.xpath("//div/div[1]/p[2]/a[2]//@href").extract()

        for item in zip(email, location, contact, website):
            scraped_info = {
                'Email': item[0],
                'Location': item[1],
                'Contact': item[2],
                'Website': item[3]
            }
            yield scraped_info
Answer 0 (score: 1)
Did you add url as a field in items.py? I think the error comes from href['url'].
Answer 1 (score: 1)
Your item class SoapguildItem does not contain a member variable named url; define url there.
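A minimal sketch of what soapguild/items.py would need to contain. Only the missing url field is confirmed by the error; the other field names are assumptions based on what the spider scrapes, since the asker's actual items.py is not shown:

# soapguild/items.py -- hypothetical sketch; only 'url' is known to be missing, the rest are assumed
import scrapy

class SoapguildItem(scrapy.Item):
    # Declaring 'url' as a Field is what allows href['url'] = response.url in the spider;
    # without it, Scrapy raises KeyError: 'SoapguildItem does not support field: url'
    url = scrapy.Field()
    email = scrapy.Field()
    location = scrapy.Field()
    contact = scrapy.Field()
    website = scrapy.Field()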
Answer 2 (score: 0)
people_item = PeoplItem()
people_item.__class__.table_name = 'people_20216'