I created this class with a parse() method:
import scrapy

from blogs.items import PitchforkItem  # assumed location: the "blogs" project's items.py


class PitchforkSpider(scrapy.Spider):
    name = "pitchfork_reissues"
    allowed_domains = ["pitchfork.com"]
    # Scrapy creates a Request for each URL listed here
    start_urls = [
        "http://pitchfork.com/reviews/best/reissues/?page=1",
        "http://pitchfork.com/reviews/best/reissues/?page=2",
        "http://pitchfork.com/reviews/best/reissues/?page=3",
    ]

    def parse(self, response):
        items = []
        for sel in response.xpath('//div[@class="album-artist"]'):
            item = PitchforkItem()
            # ".//" keeps each query inside the current album block;
            # a leading "//" would search the whole page on every iteration
            item['artist'] = sel.xpath('.//ul[@class="artist-list"]/li/text()').extract()
            item['reissue'] = sel.xpath('.//h2[@class="title"]/text()').extract()
            items.append(item)
        return items
From another script, I import the module that contains this class:
from blogs.spiders.pitchfork_reissues_feed import *
Then I instantiate the class and try to call its parse() method:
def reissues():
    pitchfork_reissues = PitchforkSpider()
    albums = pitchfork_reissues.parse(response)
    print(albums)
But I get the following error:
reissues = pitchfork_reissues.parse(response)
NameError: global name 'response' is not defined
Clearly, the parse() method needs an instance of scrapy.http.Response.
How can I create such an instance inside reissues(), in the context of this second script?
Answer 0 (score: 0)
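from scrapy.http import HtmlResponse
# A URL is required, a str body needs an encoding, and unlike the base Response, HtmlResponse supports .xpath()
response = HtmlResponse(url='http://pitchfork.com/reviews/best/reissues/?page=1', body=u'html here', encoding='utf-8')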
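I don't think you can scrape this way, because that isn't how Scrapy is meant to work (normally the Scrapy engine downloads each page and passes the resulting Response to parse() for you), but you can still create a Response object yourself.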