NameError: 'response' is not defined in a Scrapy parse function

Date: 2016-02-29 14:04:54

Tags: python selenium scrapy

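I am trying to write a Scrapy spider that works together with Selenium to get at some JavaScript-generated content on the page I am scraping. I have managed to open the page with Selenium and wait for the content to appear. Now I want to build a Scrapy TextResponse from the fully loaded page. My code looks like this (I removed the URL and selector strings, which are not relevant):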

import scrapy
from scrapy import signals
from scrapy.http import TextResponse 
from scrapy.xlib.pydispatch import dispatcher

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class EexSpider(scrapy.Spider):
    name = "eex"
    allowed_domains = ["..."]
    start_urls = ["..."]

    def __init__(self):
        self.driver = webdriver.Chrome()
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        self.driver.close()

    def parse(self, response):
        self.driver.get(response.url)
        wait = WebDriverWait(self.driver, 10)
        element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '...')))

        # this is where things go wrong
        print response.url # prints the correct url
        text_response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')
        # NameError: name 'response' is not defined

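When I run the spider, I get the error NameError: name 'response' is not defined on the line that calls the TextResponse constructor. Strangely, I can print response.url successfully on the line just before it.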

Does anyone know why this happens?

P.S. Let me know if you would like to see the stack trace; I just didn't want to make the question any longer.

Disclaimer: I am a complete Python noob ;-)

1 answer:

Answer 0: (score: 1)

Check that the line containing TextResponse is indented correctly, so that it sits inside the body of parse.

For example, if I have the following code:

import scrapy
from scrapy import signals
from scrapy.http import TextResponse 
from scrapy.xlib.pydispatch import dispatcher

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class EexSpider(scrapy.Spider):
    name = "eex"
    allowed_domains = ["google.com"]
    start_urls = ["http://google.com"]

    def __init__(self):
        self.driver = webdriver.Chrome()
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        self.driver.close()

    def parse(self, response):
        self.driver.get(response.url)

    text_response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')

I get exactly the same error:

  

NameError: name 'response' is not defined
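
To see why the indentation matters, here is a minimal sketch that reproduces the error without Scrapy or Selenium. It assumes the cause is the one described above: a statement dedented out of parse becomes part of the class body, which Python executes while the class is being defined, where the method parameter response does not exist. The names BrokenSpider, FixedSpider, FakeResponse and define_class are hypothetical stand-ins for illustration and are not part of the original spider.

```python
# Two class definitions kept as source text so the failing one can be
# executed inside a try/except instead of crashing this script.
BROKEN_SOURCE = '''
class BrokenSpider(object):
    def parse(self, response):
        url = response.url

    # Dedented by one level: this is now a class-body statement,
    # so it runs at class-definition time, outside parse().
    text = response.url
'''

FIXED_SOURCE = '''
class FixedSpider(object):
    def parse(self, response):
        url = response.url
        # Same indentation as the rest of the method body.
        text = response.url
        return text
'''


class FakeResponse(object):
    """Hypothetical stand-in for a Scrapy response; it only has a url."""
    url = "http://example.com"


def define_class(source):
    """Execute a class definition; return (namespace, error text or None)."""
    namespace = {}
    try:
        exec(source, namespace)
    except NameError as exc:
        return namespace, str(exc)
    return namespace, None


_, broken_error = define_class(BROKEN_SOURCE)
fixed_namespace, fixed_error = define_class(FIXED_SOURCE)

# The broken class fails while it is being defined, before parse() is called.
print(broken_error)
# The fixed class defines cleanly and parse() can use its parameter.
print(fixed_namespace["FixedSpider"]().parse(FakeResponse()))
```

Note that in the broken version the error is raised as soon as the class body is executed, that is, when the module is loaded, not when parse is called. If the offending line looks correctly indented in your editor, it may be worth checking whether the file mixes tabs and spaces.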