How can I use other_urls = [] inside __init__() to store multiple URLs and then call parse() for each?

Time: 2015-09-23 06:50:27

Tags: python selenium scrapy

My spider code:

import scrapy
from scrapy.spider import BaseSpider
from selenium import webdriver
from youtube.items import YoutubeItem
from scrapy.http import TextResponse
from selenium.webdriver.common.keys import Keys
import time
from datetime import date
import re
import math
from scrapy.http import Request


class YoutubeSpider(BaseSpider):
    name = "youtube"
    allowed_domains = ['youtube.com']

    device_name = "nexus 6 "
    start_urls = "https://www.youtube.com/results?search_query="+device_name+"\"unboxing\""
    other_urls = []

    def __init__(self):
        other_urls = []
        self.driver = webdriver.Firefox()

    def parse(self, response):
        items = []
        self.driver.get(self.start_urls)

        d1 = self.driver.page_source.encode('utf-8')
        html = str(d1)
        response = TextResponse('none',200,{},html,[],None)

        '''
        my parse code......
        '''

        self.driver.close()
        return items

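But my code only works correctly for start_urls. I have a file, device_name.txt, that contains device names, and I want to send each of those names and call the parse method for it.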

I used this inside __init__():

other_urls = [l.strip() for l in open('device_name.txt').readlines()]
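
One possible cause, sketched under assumptions: a plain assignment inside __init__() binds a local name only, so self.other_urls still refers to the empty list defined at class level. The class names LocalOnly and OnInstance and the helper load_device_names are hypothetical and exist only for this illustration; they are not part of the spider above.

```python
def load_device_names(path):
    """Return the stripped, non-empty lines of a text file."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


class LocalOnly:
    other_urls = []  # class attribute; nothing below ever updates it

    def __init__(self, path):
        # Plain assignment creates a local variable that is discarded
        # as soon as __init__ returns.
        other_urls = load_device_names(path)


class OnInstance:
    def __init__(self, path):
        # Assigning through self stores the list on the instance,
        # so other methods can read self.other_urls.
        self.other_urls = load_device_names(path)
```

With a device_name.txt that contains a few names, LocalOnly("device_name.txt").other_urls would still be the empty class-level list, while OnInstance("device_name.txt").other_urls would hold the names.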

Just before the return at the end of parse(), I use this if statement to request the next entry from other_urls, but it does not work.

if self.other_urls:
    return Request(self.other_urls.pop(0), meta={'items': [items]})

self.driver.close()
return items
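
A further point worth checking, offered as a sketch and not a confirmed diagnosis: the entries read from device_name.txt are device names, not URLs, while Request expects an absolute URL. The helper below builds one search URL per device name in the same shape as start_urls. The names SEARCH_URL, build_search_url and build_search_urls are hypothetical, and the urllib.parse import assumes Python 3.

```python
from urllib.parse import quote_plus

# Same base as start_urls in the spider above.
SEARCH_URL = "https://www.youtube.com/results?search_query="


def build_search_url(device_name):
    """Build the search URL for '<device name> "unboxing"'."""
    query = '{} "unboxing"'.format(device_name.strip())
    # quote_plus encodes spaces as '+' and quotes as '%22'.
    return SEARCH_URL + quote_plus(query)


def build_search_urls(device_names):
    """Build one URL per non-blank device name, keeping the input order."""
    return [build_search_url(name) for name in device_names if name.strip()]
```

The resulting URLs could then be passed to Request. Note also that parse() as posted always loads self.start_urls in the driver and closes the driver at the end, so it would probably need to load the URL of the incoming response and keep the driver open until the last device has been handled.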

0 answers:

No answers yet