My spider code:
import scrapy
from scrapy.spider import BaseSpider
from selenium import webdriver
from youtube.items import YoutubeItem
from scrapy.http import TextResponse
from selenium.webdriver.common.keys import Keys
import time
from datetime import date
import re
import math
from scrapy.http import Request


class YoutubeSpider(BaseSpider):
    name = "youtube"
    allowed_domains = ['youtube.com']
    device_name = "nexus 6 "
    start_urls = "https://www.youtube.com/results?search_query="+device_name+"\"unboxing\""
    other_urls = []

    def __init__(self):
        other_ulrs = []
        self.driver = webdriver.Firefox()

    def parse(self, response):
        items = []
        self.driver.get(self.start_urls)
        d1 = self.driver.page_source.encode('utf-8')
        html = str(d1)
        response = TextResponse('none', 200, {}, html, [], None)
        '''
        my parse code......
        '''
        self.driver.close()
        return items
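The start_urls line builds the search URL by string concatenation, which leaves the space and the double quotes in the query unescaped. A minimal sketch of the same URL built with the standard library's urllib.parse.urlencode, assuming the intended query is the device name followed by "unboxing" in quotes; build_search_url is a hypothetical helper name, not part of Scrapy:

```python
from urllib.parse import urlencode


def build_search_url(device_name, suffix='"unboxing"'):
    """Build a YouTube search URL for one device name (hypothetical helper).

    Produces the same query text as the start_urls concatenation, but lets
    urlencode escape spaces (as '+') and double quotes (as '%22').
    """
    # strip() drops the trailing space from names like "nexus 6 ";
    # a single space is then added back as the separator.
    query = device_name.strip() + " " + suffix
    return "https://www.youtube.com/results?" + urlencode({"search_query": query})


print(build_search_url("nexus 6 "))
```

urlencode quotes with quote_plus by default, which is why the space becomes + rather than %20.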
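But my code only works correctly for start_urls. I have a file, device_name.txt, that contains device names, and I want to send each of these names to the parse method. Inside __init__() I used this: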
other_urls = [l.strip() for l in open('device_name.txt').readlines()]
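One Python detail relevant to the line above: inside __init__(), a bare assignment such as other_urls = [...] creates a local variable, so the class-level other_urls = [] that parse() later reads through self.other_urls stays empty. The sketch below shows the difference with two hypothetical classes (LocalOnly, WithAttribute) and a temporary file standing in for device_name.txt; it does not involve Scrapy.

```python
import os
import tempfile


def load_names(path):
    """Read one device name per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


class LocalOnly:
    other_urls = []  # class attribute, as in YoutubeSpider

    def __init__(self, path):
        # Bare assignment: creates a local that is discarded when
        # __init__ returns, so self.other_urls is still the empty list.
        other_urls = load_names(path)


class WithAttribute:
    def __init__(self, path):
        # Assignment through self stores the list on the instance.
        self.other_urls = load_names(path)


# Throwaway file standing in for device_name.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("nexus 6\nnexus 5\n\n")
    demo_path = tmp.name

local_only = LocalOnly(demo_path)
with_attr = WithAttribute(demo_path)
os.remove(demo_path)

print(local_only.other_urls)
print(with_attr.other_urls)
```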
Inside parse(), just before the return, I use this if to request other_urls, but it doesn't work:
if self.other_urls:
    return Request(self.other_urls.pop(0), meta={'items': [items]})
self.driver.close()
return items
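Scrapy callbacks can yield several requests (and items) instead of returning a single Request, and the entries read from device_name.txt are device names rather than URLs, so each would first need to become a search URL. A minimal plain-Python sketch of that pattern, assuming the same "unboxing" query as start_urls: the hypothetical iter_search_requests yields (url, meta) pairs that stand in for scrapy.Request(url, meta=...), with the device name carried in meta so a callback can tell which search page it is parsing.

```python
from urllib.parse import urlencode


def iter_search_requests(device_names):
    """Yield one (url, meta) pair per non-blank device name.

    Each pair is a stand-in for scrapy.Request(url, meta=meta); the
    query mirrors start_urls: '<device name> "unboxing"'.
    """
    for name in device_names:
        name = name.strip()
        if not name:
            continue  # skip blank lines from the input file
        query = name + ' "unboxing"'
        url = "https://www.youtube.com/results?" + urlencode({"search_query": query})
        yield url, {"device_name": name}


pending = list(iter_search_requests(["nexus 6", "nexus 5", ""]))
for url, meta in pending:
    print(meta["device_name"], url)
```

In a spider, each pair would presumably become yield Request(url, meta=meta) from start_requests() or from the callback; treat that wiring as an assumption to check against your Scrapy version.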