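I'm trying to scrape a page, but on that page I have to click a button several times to load all of the content. That's why I use Selenium to load the page before parsing it and extracting the links.
Here is the error I get. What am I doing wrong?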
2018-08-31 20:18:56 [twisted] CRITICAL:
Traceback (most recent call last):
  File "d:\python-projects\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "d:\python-projects\lib\site-packages\scrapy\crawler.py", line 81, in crawl
    start_requests = iter(self.spider.start_requests())
TypeError: 'NoneType' object is not iterable
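The last frame shows the crawler calling `iter(self.spider.start_requests())`. A minimal pure-Python sketch of why that call fails, assuming nothing about Scrapy beyond that one line: a method whose body contains neither `yield` nor `return` evaluates to `None`, and `iter(None)` raises exactly this `TypeError`. The names `BrokenSpider`, `FixedSpider`, and `start` are hypothetical stand-ins, not Scrapy API.

```python
# Hypothetical stand-ins that mimic only the crawler's call
# `iter(spider.start_requests())`; no Scrapy or Selenium is involved.

class BrokenSpider:
    def start_requests(self):
        # Does some work but neither yields nor returns anything,
        # so calling this method evaluates to None.
        self.pages_loaded = 3


class FixedSpider:
    def start_requests(self):
        # A `yield` makes this a generator function, so calling it
        # returns an iterable generator object.
        for url in ["https://www.test.com/xxxxx1"]:
            yield url


def start(spider):
    """Call start_requests() the way the crawler does; return items or the error text."""
    try:
        return list(iter(spider.start_requests()))
    except TypeError as exc:
        return str(exc)


broken_result = start(BrokenSpider())
fixed_result = start(FixedSpider())
print(broken_result)  # 'NoneType' object is not iterable
print(fixed_result)   # ['https://www.test.com/xxxxx1']
```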
My code:
import scrapy
from scrapy.selector import Selector
from scrapy.spider import Spider
from scrapy.utils.markup import remove_tags
from selenium import webdriver

class Listings(Spider):
    name = "adver"
    base_url = 'https://www.test.com/xxxxx1'

    def start_requests(self):
        self.driver = webdriver.Firefox(executable_path=r'D:\python-projects\geckodriver.exe')
        self.driver.get(self.base_url)

        while True:
            load_content = self.driver.find_element_by_xpath('/html/body/div[5]/div[3]/div[1]/button')
            try:
                self.parse(driver.page_source)
                load_content.click()
            except:
                break
        self.driver.close()

    def parse(self, response):
        for link in response.css("a.ad-title-link"):
            ad_link = link.css('a::attr(href)').extract_first()
            yield {'link': ad_link}
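Even with an iterable returned from `start_requests`, the loop above would not collect any links. The browser-free sketch below reproduces its control flow using hypothetical `FakeDriver` and `FakeButton` classes in place of Selenium objects (the `RuntimeError` stands in for whatever Selenium raises when the button is gone). It shows two problems: `driver.page_source` refers to an undefined name (it should be `self.driver`), and the bare `except:` silently swallows the resulting `NameError`, so the loop exits before the first click. It also shows that calling a generator method such as `parse` without iterating it never runs its body. Separately, the real `parse` expects a Scrapy response, while `page_source` is a plain string.

```python
# Hypothetical stand-ins: FakeDriver/FakeButton replace Selenium objects so
# the loop's control flow can run without a browser.

class FakeButton:
    def __init__(self, driver):
        self.driver = driver

    def click(self):
        self.driver.clicks += 1
        if self.driver.clicks >= 3:
            # Pretend the button stops working once everything is loaded.
            raise RuntimeError("button no longer clickable")


class FakeDriver:
    def __init__(self):
        self.clicks = 0
        self.page_source = "<html></html>"

    def find_element(self):
        return FakeButton(self)


class Listings:
    def __init__(self):
        self.driver = FakeDriver()
        self.parsed = []

    def parse(self, html):
        # Generator: this body only runs when something iterates over it.
        self.parsed.append(html)
        yield {"html": html}

    def posted_loop(self):
        # Mirrors the question: `driver` instead of `self.driver`.
        while True:
            button = self.driver.find_element()
            try:
                self.parse(driver.page_source)  # NameError raised here
                button.click()
            except:  # bare except hides the NameError and ends the loop
                break

    def corrected_loop(self):
        while True:
            button = self.driver.find_element()
            try:
                button.click()
            except RuntimeError:  # catch only the "no more button" case
                break
        # Parse once, after all content is loaded, and consume the generator.
        return list(self.parse(self.driver.page_source))


posted = Listings()
posted.posted_loop()
print(posted.driver.clicks, posted.parsed)  # 0 []

fixed = Listings()
items = fixed.corrected_loop()
print(fixed.driver.clicks, items)  # 3 [{'html': '<html></html>'}]
```

In an actual Scrapy spider, the usual arrangement is to yield a `scrapy.Request` for `base_url` from `start_requests`, do the Selenium clicking inside the callback, and then wrap the final HTML with `scrapy.Selector(text=self.driver.page_source)` so that `.css("a.ad-title-link")` can run on it.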