[What I want to know: how should I modify the line `yield SplashRequest(url, callback=self.parse, args={"wait": 5}, endpoint="render.html")`?]
I don't know how I should change the code so that, when using scrapy + splash, the URLs collected in method A are passed on to method B.
In my opinion, the place that needs fixing is the line
yield SplashRequest(url, callback=self.parse, args={"wait": 5}, endpoint="render.html")
in the start_requests method (#1).
There are three reasons.
First, the logging call (#2) placed just before `yield SplashRequest(url, callback=self.parse, args={"wait": 5}, endpoint="render.html")` gives me the correct output.
Second, I set up scrapy + splash by following the scrapy-splash README:
Scrapy + Splash for JavaScript integration https://github.com/scrapy-plugins/scrapy-splash
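For context, the README's setup boils down to a few entries in settings.py. This is a minimal sketch of that configuration, not my exact file; the SPLASH_URL value assumes Splash is running locally on its default port 8050.

```python
# settings.py -- sketch of the scrapy-splash setup described in the README.
# SPLASH_URL is an assumption: a Splash instance on localhost:8050.
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
```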
Third, the logging call (#3) gives me no output at all. Here is my code.
# -*- coding: utf-8 -*-
import scrapy
from scrapy_splash import SplashRequest
from bnb_sentiment.items import BnbItem
import re
import logging

logging.basicConfig(level=logging.INFO)
# __name__ is the name of the current module
logger = logging.getLogger(__name__)

class BnbPriceTestSpider(scrapy.Spider):
    name = 'bnb_price_test'

    start_urls = [
        # Tokyo--Japan
        'https://www.airbnb.com/s/Tokyo--Japan/homes?refinement_paths%5B%5D=%2Fhomes&allow_override%5B%5D=&checkin=2018-07-07&checkout=2018-07-08&locale=en&min_beds=0&price_max=20000&price_min=10000&query=Tokyo%2C%20Japan&place_id=ChIJ51cu8IcbXWARiRtXIothAS4&s_tag=Mz88jJs1',
    ]

    def start_requests(self):
        for url in self.start_urls:
            logger.info(url) #2
            yield SplashRequest(url, callback=self.parse, args={"wait": 5}, endpoint="render.html") #1

    def parse(self, response):
        for href in response.xpath('//div[contains(@id, "listing-")]//a[contains(@href, "rooms")]/@href'):
            import pdb; pdb.set_trace()
            logger.info(href)
            url = response.urljoin(href.extract())
            import pdb; pdb.set_trace()
            logger.info(url) #3
            yield SplashRequest(url, callback=self.parse_scrape)

    def parse_scrape(self, response):
        pass
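As a side note, response.urljoin in parse behaves like the standard library's urllib.parse.urljoin, so the URL-joining step can be checked on its own. A standalone sketch, with hypothetical URL values standing in for the search page and an extracted room href:

```python
from urllib.parse import urljoin

# Hypothetical values: the base is the search-results page, the href is a
# root-relative room link like the ones the XPath above extracts.
base = 'https://www.airbnb.com/s/Tokyo--Japan/homes'
href = '/rooms/12345678'

# A root-relative href replaces the path of the base URL.
absolute = urljoin(base, href)
print(absolute)  # -> https://www.airbnb.com/rooms/12345678
```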
(#2) Below is the feedback from the logging call, inspected in pdb:

/home/ubuntu/bnbsp/bnb_sentiment/bnb_sentiment/spiders/bnb_price.py(34)start_requests()
-> logger.info(url)
(Pdb) url
'the same URL as in start_urls'