I am trying to scrape all product names from the website https://www.kalkhoff-bikes.com/, but I get fewer results than expected. What am I doing wrong? My first attempt was:
import scrapy


class ToScrapeSpider(scrapy.Spider):
    name = 'Kalkhoff_1'
    start_urls = [
        'https://www.kalkhoff-bikes.com/'
    ]
    allowed_domains = [
        'kalkhoff-bikes.com'
    ]

    def parse(self, response):
        for item in response.css('ul.navMain__subList--sub > li.navMain__subItem'):
            yield {
                'Name': item.css("span.navMain__subText::text").get()
            }
        for href in response.css('li.navMain__item a::attr(href)'):
            yield response.follow(href, self.parse)
I then read that Splash should be a good solution when a page has dynamic content, so I tried this:
import scrapy
from scrapy_splash import SplashRequest


class ToScrapeSpider(scrapy.Spider):
    name = 'Kalkhoff_2'
    start_urls = [
        'https://www.kalkhoff-bikes.com/'
    ]
    allowed_domains = [
        'kalkhoff-bikes.com'
    ]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse,
                endpoint='render.html',
                args={'wait': 0.5},
            )

    def parse(self, response):
        for item in response.css('ul.navMain__subList--sub > li.navMain__subItem'):
            yield {
                'Name': item.css("span.navMain__subText::text").get()
            }
        for href in response.css('li.navMain__item a::attr(href)'):
            yield response.follow(href, self.parse)
Unfortunately, I still do not get all the product names. Am I on the right track?