I'm trying to scrape ticket information from SeatGeek, but I'm struggling to get it working. When I run my code, I get this:
INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
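My idea is that I enter the name of a show/event, Scrapy collects the URL of every performance of that show, and then it scrapes the ticket prices and so on. My code is below: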
import scrapy
from seatgeek import items


class seatgeekSpider(scrapy.Spider):
    name = "seatgeek_spider"
    showname = input("Enter Show name (lower case please): ")
    showname = showname.replace(' ', '-')
    start_urls = "https://seatgeek.com/" + showname + "-tickets.html"

    def parse_performance(self, response):
        for href in response.xpath('//a[@class="event-listing-title"]/@href').extract():
            yield scrapy.Request(
                url='https://seatgeek.com/' + href,
                callback=self.parse_ticketinv,
                method="POST",
                meta={'url': href})

    def parse_ticketinv(self, response):
        price = response.xpath('//span[@class="omnibox__listing__buy__price"]').extract()
        performance = response.xpath('//div[@class="event-detail-words faint-words"]/text()').extract()
        quantity = response.xpath('//div[@class="omnibox__seatview__availability"]/text()').extract()
        seatinfo = response.xpath('//div[@class="omnibox__listing__section"]/text()').extract()
        # creating scrapy items
        item = items.seatgeekItem()
        item['price'] = price
        item['performance'] = performance
        item['quantity'] = quantity
        item['seatinfo'] = seatinfo
        yield item
Here is my items.py code:
import scrapy


class SeatgeekItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    price = scrapy.Field()
    performnace = scrapy.Field()
    quantity = scrapy.Field()
    seatinfo = scrapy.Field()
Any help is greatly appreciated. Thank you!
Answer 0 (score: 1)
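I can see two immediate problems:
start_urls should be a list. Scrapy iterates over it, so a plain string is consumed one character at a time, and you should be seeing an error like this: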
Traceback (most recent call last):
(...)
raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: h
By default, the callback used for the URLs in start_urls is parse(), which is not defined in your code. Perhaps you should rename your parse_performance() method?
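To see why the error message ends in a lone "h", here is a small plain-Python illustration that does not need Scrapy at all. The URL is a made-up placeholder (some-show is not a real event slug); the point is only that iterating over a string yields characters, while iterating over a list yields whole URLs.

```python
# A string is iterable character by character, so anything that loops over
# start_urls receives 'h', 't', 't', 'p', ... instead of a complete URL.
start_urls_as_string = "https://seatgeek.com/some-show-tickets.html"  # placeholder URL
first_from_string = next(iter(start_urls_as_string))

# Wrapping the same URL in a list makes the first element the whole URL.
start_urls_as_list = ["https://seatgeek.com/some-show-tickets.html"]
first_from_list = next(iter(start_urls_as_list))

print(repr(first_from_string))
print(repr(first_from_list))
```

The first value is the single character that shows up in the ValueError; the second is what a request actually needs.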
Also, spider arguments are a more common way to get user input.