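I don't know what I'm doing wrong. I'm trying to extract text and store it in a list. In Firebug and FirePath, when I enter the XPath, it shows exactly the right text. But when I use it in my code, it returns an empty list. I'm trying to scrape www.insider.in/mumbai. The spider follows all the links and scrapes the event title, address, and other information. Here is my newly edited code: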
public void
Edited output:
ViewModel
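Even the if condition fails and it prints RSVP. I can't figure out what I'm doing wrong. I've been stuck on this part for 3 days. Please help.
Answer 0: (score: 1)
I removed things like webdriver and got basic code that works: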
import scrapy
import logging
from scrapy.http import Request
from scrapy.selector import Selector


class insiderSpider(scrapy.Spider):
    name = 'insider'
    allowed_domains = ["insider.in"]
    start_urls = ["http://www.insider.in/mumbai/"]
    event_details = list()  # Changed. Now event_details is a member of the class

    def parse(self, response):
        source_link = []
        temp = []
        title = ""
        Price = ""
        Venue_name = ""
        Venue_address = ""
        description = ""
        # Collect every link in the listing page's "bottom-details-right" blocks
        alllinks = response.xpath('//div[@class="bottom-details-right"]//a/@href').extract()
        print(alllinks)
        for single_event in alllinks:
            if "https://insider.in/event" in single_event:
                # Follow event links only; each one is handled by parse_event
                yield Request(url=single_event, callback=self.parse_event)
            else:
                print('Other part')

    def parse_event(self, response):
        title = response.xpath('//div[@class = "cell-title in-headerTitle"]/h1//text()').extract()
        print(title)
        temp = response.xpath('//div[@class = "cell-caption centered in-header"]//h3//text()').extract()
        print(temp)
        # Fall back to "RSVP" when the page has no price block
        a = len(response.xpath('//div[@class = "bold-caption price"]//text()').extract())
        if a > 0:
            Price = response.xpath('//div[@class = "bold-caption price"]//text()').extract()
        else:
            Price = "RSVP"
        print(Price)
        Venue_name = response.xpath('normalize-space(//div[@class = "address"]//div[@class = "section-title"]//text())').extract()
        print(Venue_name)
        Venue_address = response.xpath('normalize-space(//div[@class ="address"]//div//text()[preceding-sibling::br])').extract()
        print(Venue_address)
        description = response.xpath('normalize-space(//div[@class="cell-caption accordion-padding"]//text())').extract()
        print(description)
        # Notice that event_details is used as self.event_details, i.e. the class member
        self.event_details.append([title, temp, Price, Venue_name, Venue_address, description])
        print(self.event_details)  # Here also self.event_details
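To illustrate why moving event_details onto the class matters, here is a minimal sketch in plain Python with no Scrapy involved. The question's original code is not shown above, so the "local list" variant is only an assumption about what it may have looked like. LocalListCollector and MemberListCollector are hypothetical stand-ins for the spider, and parse_event here takes a title string instead of a response.

```python
class LocalListCollector(object):
    """Hypothetical stand-in: the list is created inside the callback."""

    def parse_event(self, title):
        event_details = []  # re-created on every call, so earlier results are lost
        event_details.append(title)
        return event_details


class MemberListCollector(object):
    """Hypothetical stand-in: the list lives on the class, as in the answer."""

    event_details = []  # class attribute, shared by every callback invocation

    def parse_event(self, title):
        self.event_details.append(title)
        return self.event_details


# Simulate two event pages being handled one after the other.
local = LocalListCollector()
local.parse_event("Event A")
local_result = local.parse_event("Event B")

member = MemberListCollector()
member.parse_event("Event A")
member_result = member.parse_event("Event B")

print(local_result)   # only the most recent event
print(member_result)  # both events accumulated
```

Because the list is a class attribute, it is shared by every instance of the class. In a Scrapy project the more common approach is to yield a dict or item from the callback instead of accumulating results in a list, but that goes beyond what the answer above does.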