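I am trying to extract a list of all image URLs from https://www.rawson.co.za/property/for-sale/cape-town. However, the images are not on the main listing page; each property's images are on a separate page. I have been using XPath to retrieve the other fields I need.
I am not sure how to retrieve all of the image URLs from those sub-pages into the list. Here is what I have tried: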
```python
import scrapy


class PropDataSpider(scrapy.Spider):
    name = "rawson"
    start_urls = ['https://www.rawson.co.za/property/for-sale/cape-town']

    def parse(self, response):
        propertes = response.xpath("//div[@class='card__main']")
        for prop in propertes:
            title = prop.xpath(
                "./div[@class='card__body']/h3[@class='card__title']/a/text()").extract_first()
            price = prop.xpath(
                "./div[@class='card__body']/div[@class='card__footer card__footer--primary']/div[@class='card__price']/text()").extract_first()
            description = prop.xpath(
                "./div[@class='card__body']/div[@class='card__synopsis']/p/text()").extract_first()
            bedrooms = prop.xpath(
                "./div[@class='card__body']/div[@class='card__footer card__footer--primary']/div[@class='features features--inline']/ol[@class ='features__list']/li[@class ='features__item'][1]/div[@class='features__label']/text()").extract_first()
            ...
            images = ['https://' + img for img in prop.xpath(
                "main[@class='l-main']/section[@class='l-section']/div[@class='l-wrapper']/div[@class='l-section__main']/div[@class ='content-block content-block--flat']/div[@class ='gallery gallery--flat js-lightbox']/div[@ class ='row row--flat']/div[@class ='col']/a[@class ='gallery__link js-lightbox-image']/img/@src")]
            yield {'title': title, 'price': price, "description": description, 'bedrooms': bedrooms, 'bathrooms': bathrooms, 'garages': garages, 'images': images}
```
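However, this code returns "None" for the images, which makes sense, but I am not sure how to handle it. Any suggestions would be greatly appreciated. Thanks in advance!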
Answer 0 (score: 0)
You need to use the following:

```java
new Comparator<Point>() {
    public int compare(Point p1, Point p2) {
        ...
    }
}
```

```java
new List<X>();
```