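I am trying to crawl consecutive pages whose URL suffix increases in increments of 20 (the number of listings on each page).
The first page is: https://www.daft.ie/dublin-city/property-for-sale/dublin-4/
The second is: https://www.daft.ie/dublin-city/property-for-sale/dublin-4/?offset=20
Page 10 is: https://www.daft.ie/dublin-city/property-for-sale/dublin-4/?offset=180
I have checked the indentation and it looks fine, but only the first page of 20 listings is returned. This is the spider.py file; any advice is much appreciated.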
import scrapy

class DaftieSpiderSpider(scrapy.Spider):
    name = 'daftie_spider'
    page_number = 20
    allowed_domains = ['https://www.daft.ie/dublin-city/property-for-sale/dublin-4/']
    start_urls = ['https://www.daft.ie/dublin-city/property-for-sale/dublin-4/']

    def parse(self, response):
        listings = response.xpath('//div[@class="PropertyCardContainer__container"]')
        for listing in listings:
            price = listing.xpath('.//a/strong[@class="PropertyInformationCommonStyles__costAmountCopy"]/text()').extract_first()
            address = listing.xpath('.//*[@class="PropertyInformationCommonStyles__addressCopy--link"]/text()').extract_first()
            bedrooms = listing.xpath('.//*[@class="QuickPropertyDetails__iconCopy"]/text()').extract_first()
            bathrooms = listing.xpath('.//*[@class="QuickPropertyDetails__iconCopy--WithBorder"]/text()').extract_first()
            prop_type = listing.xpath('.//*[@class="QuickPropertyDetails__propertyType"]/text()').extract_first()
            agent = listing.xpath('.//div[@class="BrandedHeader__agentLogoContainer"]/img/@alt').extract_first()
            yield{'price': price,
                  'address': address,
                  'bedrooms': bedrooms,
                  'bathrooms': bathrooms,
                  'prop_type': prop_type,
                  'agent': agent}
            next_page = 'https://www.daft.ie/dublin-city/property-for-sale/dublin-4/?offset=' + str(DaftieSpiderSpider.page_number)
            if DaftieSpiderSpider.page_number <= 180:
                DaftieSpiderSpider.page_number += 20
                yield response.follow(next_page, callback=self.parse)
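Answer 0 (score: 1)
I'm not sure whether it is just a formatting issue in the post, but you are incrementing the value by 20 inside the loop over the listings. Either way, I would avoid mutating a class variable like this and pass the offset along with each request instead.
The following worked better for me: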
import scrapy

class DaftieSpiderSpider(scrapy.Spider):
    name = 'daftie_spider'
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ['daft.ie']
    start_urls = ['https://www.daft.ie/dublin-city/property-for-sale/dublin-4/']

    def parse(self, response):
        # The offset travels with each request; the first page has none, so default to 0
        offset = response.meta.get('offset', 0)
        listings = response.xpath('//div[@class="PropertyCardContainer__container"]')
        for listing in listings:
            price = listing.xpath('.//a/strong[@class="PropertyInformationCommonStyles__costAmountCopy"]/text()').extract_first()
            address = listing.xpath('.//*[@class="PropertyInformationCommonStyles__addressCopy--link"]/text()').extract_first()
            bedrooms = listing.xpath('.//*[@class="QuickPropertyDetails__iconCopy"]/text()').extract_first()
            bathrooms = listing.xpath('.//*[@class="QuickPropertyDetails__iconCopy--WithBorder"]/text()').extract_first()
            prop_type = listing.xpath('.//*[@class="QuickPropertyDetails__propertyType"]/text()').extract_first()
            agent = listing.xpath('.//div[@class="BrandedHeader__agentLogoContainer"]/img/@alt').extract_first()
            yield {'price': price,
                   'address': address,
                   'bedrooms': bedrooms,
                   'bathrooms': bathrooms,
                   'prop_type': prop_type,
                   'agent': agent}

        # Outside the loop: schedule the next page once per response.
        # Stop after offset=180, which is page 10 in the question.
        if offset < 180:
            offset += 20
            next_page = 'https://www.daft.ie/dublin-city/property-for-sale' \
                        '/dublin-4/?offset=' + str(offset)
            yield response.follow(next_page,
                                  callback=self.parse,
                                  meta={'offset': offset})
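One more difference from the question's code is worth spelling out: `allowed_domains` is now `['daft.ie']` rather than the full start URL. As far as I know, Scrapy expects bare domain names there, and with a URL in that list the offsite filter will most likely drop every follow-up request, which would match the "only the first page" symptom. Below is a minimal sketch, using only the standard library, of how the domain and the page URLs could be derived from the start URL. The helper names `domain_for_allowed_domains` and `page_url` are made up for illustration and are not part of Scrapy.

```python
from urllib.parse import urlsplit, urlencode

BASE_URL = 'https://www.daft.ie/dublin-city/property-for-sale/dublin-4/'


def domain_for_allowed_domains(url):
    """Return the bare host name of a URL, without a leading 'www.'.

    Hypothetical helper: this is the kind of value allowed_domains expects.
    """
    host = urlsplit(url).hostname or ''
    return host[4:] if host.startswith('www.') else host


def page_url(offset):
    """Build the listing URL for an offset; offset 0 is the first page."""
    if offset == 0:
        return BASE_URL
    return BASE_URL + '?' + urlencode({'offset': offset})


# Offsets 0, 20, ..., 180 correspond to pages 1 to 10 from the question.
PAGE_URLS = [page_url(offset) for offset in range(0, 200, 20)]

if __name__ == '__main__':
    print(domain_for_allowed_domains(BASE_URL))
    for url in PAGE_URLS:
        print(url)
```

If the number of pages is known up front, as it is here, a list like `PAGE_URLS` could also be assigned to `start_urls` directly, which removes the need to track the offset between requests at all.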
Answer 1 (score: 0)
The final code that worked. Many thanks for your help:
<ol>
{
items.map(todo => (
<li key={todo.taskId} className={todo.completed ? 'active' : 'inactive'}>
<span onClick={() => dispatch(updateTodo())}>{todo.task}</span>
<div className='hidden updatePanel'>
<input type='text' value={todo.task}/>
<input type='checkbox' checked={todo.completed}></input>
</div>
</li>
))
}
</ol>