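I want to follow the links I have already scraped to get more details.
For example, starting from here, which lists all the job postings,
I would like to go to one of the links, e.g. here, and extract the job description.
Below is the working code that gets the title, link, and date and writes them to a CSV file.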
# Imports of BaseSpider and SampleItem are not shown.
class MySpider(BaseSpider):
    name = "craigslist"
    allowed_domains = ["singapore.craigslist.com.sg"]
    start_urls = ["https://singapore.craigslist.com.sg/d/jobs/search/jjj"]

    def parse(self, response):
        item = SampleItem()
        # Each XPath returns a list with one entry per listing on the page.
        item["title"] = response.xpath('//*[@class="result-info"]/a/text()').extract()
        item["link"] = response.xpath('//*[@class="result-info"]/a/@href').extract()
        item["date"] = response.xpath('//*[@class="result-info"]/time[@class="result-date"]/text()').extract()
        # Emit one row per listing by indexing the three parallel lists.
        for i in range(len(item["title"])):
            yield {"Title": item['title'][i], "Link": item['link'][i], "Date": item['date'][i]}
This is my attempt to follow the links, but it did not work.
class MySpider(BaseSpider):
    name = "craigslist"
    allowed_domains = ["singapore.craigslist.com.sg"]
    start_urls = ["https://singapore.craigslist.com.sg/d/jobs/search/jjj"]
    BASE_URL = 'https://singapore.craigslist.com.sg'

    def parse(self, response):
        links = response.xpath('//*[@class="result-info"]/a/@href').extract()
        item = SampleItem()
        item["title"] = response.xpath('//*[@class="result-info"]/a/text()').extract()
        item["date"] = response.xpath('//*[@class="result-info"]/time[@class="result-date"]/text()').extract()
        for i in range(len(item["title"])):
            yield {"Title": item['title'][i], "Date": item['date'][i]}
        for link in links:
            absolute_url = self.BASE_URL + link
            yield BaseSpider.Request(absolute_url, callback=self.parse_attr)

    def parse_attr(self, response):
        item = SampleItem()
        item["description"] = response.xpath('//*[@id="postingbody"]/text()').extract()
        for i in range(len(item["description"])):
            yield {"Description": item["description"]}
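One thing I am unsure about is the `BASE_URL + link` step. If the `href` values on the page are already absolute (I have not confirmed this), plain concatenation would repeat the host. A small standard-library check of that idea, using made-up hrefs and a hypothetical helper named `absolute`:

```python
from urllib.parse import urljoin

BASE_URL = "https://singapore.craigslist.com.sg"


def absolute(href):
    # urljoin resolves a relative href against the base and
    # returns an already absolute href unchanged.
    return urljoin(BASE_URL, href)


# Hypothetical hrefs, for illustration only.
relative = "/jobs/123.html"
already_absolute = "https://singapore.craigslist.com.sg/jobs/123.html"

print(absolute(relative))
print(absolute(already_absolute))
# Plain string concatenation repeats the scheme and host instead.
print(BASE_URL + already_absolute)
```

If the links do turn out to be absolute, the concatenated URL would not point at a real posting, which might be part of why the second request never yields anything.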
Any idea how to do this? Log of scraper