I'm new to Python and Scrapy. My question is similar to the one below, but I couldn't work out a working answer from it:
How Scrapy extract text inside class that inside attribute?
Here is my code:
import scrapy


class IndeedSpider(scrapy.Spider):
    name = 'indeed_jobs'
    start_urls = ['https://www.indeed.com/q-Software-Engineer-l-Portland,-OR-jobs.html']

    def parse(self, response):
        # Select the <link rel="next"> element that points to the next results page.
        next_page_outer = './/link[@rel="next"]'
        next_page_url_outer = response.xpath(next_page_outer).get()
        print(next_page_url_outer)
This code prints:
<link rel="next" href="/jobs?q=Software+Engineer&l=Portland%2C+OR&start=10">
How can I get the text of the href attribute from this response? Thanks!
Answer 0 (score: 0)
I was able to answer my own question. The answer is:
next_page_url_href = response.xpath(next_page_outer).xpath("@href").extract()
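Note that .extract() returns a list of strings, one per matching node, and the href here is relative to the site root. As a follow-up, here is a minimal sketch of how the first value could be picked out and turned into an absolute URL. It uses only the standard library so it runs without Scrapy. The helper name first_absolute_url is hypothetical, and the extracted list is hard-coded from the output shown in the question rather than fetched live.

```python
from urllib.parse import urljoin

# Page the spider started from (copied from start_urls in the question).
page_url = "https://www.indeed.com/q-Software-Engineer-l-Portland,-OR-jobs.html"

# Stand-in for what .extract() returns: a list with one string per match.
# The value is copied from the <link rel="next"> output in the question.
extracted = ["/jobs?q=Software+Engineer&l=Portland%2C+OR&start=10"]


def first_absolute_url(base_url, hrefs):
    """Hypothetical helper: resolve the first href against base_url.

    Returns None when the list is empty, for example on the last results page.
    """
    if not hrefs:
        return None
    return urljoin(base_url, hrefs[0])


next_page_url = first_absolute_url(page_url, extracted)
print(next_page_url)
```

Inside the spider itself, the same result should be reachable with response.xpath('.//link[@rel="next"]/@href').get(), which returns the first match as a string (or None if nothing matches), followed by response.urljoin(href) to resolve it against the page URL.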