I'm trying to extract the "Reflowable" text from a website, but I can't get it with either CSS selectors or XPath.
The text I want to extract is marked in red.
import re

import scrapy

from ..items import VitalsourceItem  # assumes the item class lives in the project's items.py


class VsSpider(scrapy.Spider):
    name = 'VS'
    # Use the lowercase domain; request host names are compared in lowercase.
    allowed_domains = ['vitalsource.com']
    start_urls = ['https://www.vitalsource.com/products/abnormal-psychology-susan-nolen-hoeksema-v9781259765667']

    def parse(self, response):
        item = VitalsourceItem()
        # Common prefix of the XPaths copied from Chrome DevTools.
        info = '//*[@id="content"]/div[1]/div[1]/div[1]/div/div[2]'
        item['Ebook_Title'] = response.xpath(info + '/h1/text()').getall()[1].strip()
        item['Ebook_SubTitle'] = response.xpath(info + '/div[1]/text()').get().strip()
        item['Ebook_Author'] = response.xpath(info + '/p/text()').get().strip()
        # Raw strings avoid invalid-escape warnings for \d.
        item['Ebook_ISBN'] = re.findall(r'\d+', response.xpath(info + '/ul/li[3]/h2/text()').get().strip())
        item['Ebook_eISBN'] = re.findall(r'\d+', response.xpath(info + '/ul/li[2]/h2/text()').get().strip())
        item['Ebook_Price'] = re.findall(r'-?\d+\.?\d*e?-?\d*?', response.xpath("//span[@itemprop='price']").get().strip())
        item['Ebook_Edition'] = response.xpath(info + '/ul/li[4]/text()').get().strip()
        # Absolute path copied from Chrome: this is the one that comes back empty.
        item['Ebook_Format'] = response.xpath('/html[1]/body[1]/div[2]/main[1]/div[1]/div[1]/div[3]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/p[1]/text()').getall()
        print(item)
        return item
Result:
{'Ebook_Author': 'by: Susan Nolen-Hoeksema',
'Ebook_Edition': 'Edition: \n 7th',
'Ebook_Format': [],
'Ebook_ISBN': ['9781259765667', '1259765660'],
'Ebook_Price': ['19.60'],
'Ebook_SubTitle': '',
'Ebook_Title': 'Abnormal Psychology',
'Ebook_eISBN': ['9781259578137', '1259578135']}
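Some of the values that did come back still carry labels and line breaks (`'Edition: \n 7th'`, `'by: Susan Nolen-Hoeksema'`) or mix ISBN-13 and ISBN-10. A minimal stdlib sketch of post-processing them might look like this; the helper names `clean_label`, `parse_price`, and `isbn13` are hypothetical, and the sample inputs are assumed shapes of the scraped strings, not verified page content.

```python
import re
from typing import Optional


def clean_label(raw: str, label: str) -> str:
    """Collapse whitespace and drop a leading 'label:' prefix (hypothetical helper)."""
    text = " ".join(raw.split())  # 'Edition: \n 7th' -> 'Edition: 7th'
    prefix = label + ":"
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.strip()


def parse_price(raw: str) -> Optional[str]:
    """Return the first decimal number such as '19.60' in raw, or None."""
    match = re.search(r"\d+(?:\.\d+)?", raw)
    return match.group(0) if match else None


def isbn13(raw: str) -> Optional[str]:
    """Return the first standalone 13-digit number (ISBN-13), or None."""
    match = re.search(r"\b\d{13}\b", raw)
    return match.group(0) if match else None


print(clean_label("Edition: \n 7th", "Edition"))
print(clean_label("by: Susan Nolen-Hoeksema", "by"))
print(parse_price("$19.60"))
print(isbn13("ISBN: 9781259765667, 1259765660"))
```

Keeping the cleanup in plain functions makes it easy to test without running the spider, and the spider can call them on whatever `.get()` returns.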
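The CSS selectors and XPaths I copy from Chrome only partly work in Scrapy (for example, `Ebook_Format` above comes back as an empty list). I'm a Python beginner, and I've spent hours searching and debugging, but the result is always the same: nothing is returned.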
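One likely reason: Chrome's "Copy XPath" is computed from the live DOM, after JavaScript has run and the browser has repaired the HTML, while Scrapy only sees the raw HTML the server returned. A positional path like `/html[1]/body[1]/div[2]/...` can therefore point at nodes that do not exist in the response. The sketch below illustrates this with the stdlib `xml.etree.ElementTree` as a stand-in for Scrapy's lxml-based selectors (it only handles well-formed markup); the HTML and both paths are hypothetical, not the real VitalSource page.

```python
import xml.etree.ElementTree as ET

# Hypothetical server HTML: what Scrapy would actually download.
SERVER_HTML = """
<html><body>
  <main>
    <div class="product">
      <div class="format"><p>Reflowable</p></div>
    </div>
  </main>
</body></html>
""".strip()

# Hypothetical positional path copied from a browser whose DOM had an
# extra JavaScript-inserted <div> before <main>.
BROWSER_PATH = "./body/div[2]/main/div/div/p"

# Paths anchored on an attribute or on the text itself, not on positions.
ATTR_PATH = ".//div[@class='format']/p"
TEXT_PATH = ".//p[.='Reflowable']"


def first_text(root, path):
    """Return the stripped text of the first match for path, or None."""
    node = root.find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


root = ET.fromstring(SERVER_HTML)
print(first_text(root, BROWSER_PATH))  # positional path finds nothing
print(first_text(root, ATTR_PATH))
print(first_text(root, TEXT_PATH))
```

In Scrapy itself, a comparable query would be something like `response.xpath("//p[contains(., 'Reflowable')]/text()").get()`, assuming the word appears in the downloaded HTML at all. Running `view(response)` in `scrapy shell` opens what Scrapy actually received, which shows whether the element is there or is added later by JavaScript.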