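How can I use Scrapy to extract the text associated with a data-reactid attribute?

Asked: 2019-05-28 20:21:28

Tags: python scrapy web-crawler

I am trying to extract the "Reflowable" text from a website, but I cannot get it with either CSS or XPath selectors.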

The text I want to extract is marked in red.

import re

import scrapy

# VitalsourceItem is the item class defined in this project's items.py (import not shown).


class VsSpider(scrapy.Spider):
    name = 'VS'
    allowed_domains = ['vitalsource.com']
    start_urls = ['https://www.vitalsource.com/products/abnormal-psychology-susan-nolen-hoeksema-v9781259765667']

    def parse(self, response):
        # `response` already supports .xpath(), so no separate Selector object is needed.
        item = VitalsourceItem()
        item['Ebook_Title'] = response.xpath('//*[@id="content"]/div[1]/div[1]/div[1]/div/div[2]/h1/text()').extract()[1].strip()
        item['Ebook_SubTitle'] = response.xpath('//*[@id="content"]/div[1]/div[1]/div[1]/div/div[2]/div[1]/text()').get().strip()
        item['Ebook_Author'] = response.xpath('//*[@id="content"]/div[1]/div[1]/div[1]/div/div[2]/p/text()').extract()[0].strip()
        # Raw strings keep the regex backslashes from being read as string escapes.
        item['Ebook_ISBN'] = re.findall(r"\d+", response.xpath('//*[@id="content"]/div[1]/div[1]/div[1]/div/div[2]/ul/li[3]/h2/text()').extract()[0].strip())
        item['Ebook_eISBN'] = re.findall(r"\d+", response.xpath('//*[@id="content"]/div[1]/div[1]/div[1]/div/div[2]/ul/li[2]/h2/text()').extract()[0].strip())
        item['Ebook_Price'] = re.findall(r'-?\d+\.?\d*e?-?\d*?', response.xpath("//span[@itemprop='price']").get().strip())
        item['Ebook_Edition'] = response.xpath('//*[@id="content"]/div[1]/div[1]/div[1]/div/div[2]/ul/li[4]/text()').extract()[0].strip()
        # This is the field that comes back empty; the absolute path was copied from Chrome.
        item['Ebook_Format'] = response.xpath('/html[1]/body[1]/div[2]/main[1]/div[1]/div[1]/div[3]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/p[1]/text()').extract()
        print(item)
        return item

Result:

{'Ebook_Author': 'by: Susan Nolen-Hoeksema',
 'Ebook_Edition': 'Edition: \n            7th',
 'Ebook_Format': [],
 'Ebook_ISBN': ['9781259765667', '1259765660'],
 'Ebook_Price': ['19.60'],
 'Ebook_SubTitle': '',
 'Ebook_Title': 'Abnormal Psychology',
 'Ebook_eISBN': ['9781259578137', '1259578135']}
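
Several of the values above still carry their labels and stray whitespace (for example 'Edition: \n            7th'). A minimal sketch of one way to tidy them, assuming the label always appears as a leading "Label:" prefix; `clean_field` is a hypothetical helper name, not something from Scrapy:

```python
import re


def clean_field(raw, label=None):
    """Collapse whitespace runs and optionally drop a leading 'Label:' prefix."""
    # Turn any run of spaces, tabs, or newlines into a single space.
    text = re.sub(r"\s+", " ", raw).strip()
    # Remove the prefix only when it is really there (case-insensitive match).
    if label and text.lower().startswith(label.lower() + ":"):
        text = text[len(label) + 1:].strip()
    return text


print(clean_field('Edition: \n            7th', 'Edition'))
print(clean_field('by: Susan Nolen-Hoeksema', 'by'))
```

The helper could be applied to each string field before it is stored on the item; fields that are lists, such as the ISBNs, would be left untouched.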

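The CSS and XPath expressions I copy from Chrome only partly work in Scrapy. I am a Python beginner. I have spent hours searching for the error and debugging it, but the result is always the same: nothing.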

0 answers:

No answers yet.