Question

Python 2.7

I want to get the background image URL and the title of each news item, but when I try to get the image URL with XPath, I always get an empty array.

Here is what I tried:

scrapy shell http://www.wownews.tw/fashion/movie

Then:

response.body

I can see the HTML in the terminal. But when I type

response.xpath('//div[@class="text ng-scope"]')

I get an empty array, even though I think it should work.
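Before changing the XPath, it can help to check whether the class appears in the downloaded HTML at all. Scrapy does not execute JavaScript, and AngularJS adds the `ng-scope` class in the browser at runtime, so if `ng-scope` never appears in `response.body`, no XPath will find it. The sketch below collects every class token with the standard-library HTML parser. The helper `collect_class_tokens` and the sample markup are hypothetical, made up for illustration.

```python
# Runs on Python 2.7 and 3.x
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class ClassTokenCollector(HTMLParser):
    """Collect every whitespace-separated token found in class attributes."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.tokens.update(value.split())


def collect_class_tokens(html):
    """Return the set of class tokens used anywhere in an HTML string."""
    parser = ClassTokenCollector()
    parser.feed(html)
    parser.close()
    return parser.tokens


# Hypothetical server-side markup: an Angular template before any JS has run.
raw_html = ('<div ng-repeat="item in items">'
            '<div class="text" ng-if="item.title">{{item.title}}</div>'
            '</div>')
tokens = collect_class_tokens(raw_html)
print("text" in tokens)      # True
print("ng-scope" in tokens)  # False: only the browser would add it
```

In `scrapy shell`, the equivalent quick check is whether `'ng-scope'` occurs in `response.body` at all.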

Is the problem caused by the class attribute containing a space?

How can I fix this? Any help would be greatly appreciated.
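A space inside the class attribute is not a problem by itself: `@class="text ng-scope"` compares the whole attribute string, so it matches whenever the markup contains exactly that value, in that token order. The sketch below shows this with the standard library's `xml.etree.ElementTree`, which supports only a small XPath subset but does include attribute-equality predicates. The markup is hypothetical and well-formed, standing in for HTML after AngularJS has run.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as it might look in the browser, after AngularJS has run.
doc = ET.fromstring(
    '<body>'
    '<div class="text ng-scope">First title</div>'
    '<div class="ng-scope text">Second title</div>'
    '</body>'
)

# Exact string comparison: the space inside the value is fine...
exact = doc.findall(".//div[@class='text ng-scope']")
print([el.text for el in exact])  # ['First title']
# ...but the token order must match too, so the second div is not returned.
```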

I also tried this command and still get an empty array:

response.xpath('//div[contains(concat(" ", normalize-space(@class), " "), "text ng-scope")]')
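That this expression also returns nothing suggests the class is missing from the downloaded HTML, not that the XPath syntax is wrong. For matching a single class token, the usual XPath 1.0 idiom pads the searched token with spaces as well: `//div[contains(concat(' ', normalize-space(@class), ' '), ' text ')]`. In Scrapy, the CSS selector `response.css('div.text')` expresses the same test. To make the idiom's logic concrete, the sketch below reimplements it in plain Python. The function names are hypothetical, and `normalize_space` only approximates XPath's `normalize-space()`.

```python
def normalize_space(value):
    """Rough equivalent of XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(value.split())


def has_class(class_attr, token):
    """Mirror contains(concat(' ', normalize-space(@class), ' '), ' token ')."""
    padded = " " + normalize_space(class_attr) + " "
    return (" " + token + " ") in padded


print(has_class("text ng-scope", "text"))      # True
print(has_class("  ng-scope\ttext ", "text"))  # True: order and extra whitespace do not matter
print(has_class("context ng-scope", "text"))   # False: padding prevents partial-word matches
```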

Answer 1

This is all you need:

import json
import scrapy


class ListingSpider(scrapy.Spider):
    name = 'listing'

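    # Request the JSON API behind the page directly, instead of the HTML
    # page whose news list is filled in by JavaScript.
    start_urls = ['http://api.wownews.tw/f/pages/site/558fd617913b0c11001d003d?category=5590a6a3f0a8bf110060914d&children=true&limit=48&page=1']

    def parse(self, response):
        # The API responds with JSON; the news entries are under 'results'.
        items = json.loads(response.body)['results']

        for item in items:
            # Each entry is a plain dict, which Scrapy accepts as an item.
            yield item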
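To keep only the fields the question asks for (the title and the background image URL), each result could be mapped to a smaller record. The sketch below runs offline on a made-up payload in the `{"results": [...]}` shape the spider relies on. The function `extract_news` and the keys `title` and `image` are hypothetical placeholders; inspect the real API response (for example, by opening the API URL in `scrapy shell`) and substitute the actual key names.

```python
import json


def extract_news(body, title_key="title", image_key="image"):
    """Return (title, image_url) pairs from an API response body.

    title_key and image_key are hypothetical defaults; replace them with
    the key names the real API returns.
    """
    results = json.loads(body)["results"]
    return [(item.get(title_key), item.get(image_key)) for item in results]


# Made-up payload with the same top-level 'results' list the spider uses.
sample_body = json.dumps({
    "results": [
        {"title": "Example A", "image": "http://example.com/a.jpg"},
        {"title": "Example B", "image": "http://example.com/b.jpg"},
    ]
})

for title, image_url in extract_news(sample_body):
    print("%s %s" % (title, image_url))
```

Inside the spider, `parse` could call `extract_news(response.body)` and yield one dict per pair instead of yielding the full entries.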

See https://medium.com/@yashpokar/scrape-any-website-in-the-internet-without-using-splash-or-selenium-68a6c9733369

Empty array when the class contains spaces
