Question

I'm trying to scrape data from a website with Scrapy and XPath, but I'm running into trouble. Here is my code:

import scrapy


class MaijiaSpider(scrapy.Spider):
    name = 'maijiaSpider'
    start_urls = ["http://www.maijia.com/index.html#/item/list/?keyword=recaro"]

    def parse(self, response):
        # Select the link in the second cell of the first row of the results table
        articles = response.xpath("//table[@class='ui-table ui-table-striped ui-table-inbox tablefixed']//tr[1]/td[2]/div/div[1]/a/@href")
        for article in articles:
            yield {
                'link': article.xpath('.//td[2]//a/@href').extract_first()
            }

The problem is that articles is always empty, so the for loop never runs. What am I doing wrong? I've tried different XPath strings, but nothing seems to work.

Answer 1

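This page loads its data with JavaScript, so the table is not in the HTML that Scrapy downloads and the XPath matches nothing. The data comes from this URL: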

http://www.maijia.com/data/item/list?api_name=item_get_list&type=ALL&pageNo=1&pageSize=10&keyword=recaro&sortField=amount30&sortType=desc

You can find this URL in Chrome DevTools, in the Network tab, by watching the requests the page makes as it loads.
