Scrapy cannot scrape all the links on a page

Time: 2016-02-09 23:52:31

Tags: python shell xpath scrapy

I am trying to use Scrapy to scrape an AJAX website: http://play.google.com/store/apps/category/GAME/collection/topselling_new_free

I want to get all the links that point to each game.

I inspected the page's elements. It looks like this: [screenshot: how the page looks like]. So I want to extract all links matching the pattern /store/apps/details?id=

But when I run the command in the Scrapy shell, it returns nothing: [screenshot: shell command]

I also tried //a/@href. That did not work either, and I don't know what is going on...

Right now I can scrape the first 120 links by modifying the start URL and adding 'formdata', as someone suggested, but I get no more links after that.

Can someone help me with this?

1 Answer:

Answer 0: (score: 1)

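The data on that page is actually populated by an AJAX POST request, so you won't see it in the Scrapy shell response for the page itself. Instead of using Inspect Element, check the Network tab of the browser's developer tools and you will find the request.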

Send a POST request to the URL https://play.google.com/store/apps/category/GAME/collection/topselling_new_free?authuser=0 with formdata={'start':'0','num':'60','numChildren':'0','ipf':'1','xhr':'1'}

Increment start by 60 on each request to get the paginated results.
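The pagination step can be sketched as a small generator that produces one form payload per page. The helper name paged_formdata and the pages argument are hypothetical; how many pages the site actually serves is not stated here, so the caller has to pick a limit or stop when a response comes back empty.

```python
def paged_formdata(pages, page_size=60):
    """Yield one form payload per page, advancing 'start' by page_size."""
    for page in range(pages):
        yield {
            "start": str(page * page_size),  # 0, 60, 120, ...
            "num": str(page_size),
            "numChildren": "0",
            "ipf": "1",
            "xhr": "1",
        }


# Example: the 'start' values for the first three pages.
for payload in paged_formdata(3):
    print(payload["start"])  # prints 0, then 60, then 120
```

Each payload can be passed as the formdata of a separate POST request to the same URL.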