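Automatic pagination with Python

Time: 2016-12-13 23:24:45

Tags: python instagram

I'm trying to automate my script (in Python) so that it keeps fetching the end_cursor automatically, one page after another. For example: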

https://www.instagram.com/explore/tags/plebiscito/?__a=1

then:

https://www.instagram.com/explore/tags/plebiscito/?__a=1&max_id=J0HWFB4fAAAAF0HWE8Y4AAAAFiYA

then:

https://www.instagram.com/explore/tags/plebiscito/?__a=1&max_id=J0HWFB4fAAAAF0HWE2jPAAAAFkwA

.... .... ....

and keep doing this until the last end_cursor is reached. I would appreciate any help, because I can't get it to work myself. Thanks again.

PS: I'm not using the API, because Sandbox mode restricts what an app still in development is allowed to do.

Update: the end_cursor is part of the content that loads when you open the link.

1 answer:

Answer 0: (score: 6)

So, https://www.instagram.com/explore/tags/plebiscito/?__a=1 returns a bunch of JSON that begins with
{"tag": {"media": {"count": 18926, "page_info": {"has_previous_page": false, "start_cursor": "1404693250132394506", "end_cursor": "J0HWFCHOgAAAF0HWE8dgwAAAFiYA", "has_next_page": true}, "nodes": [{"code": "BN-eRGQh8IK", "dimensions": {"width": 750, "height": 538}, "comments_disabled": false, "owner": {"id": "311016089"}, "comments": {"count": 1}, "caption": "#plebiscito", "likes": {"count": 11}, "date": 1481672506, "thumbnail_src": "https://scontent.cdninstagram.com/t51.2885-15/s640x640/sh0.08/e35/c147.0.750.750/15338447_1774364399481982_8165079596765544448_n.jpg?...

After parsing the JSON, you can grab the end_cursor

end_cursor = data['tag']['media']['page_info']['end_cursor']

and then retrieve the next URL.
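As a minimal sketch of those two steps (parse the response, then build the next URL), something like the following could work. The helper name `next_page_url` and the trimmed-down `sample` response are made up for illustration; only the key layout (`tag`, `media`, `page_info`) comes from the response shown above. Returning `None` when `has_next_page` is false is an assumption about how the last page behaves.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://www.instagram.com/explore/tags/plebiscito/"


def next_page_url(raw_json, base_url=BASE_URL):
    """Return the URL of the next page, or None when has_next_page is false.

    Hypothetical helper: treating has_next_page == false as "stop" is an
    assumption, not something verified against the real last page.
    """
    page_info = json.loads(raw_json)["tag"]["media"]["page_info"]
    if not page_info["has_next_page"]:
        return None
    # urlencode builds "__a=1&max_id=<end_cursor>" and escapes it if needed.
    query = urlencode({"__a": 1, "max_id": page_info["end_cursor"]})
    return base_url + "?" + query


# Trimmed sample with the same key layout as the real response (illustrative).
sample = (
    '{"tag": {"media": {"page_info": '
    '{"has_next_page": true, "end_cursor": "J0HWFCHOgAAAF0HWE8dgwAAAFiYA"}}}}'
)
print(next_page_url(sample))
```

Building the query string with `urlencode` rather than string concatenation keeps the URL valid even if a cursor ever contains characters that need escaping.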

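In the few seconds I spent doing this by hand, I couldn't reach the end of the list, so I don't know what happens with the last end_cursor. But I did notice the has_next_page key. Maybe something like this, then: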

import json

# however_youre_getting_the_data(url) is a placeholder: it should return the response body as a string.
url = 'https://www.instagram.com/explore/tags/plebiscito/?__a=1'
data = json.loads(however_youre_getting_the_data(url))
end_cursors = []
while data['tag']['media']['page_info']['has_next_page']:
    # Remember this page's cursor, then request the page that follows it.
    end_cursors.append(data['tag']['media']['page_info']['end_cursor'])
    data = json.loads(however_youre_getting_the_data('{}&max_id={}'.format(url, end_cursors[-1])))
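The loop above leaves the download step open. One way to fill it in, sketched with only the standard library, is shown below. The names `fetch_text`, `iter_pages`, `fake_fetch`, and `make_page`, the `User-Agent` header, and the `max_pages` safety cap are all assumptions added for illustration; the sketch also assumes the endpoint keeps returning JSON in the shape shown earlier, which it does not check. The demonstration at the bottom runs against canned responses, so it makes no network requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.instagram.com/explore/tags/plebiscito/"


def fetch_text(url):
    """One possible stand-in for however_youre_getting_the_data (assumption)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def iter_pages(fetch=fetch_text, base_url=BASE_URL, max_pages=100):
    """Yield each parsed page, following end_cursor while has_next_page is true.

    max_pages is a hypothetical safety cap so the loop always terminates.
    """
    params = {"__a": 1}
    for _ in range(max_pages):
        data = json.loads(fetch(base_url + "?" + urlencode(params)))
        yield data
        page_info = data["tag"]["media"]["page_info"]
        if not page_info["has_next_page"]:
            break
        params["max_id"] = page_info["end_cursor"]


def make_page(has_next, cursor):
    """Build a canned response with the same key layout as the real one."""
    page_info = {"has_next_page": has_next, "end_cursor": cursor}
    return {"tag": {"media": {"page_info": page_info}}}


# Offline demonstration: canned responses keyed by the URL that requests them.
fake_pages = {
    BASE_URL + "?__a=1": make_page(True, "AAA"),
    BASE_URL + "?__a=1&max_id=AAA": make_page(True, "BBB"),
    BASE_URL + "?__a=1&max_id=BBB": make_page(False, None),
}


def fake_fetch(url):
    return json.dumps(fake_pages[url])


cursors = [
    page["tag"]["media"]["page_info"]["end_cursor"]
    for page in iter_pages(fake_fetch)
]
print(cursors)  # ['AAA', 'BBB', None]
```

Passing the fetch function in as a parameter keeps the pagination logic testable without hitting the site, and a generator lets the caller stop early instead of collecting every page up front.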