Question

I'm writing a small script and I want to collect all the "code:" values for a hashtag.

For example:

https://www.instagram.com/explore/tags/%s/?__a=1

The next page would be:

https://www.instagram.com/explore/tags/plebiscito/?__a=1&max_id=<the end_cursor value>
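As a rough illustration of the two URL shapes above, here is a minimal sketch of a helper that builds the first-page URL and the follow-up URL from a tag and an optional cursor. The name `build_tag_url` is hypothetical and not part of the original script; only the `__a` and `max_id` parameters come from the question.

```python
from urllib.parse import quote, urlencode


def build_tag_url(tag, end_cursor=None):
    """Build the tag URL; add max_id only when a cursor is given.

    build_tag_url is a hypothetical helper name used for illustration.
    """
    params = {"__a": "1"}
    if end_cursor:
        params["max_id"] = end_cursor
    # quote() escapes the tag so it is safe inside the URL path.
    return f"https://www.instagram.com/explore/tags/{quote(tag, safe='')}/?{urlencode(params)}"


first_page = build_tag_url("plebiscito")
next_page = build_tag_url("plebiscito", "ABC123")  # "ABC123" is a made-up cursor
print(first_page)
print(next_page)
```

Keeping the URL construction in one place means the paging loop only has to pass along whatever end_cursor it last received.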

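However, my problem is getting each of those URLs to return what I need (people's comments and usernames). So although the script runs, it doesn't do what I need.

The "obtain_max_id" function works and retrieves the successive end_cursors, but I don't know how to adapt it. Thanks for your help!

In short, I need to fit the "obtain_max_id" function into my "connect_main" function so that it extracts the required information from each URL.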

Answer 1

It's quite simple.

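import requests

host = "https://www.instagram.com/explore/tags/plebiscito/?__a=1"

# First page: print the code of every node.
r = requests.get(host).json()
for x in r['tag']['media']['nodes']:
    print(x['code'])

# Cursor for the next page (renamed from `next`, which shadows a built-in).
end_cursor = r['tag']['media']['page_info']['end_cursor']

# Keep requesting pages for as long as the response supplies a cursor.
while end_cursor:
    r = requests.get(host + "&max_id=" + end_cursor).json()
    for x in r['tag']['media']['nodes']:
        print(x['code'])

    end_cursor = r['tag']['media']['page_info']['end_cursor']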
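The same paging logic can also be wrapped in a generator, so the caller decides what to do with each node. This is only a sketch: `iter_tag_nodes` and `fetch_json` are hypothetical names, the JSON layout is assumed to be the one used in the code above, and the pages in the demo are made-up data so the example runs without network access.

```python
def iter_tag_nodes(fetch_json, host):
    """Yield every node for a tag, following end_cursor until it is empty.

    fetch_json is any callable that takes a URL and returns the parsed JSON
    (hypothetical parameter, injected so the loop can be tried offline).
    """
    url = host
    while True:
        media = fetch_json(url)["tag"]["media"]
        for node in media["nodes"]:
            yield node
        end_cursor = media["page_info"]["end_cursor"]
        if not end_cursor:
            break
        url = host + "&max_id=" + end_cursor


# Offline demo with two fake pages (made-up data, not real responses).
host = "https://www.instagram.com/explore/tags/plebiscito/?__a=1"
fake_pages = {
    host: {"tag": {"media": {
        "nodes": [{"code": "A1"}, {"code": "B2"}],
        "page_info": {"end_cursor": "CUR1"}}}},
    host + "&max_id=CUR1": {"tag": {"media": {
        "nodes": [{"code": "C3"}],
        "page_info": {"end_cursor": None}}}},
}
codes = [node["code"] for node in iter_tag_nodes(fake_pages.__getitem__, host)]
print(codes)
```

For real requests, something like `lambda url: requests.get(url).json()` could be passed as `fetch_json`.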

Answer 2

You have all the data you need in the data variable (as parsed JSON) once this line has executed:

data = json.loads(finish.text)

in the while loop of the obtain_max_id() method. Just use it.

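Assuming everything inside the else block of the connect_main() method works, you can use that code inside the while loop, right after you have all the data in the data variable.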
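A minimal sketch of that idea follows: parse the response, use data immediately, then return the next cursor. The original script is not shown, so `process_page` and `extract_codes` are hypothetical names, and the JSON layout is assumed to match Answer 1. The source does not show which keys hold comments and usernames, so this sketch reads only the code field; other fields would be read from each node in the same place.

```python
import json


def extract_codes(data):
    """Return the 'code' of every node in one parsed page (assumed layout)."""
    return [node["code"] for node in data["tag"]["media"]["nodes"]]


def process_page(response_text, collected):
    """Handle one page inside the while loop and return the next end_cursor.

    Hypothetical helper: response_text stands in for finish.text.
    """
    data = json.loads(response_text)        # same line as in obtain_max_id()
    collected.extend(extract_codes(data))   # use the data right after parsing
    return data["tag"]["media"]["page_info"]["end_cursor"]


# Offline demo with a made-up response body.
sample_text = json.dumps({"tag": {"media": {
    "nodes": [{"code": "A1"}, {"code": "B2"}],
    "page_info": {"end_cursor": "CUR1"}}}})
collected = []
cursor = process_page(sample_text, collected)
print(collected, cursor)
```

Doing the extraction in the same loop iteration that parses the page avoids requesting each URL twice.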

How to integrate this script with this function in Python (Instagram)

2 answers: