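I wrote a small Python program that scrapes an Instagram profile to extract data and display various statistics. I can collect data from the first 9 photos on the profile (or however many appear on the initial load), but I haven't been able to load the remaining photos because of the infinite-scroll mechanism. I've read online about scraping infinite-scroll pages, and people say you need to replicate the request that loads the additional images. So far I haven't managed to replicate that request. Can anyone help?
Thanks!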
Answer (score: 1)
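There's no need to write all of this yourself; several libraries already replicate Instagram's requests for you.
One such library is https://github.com/ping/instagram_private_api
A solution using this library: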
from instagram_private_api import Client
user_name = 'YOUR_USERNAME'
password = 'YOUR_PASSWORD'
username_to_scrape = 'USERNAME_TO_SCRAPE'
all_posts = []
api = Client(user_name, password)
posts = api.username_feed(username_to_scrape)  # Gets the first 12 posts
all_posts.extend(posts["items"])  # the response is a dict; the posts are in its "items" list
# Extract next_max_id from the above response; it is needed to load the next 12 posts
next_max_id = posts["next_max_id"]
next_page_posts = api.username_feed(username_to_scrape, max_id=next_max_id)
all_posts.extend(next_page_posts["items"])
This is just a simple example to get you started.
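To get past the second page, the same call can be repeated in a loop until the response stops returning a cursor. Below is a minimal sketch. It assumes each response is a dict with an "items" list of posts and a "next_max_id" cursor that is missing on the last page (my reading of the private API's feed responses, not verified here). The helper name fetch_all_posts and its max_pages/delay parameters are hypothetical and not part of instagram_private_api:

```python
import time
from typing import Any, Dict, List, Optional


def fetch_all_posts(api: Any, username: str,
                    max_pages: Optional[int] = None,
                    delay: float = 0.0) -> List[Dict[str, Any]]:
    """Collect posts page by page by following the next_max_id cursor.

    `api` is assumed to be an instagram_private_api Client (or any object
    with a compatible username_feed method). max_pages and delay are
    hypothetical knobs added for illustration.
    """
    all_posts: List[Dict[str, Any]] = []
    max_id: Optional[str] = None
    pages = 0
    while True:
        # The first request has no cursor; later ones pass the previous next_max_id.
        if max_id is None:
            page = api.username_feed(username)
        else:
            page = api.username_feed(username, max_id=max_id)
        all_posts.extend(page.get("items", []))
        pages += 1

        max_id = page.get("next_max_id")
        # Assumption: a missing/empty cursor means there are no more pages.
        if not max_id:
            break
        if max_pages is not None and pages >= max_pages:
            break
        if delay:
            time.sleep(delay)  # pause between requests to go easy on the server
    return all_posts
```

With the api object from the snippet above you would call something like fetch_all_posts(api, username_to_scrape, delay=1.0). A small delay between pages is a deliberate choice here, since hammering the endpoint is an easy way to get rate-limited.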
Update: saving & loading cookies (so the script can reuse a session instead of logging in on every run)
# Saving cookies
cookies = api.cookie_jar.dump()
with open("cookies.pkl", "wb") as save_cookies:
    save_cookies.write(cookies)
# Loading cookies
with open("cookies.pkl", "rb") as read_cookies:
    cookies = read_cookies.read()
# Pass cookies to Client to resume the session
api = Client(user_name, password, cookie=cookies)
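The saving and loading steps can be combined so the script only logs in when no saved session exists. This is a sketch under the assumption that make_client is a small factory you supply, for example lambda cookie: Client(user_name, password, cookie=cookie); with cookie=None the library should fall back to a normal login. The function name load_or_login is hypothetical. Saved cookies eventually expire, so if a resumed session starts failing, delete the file and run it again to log in fresh.

```python
import os
from typing import Any, Callable, Optional


def load_or_login(cookie_path: str,
                  make_client: Callable[[Optional[bytes]], Any]) -> Any:
    """Resume a saved session if a cookie file exists; otherwise log in and save cookies.

    make_client is a hypothetical factory taking the cookie bytes (or None),
    e.g. lambda cookie: Client(user_name, password, cookie=cookie).
    """
    cookies: Optional[bytes] = None
    if os.path.exists(cookie_path):
        # Reuse the previously dumped cookie jar.
        with open(cookie_path, "rb") as f:
            cookies = f.read()

    api = make_client(cookies)

    if cookies is None:
        # Fresh login: persist the new session so the next run can skip logging in.
        with open(cookie_path, "wb") as f:
            f.write(api.cookie_jar.dump())
    return api
```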