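Task: get the text of the posts from an Instagram account. Instagram pages are generated dynamically, so I can only get the first few posts. I know I can use Selenium to scroll the page, but I don't understand how to incorporate Selenium into this code. Is there another way to do it?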
from bs4 import BeautifulSoup as soup
import requests
import re

url = 'some url'
page = requests.get(url)
html = soup(page.text, 'html.parser')
s = str(html)

# Post shortcodes are embedded in the page source as "shortcode":"...".
r = re.compile('"shortcode":"(.*?)"')
result = r.findall(s)
print(result)

# Look up each post through the oEmbed endpoint and print its caption.
for shortlink in result:
    oembed_url = "https://api.instagram.com/oembed?url=http://instagr.am/p/" + shortlink
    print(oembed_url)
    response = requests.get(oembed_url)
    todos = response.json()
    print(todos['title'])
    print("---------")
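
For reference, here is a minimal sketch of how the scrolling step might be slotted in front of the shortcode extraction. It is not a tested solution against the live site. The helper name `collect_shortcodes`, the `max_scrolls` and `pause` parameters, and the second pattern (posts loaded after scrolling showing up as `/p/<shortcode>/` links) are all assumptions. The function only needs an object with `get`, `page_source` and `execute_script`, which is what a Selenium WebDriver provides.

```python
import re
import time

PATTERNS = (
    r'"shortcode":"(.*?)"',   # pattern from the question (JSON embedded in the page)
    r'href="/p/([^/"]+)/',    # assumption: later posts appear as /p/<shortcode>/ links
)


def collect_shortcodes(driver, url, max_scrolls=10, pause=2.0, patterns=PATTERNS):
    """Scroll the page and gather post shortcodes in first-seen order.

    `driver` is expected to behave like a Selenium WebDriver.
    """
    found = []

    def harvest():
        # Scan the currently rendered page and keep only new shortcodes.
        html = driver.page_source
        for pattern in patterns:
            for code in re.findall(pattern, html):
                if code not in found:
                    found.append(code)

    driver.get(url)
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        harvest()
        # Jump to the bottom so the page requests the next batch of posts.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for the new posts to load
        height = driver.execute_script("return document.body.scrollHeight")
        if height == last_height:
            break  # the page stopped growing, so assume nothing more will load
        last_height = height
    harvest()  # pick up whatever the final scroll loaded
    return found
```

With Selenium installed, this could be called as `result = collect_shortcodes(webdriver.Chrome(), url)` after `from selenium import webdriver`, and the existing `for shortlink in result:` loop can then stay as it is. The fixed `pause` is the simplest possible wait and may need tuning.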