Unable to scrape an Instagram profile

Time: 2021-03-23 12:52:32

Tags: python web-scraping beautifulsoup python-requests instagram

from bs4 import BeautifulSoup
import requests

page = requests.get("https://www.instagram.com/marcelo.codes/")
soup = BeautifulSoup(page.content, "html.parser")

profileName = soup.find('h2', class_="_7UhW9       fKFbl yUEEX   KV-D4              fDxYl     ")

followers = soup.find('span', class_="g47SY")

bio = soup.find('div', class_="-vDIg")

postsAmount = soup.find('span', class_ ="g47SY lOXF2")

print(f"""  
Name: {profileName}
followers: {followers}
bio: {bio}
posts: {postsAmount}
""")

This is my code, and every time I run it the result is:

python3 er.py 
  
Name: None
followers: None
bio: None
posts: None

What should I change to get the result I want?

1 answer:

Answer 0 (score: 0)

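The page stores its data in a JavaScript variable embedded in the page rather than in the HTML elements you are searching for, which is why every `soup.find` call returns `None`. You can use this script to extract the data from that variable: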

import re
import json
import requests


headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
}

url = "https://www.instagram.com/marcelo.codes/"
data = json.loads(
    re.search(
        r"<script type=\"text/javascript\">window\._sharedData = (.*});",
        requests.get(url, headers=headers).text,
    ).group(1)
)

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

print("Bio:")
print(data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["biography"])

# edge_followed_by counts the accounts that follow this profile
print("\nFollowers:")
print(
    data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["edge_followed_by"][
        "count"
    ]
)

# edge_follow counts the accounts that this profile follows
print("\nFollowing:")
print(
    data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["edge_follow"][
        "count"
    ]
)

Prints:

Bio:
✏ Trying my best and showing my journey into coding.
Brazilian.
Learning Python right now! 
Taking doubts, and showing my progress.
Links.

Followers:
102

Following:
10
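
The script above raises an `AttributeError` if the regex finds nothing, for example when the response is a login page instead of the profile. Below is a minimal sketch of a more defensive version, assuming the page still embeds `window._sharedData` as it did when this answer was written. The helper names `extract_shared_data` and `profile_summary` and the `sample_html` string are made up for illustration; in real use you would pass in `requests.get(url, headers=headers).text` instead.

```python
import json
import re

# Same idea as the regex in the answer, but non-greedy and anchored on the
# closing </script> tag (an assumption about how the page is laid out).
SHARED_DATA_RE = re.compile(
    r'<script type="text/javascript">window\._sharedData = (\{.*?\});</script>',
    re.DOTALL,
)


def extract_shared_data(html):
    """Return the parsed window._sharedData object, or None if it is absent."""
    match = SHARED_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def profile_summary(shared_data):
    """Pick the bio and the follower/following counts out of the parsed data."""
    try:
        user = shared_data["entry_data"]["ProfilePage"][0]["graphql"]["user"]
    except (KeyError, IndexError, TypeError):
        # The structure is not what we expected (or shared_data is None)
        return None
    return {
        "biography": user.get("biography"),
        "followers": user.get("edge_followed_by", {}).get("count"),
        "following": user.get("edge_follow", {}).get("count"),
    }


# Made-up sample page standing in for the real HTTP response body
sample_html = (
    '<html><body><script type="text/javascript">window._sharedData = '
    '{"entry_data": {"ProfilePage": [{"graphql": {"user": {'
    '"biography": "Learning Python", '
    '"edge_followed_by": {"count": 102}, '
    '"edge_follow": {"count": 10}}}}]}};</script></body></html>'
)

summary = profile_summary(extract_shared_data(sample_html))
print(summary)

# A page without the variable yields None instead of raising
login_page = "<html><body>Login</body></html>"
print(extract_shared_data(login_page))
```

Returning `None` instead of raising lets the caller decide what to do when the variable is missing, such as retrying with different headers or reporting that the page layout has changed.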