Question

from bs4 import BeautifulSoup
import requests

page = requests.get("https://www.instagram.com/marcelo.codes/")
soup = BeautifulSoup(page.content, "html.parser")

profileName = soup.find('h2', class_="_7UhW9       fKFbl yUEEX   KV-D4              fDxYl     ")

followers = soup.find('span', class_="g47SY")

bio = soup.find('div', class_="-vDIg")

postsAmount = soup.find('span', class_ ="g47SY lOXF2")

print(f"""  
Name: {profileName}
followers: {followers}
bio: {bio}
posts: {postsAmount}
""")

This is my code. Every time I run it, the result is:

python3 er.py 
  
Name: None
followers: None
bio: None
posts: None

What should I change to get the result I want?

Answer 1

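The page stores its data in a JavaScript variable inside the page, and the elements your selectors look for are built by JavaScript in the browser, so they are not in the HTML that requests receives. You can use this script to get the data from that variable: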

import re
import json
import requests


# send a browser-like User-Agent with the request
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0"
}

url = "https://www.instagram.com/marcelo.codes/"
# capture the JSON object assigned to window._sharedData and parse it
data = json.loads(
    re.search(
        r"<script type=\"text/javascript\">window\._sharedData = (.*});",
        requests.get(url, headers=headers).text,
    ).group(1)
)

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

print("Bio:")
print(data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["biography"])

# edge_followed_by counts the accounts that follow this profile
print("\nFollowers:")
print(
    data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["edge_followed_by"][
        "count"
    ]
)

# edge_follow counts the accounts this profile follows
print("\nFollowing:")
print(
    data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["edge_follow"][
        "count"
    ]
)

Prints:

Bio:
✏ Trying my best and showing my journey into coding.
Brazilian.
Learning Python right now! 
Taking doubts, and showing my progress.
Links.

Followers:
102

Following:
10
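
If the page does not contain the variable (for example, when a login page comes back instead), re.search returns None and the script above stops with an AttributeError. Here is a rough sketch of a more defensive version. It is not the script above: it swaps the regular expression for json.JSONDecoder.raw_decode, and the names MARKER, extract_shared_data, profile_summary and the sample HTML are made up for illustration. It assumes the same JSON layout as in the answer.

```python
import json

# Text that precedes the JSON object in the page (same variable as in the answer)
MARKER = "window._sharedData = "


def extract_shared_data(html):
    """Return the parsed _sharedData object, or None if it is not in the page."""
    start = html.find(MARKER)
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index
        # and ignores whatever follows it (here: ";</script>...")
        data, _end = json.JSONDecoder().raw_decode(html, start + len(MARKER))
    except json.JSONDecodeError:
        return None
    return data


def profile_summary(data):
    """Pick out the fields used in the answer; None if the layout differs."""
    try:
        user = data["entry_data"]["ProfilePage"][0]["graphql"]["user"]
    except (KeyError, IndexError, TypeError):
        return None
    return {
        "bio": user.get("biography"),
        "followers": user.get("edge_followed_by", {}).get("count"),
        "following": user.get("edge_follow", {}).get("count"),
    }


# Made-up HTML standing in for a real response, so this runs offline
sample_html = (
    '<script type="text/javascript">window._sharedData = '
    '{"entry_data": {"ProfilePage": [{"graphql": {"user": {'
    '"biography": "demo bio", '
    '"edge_followed_by": {"count": 102}, '
    '"edge_follow": {"count": 10}}}}]}};</script>'
)
login_page = "<html><body>Login</body></html>"

summary = profile_summary(extract_shared_data(sample_html))
print(summary)
print(extract_shared_data(login_page))
```

With a real page you would pass requests.get(url, headers=headers).text to extract_shared_data and check for None before reading any fields.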
