Question

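After the discussion on my earlier question, Unable to print links using beautifulsoup while automating through selenium,

I realized that the main problem is the URL, which requests cannot fetch. The page's URL is actually https://society6.com/discover, but I am using Selenium to log in to my account, so the URL becomes https://society6.com/society?show=2

However, I cannot use this second URL with requests because it returns an error. How can I scrape information from a URL like this?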

Answer 1

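You need to log in first!

To do that, you can log in through a requests.Session and use bs4.BeautifulSoup to read the CSRF token from the login page.

Here is one implementation I have used: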

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://society6.com/"


def log_in_and_get_session():
    """
    Get the session object with login details
    :return: requests.Session
    """    
    ss = requests.Session()
    ss.verify = False    # optional: skips TLS certificate checks, only for sites with invalid certificates
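    text = ss.get(f"{BASE_URL}login").text
    # Uses the value of the first <input> on the login page as the CSRF token;
    # inspect the page's HTML to confirm which field actually holds it.
    csrf_token = BeautifulSoup(text, "html.parser").input["value"]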
    data = {"username": "your_username", "password": "your_password", "csrfmiddlewaretoken": csrf_token}
    results = ss.post("{}login".format(BASE_URL), data=data)
    if results.ok:
        print("Login success", results.status_code)
        return ss
    else:
        print("Can't log in", results.status_code)
        return None

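Log in using the "post" method, then keep using the returned session for later requests so the login cookies are sent along.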
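As a rough sketch of that second step, the session returned by log_in_and_get_session() could be reused to request the logged-in URL from the question and list the links on it. The names LinkCollector and fetch_links are made up for this illustration, and the sketch uses the standard library's html.parser instead of BeautifulSoup only to stay dependency-free; whether the site actually serves the page to a plain HTTP client is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def fetch_links(session, url):
    """Request `url` with an already logged-in session and return its links."""
    response = session.get(url)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    collector = LinkCollector(url)
    collector.feed(response.text)
    return collector.links


# Example usage (needs network access and valid credentials):
# ss = log_in_and_get_session()
# if ss is not None:
#     print(fetch_links(ss, "https://society6.com/society?show=2"))
```

If the list comes back empty, the links are probably rendered by JavaScript, in which case Selenium is still needed for that page.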

Hope this helps!

Edit

Added the beginning of the function.
