Question

I want to log in to Facebook Messenger and parse the HTML.

import requests
from bs4 import BeautifulSoup
import webbrowser
page = requests.get("https://www.messenger.com", auth=
('username', 'password'))

soup = BeautifulSoup(page, 'html.parser')

print(soup)

I got this from another Stack Overflow question, but it throws this error:

    File "C:/Code/Beautiful Soup Web Scraping.py", line 7, in <module>
    soup = len(BeautifulSoup(page, 'html.parser'))
  File "C:\Users\Ethan\AppData\Local\Programs\Python\Python37\lib\site-packages\bs4\__init__.py", line 246, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()

How can I make this work?

Answer 1

You have to pass the content of the page to BeautifulSoup, not the Response object returned by requests.get. To get the content, use the Response.content attribute.

In your example, use: soup = BeautifulSoup(page.content, 'html.parser')
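As a slightly fuller illustration of the fix, here is a minimal sketch that wraps the request and the parsing step in one function. The name fetch_soup and its session parameter are made up for this example. Keep in mind that this only addresses the TypeError: auth=('username', 'password') makes requests send HTTP Basic credentials, which a form-based login page most likely ignores, so the HTML you get back will probably be the login page.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, session=None):
    """Fetch url and return its HTML parsed into a BeautifulSoup object.

    fetch_soup and the optional session argument are hypothetical helpers
    for this example, not part of requests or bs4.
    """
    http = session or requests
    response = http.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    # Pass the body (bytes), not the Response object itself.
    return BeautifulSoup(response.content, "html.parser")
```

Calling fetch_soup("https://www.messenger.com") and printing soup.title should then work without the TypeError.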

Answer 2

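I suggest using Selenium, which lets you log in to Facebook, navigate to the page you want, and retrieve the HTML. You can then pass that HTML to BeautifulSoup. Take a look at this blog post to get started.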

How to log in to a website with Python 3 and scrape it
