Question

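I'm relatively new to BeautifulSoup. I'm trying to get the raw HTML from a locally saved HTML file. I looked around and found that I should use Beautiful Soup for this. However, when I do this: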

from bs4 import BeautifulSoup
url = r"C:\example.html"
soup = BeautifulSoup(url, "html.parser")
text = soup.get_text()
print (text)

it prints an empty string. I assume I'm missing a step. Any push in the right direction would be greatly appreciated.

Answer 1

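The first argument to BeautifulSoup is the actual HTML markup (a string or an open file handle), not a URL or a file path. Open the file, read its contents, and pass those in.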
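A minimal sketch of that, assuming the file is UTF-8 encoded. The helper name `load_soup` and the temporary sample file are made up for illustration so the snippet runs anywhere; in practice you would pass your own path.

```python
import os
import tempfile

from bs4 import BeautifulSoup


def load_soup(path):
    """Read an HTML file from disk and parse its contents (hypothetical helper)."""
    # Assumption: the file is UTF-8; change the encoding if it was saved differently.
    with open(path, encoding="utf-8") as fp:
        html = fp.read()
    # Pass the markup itself, not the path, to BeautifulSoup.
    return BeautifulSoup(html, "html.parser")


# Write a small sample file so the sketch is self-contained.
sample_html = "<html><body><p>Hello, soup!</p></body></html>"
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample_html)
    sample_path = tmp.name

soup = load_soup(sample_path)
text = soup.get_text()
print(text)

os.remove(sample_path)  # clean up the sample file
```

With a real file, `load_soup(r"C:\example.html")` would replace the temporary-file part.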

Answer 2

Building on the previous answer, there are two ways to open the HTML file:

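1.

from bs4 import BeautifulSoup
# The with block closes the file once parsing is done.
with open("example.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")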

2.

# Shorter, but the file handle is never closed explicitly.
soup = BeautifulSoup(open("example.html"), "html.parser")
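Since the question asks for the raw HTML: `get_text()` returns only the visible text with the tags stripped, while `str(soup)` (or `soup.prettify()`) gives the markup back. A rough sketch of the difference, using a throwaway sample file that is only there for illustration:

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Illustrative sample file; substitute your own path in practice.
sample_html = "<p>Hello <b>world</b></p>"
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample_html)
    sample_path = tmp.name

# Parse straight from the open file handle (approach 1 above).
with open(sample_path, encoding="utf-8") as fp:
    soup = BeautifulSoup(fp, "html.parser")

visible_text = soup.get_text()  # text only, tags removed
markup = str(soup)              # the parsed document serialized back to HTML

print(visible_text)
print(markup)

os.remove(sample_path)  # clean up the sample file
```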