Question

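I want to parse an HTML file with Python, but BeautifulSoup is omitting some key tags.

Part of the HTML file on the website looks like this, with all of the child divs included. (image: HTML snippet)

But when I use BeautifulSoup's prettify function, the output looks like this, without any of the child divs. (image: HTML snippet from Python)

The code I am using is here: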

from bs4 import BeautifulSoup
import urllib.request

# A random plus code; the %2B is just a URL-encoded +

PLUS_CODE = "792F7C4F%2B54"
url = "https://www.plus.codes/" + PLUS_CODE

hdr = {"User-Agent" : "Mozilla/5.0"}
req = urllib.request.Request(url, headers=hdr)
r = urllib.request.urlopen(req)
r_tags = r.read().decode('utf-8')
soup = BeautifulSoup(r_tags, "lxml")

print(soup.prettify())

The end result is that I cannot reach the child divs and extract the text I need.

Answer 1

Try using 'lxml' instead of 'html.parser' in the BeautifulSoup call. Maybe that will solve the problem. If not, please share some code.
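
A minimal sketch of how one might compare parsers side by side, in case it helps narrow the problem down. The SAMPLE_HTML string and the count_divs helper are hypothetical stand-ins (not from the question), and the sketch assumes that lxml and html5lib may or may not be installed, so missing parsers are skipped rather than treated as errors.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<div id="outer">
  <div class="child">first</div>
  <div class="child">second</div>
</div>
"""


def count_divs(html, parser):
    """Return how many <div> tags the parser finds, or None if it is not installed."""
    try:
        soup = BeautifulSoup(html, parser)
    except FeatureNotFound:
        return None
    return len(soup.find_all("div"))


# Number of opening <div tags in the raw text, before any parsing.
raw_count = SAMPLE_HTML.count("<div")

# Try each parser name that BeautifulSoup accepts; uninstalled ones map to None.
results = {p: count_divs(SAMPLE_HTML, p) for p in ("html.parser", "lxml", "html5lib")}

print("raw:", raw_count)
for parser, n in results.items():
    print(parser, n)
```

To apply this to the real page, pass the decoded response text (r_tags in the question) instead of SAMPLE_HTML. If the raw count is already lower than expected, the child divs were probably not in the downloaded response at all, and switching parsers is unlikely to bring them back.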
