Question

I'm using Beautiful Soup to scrape a page, trying to get the height of some athletes:

import requests
from bs4 import BeautifulSoup

req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")
height = soup.find_all("strong")
height = height[2].contents
print height

Unfortunately, this is what gets returned:

[u'6\'0"']

I've also tried:

height = str(height[2].contents)

and

height = unicode(height[2].contents)

but I still get [u'6\'0"'] as the result.

How can I get just 6'0" returned, without the extra characters? Thanks for your help!

Answer 1

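Those aren't "extra characters". .contents returns a list, and the element you selected has only one child, so the list you get contains a single element. Python prints a list in Python-like syntax (each element's repr()), so you can see both what it is and what's in it.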
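To see why str() and unicode() did not help, here is a minimal Python 3 sketch that uses a hard-coded list in place of the scraped .contents (an assumption; the real value comes from the page). Turning a list into a string formats each element with repr(), which is where the brackets, quotes and backslash come from; in Python 2 a unicode element additionally shows the u prefix.

```python
# Stand-in for height[2].contents: a list holding one string, 6'0"
contents = ['6\'0"']

# str() on a list builds its text from each element's repr(),
# so the brackets, the quotes and the escaping backslash appear
as_text = str(contents)
print(as_text)

# Taking the element out of the list gives the plain string
height = contents[0]
print(height)
```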

Perhaps you want .string?
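A minimal sketch of the difference between .contents and .string, using a small hypothetical HTML snippet instead of the real athlete page (the markup and the labels in it are made up). .contents is a list of the tag's children, while .string returns the single child directly; it returns None when the tag has more than one child.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the athlete page
html = "<p><strong>Team</strong> <strong>Position</strong> <strong>6'0\"</strong></p>"
soup = BeautifulSoup(html, "html.parser")

# Third <strong> in document order, as in the question
tag = soup.find_all("strong")[2]

# .contents is a list of children (here, one string)
children = tag.contents
print(children)

# .string is that single child; str() turns it into a plain Python string
height = str(tag.string)
print(height)
```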

Answer 2

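If you only want the third strong tag, there is no need to find them all. Use the CSS selector nth-of-type with select_one, and once you have the element, call .text:

req = requests.get(url)
soup = BeautifulSoup(req.content, "html.parser")
height = soup.select_one("strong:nth-of-type(3)").text
print(height)

You should also pass req.content rather than req.text, so that Beautiful Soup receives the raw bytes and handles the decoding itself.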
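A runnable sketch of this approach that skips the network request: hypothetical markup is encoded to bytes to stand in for req.content. Note that nth-of-type counts a strong element's position among its siblings, so it selects the same tag as find_all("strong")[2] only when the strong tags share a parent, as they do in this made-up example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, encoded to bytes to mimic what req.content returns
html_bytes = "<div><strong>Name</strong><strong>Team</strong><strong>6'0\"</strong></div>".encode("utf-8")

# Given bytes, Beautiful Soup detects the encoding and decodes the document itself
soup = BeautifulSoup(html_bytes, "html.parser")

# First <strong> that is the third <strong> among its siblings
tag = soup.select_one("strong:nth-of-type(3)")

# select_one returns None when nothing matches, so guard before reading .text
height = tag.text if tag is not None else None
print(height)
```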

Beautiful Soup returning unwanted characters
