Question

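I'm trying to use BeautifulSoup to scrape movie titles from an article on Rotten Tomatoes. However, each movie title comes after the href link to that movie's page. This is what I want to get: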

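<a href="https://www.rottentomatoes.com/m/the_shape_of_water_2017/">The Shape of Water</a>

I only want the text "The Shape of Water". I can get this text, but only for one movie. I want to do this for every movie on the same page, and the last part of the link changes for each movie. Can someone tell me how to do this? I'm a beginner at web scraping.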

Answer 1

from bs4 import BeautifulSoup

html_doc = """<a href="https://www.rottentomatoes.com/m/the_shape_of_water_2017/">The Shape of Water</a>"""

soup = BeautifulSoup(html_doc, 'html.parser')

# find_all('a') returns every <a> tag, so the same loop covers all links on a page
for link in soup.find_all('a'):
    print(link.text)  # The Shape of Water

In BeautifulSoup, when you want the text of a tag, just use the text attribute on that tag.
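For a page with several movies, one possible approach is to select only the links whose href contains "/m/" and read the text of each. This is a minimal sketch, not a tested scraper for the live site: the sample markup, the second and third links, and the helper name is_movie_link are made up for illustration, and the "/m/" pattern is an assumption based on the single URL in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the article page.
# Only the first link comes from the question; the others are invented.
html_doc = """
<p><a href="https://www.rottentomatoes.com/m/the_shape_of_water_2017/">The Shape of Water</a></p>
<p><a href="https://www.rottentomatoes.com/m/lady_bird/">Lady Bird</a></p>
<p><a href="https://www.rottentomatoes.com/about/">About</a></p>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def is_movie_link(href):
    # Assumption: movie pages live under "/m/", as in the question's URL.
    # href is None for <a> tags that have no href attribute.
    return href is not None and "/m/" in href


# Keep only the <a> tags whose href passes the filter, then read their text.
titles = [a.get_text(strip=True) for a in soup.find_all("a", href=is_movie_link)]
print(titles)  # ['The Shape of Water', 'Lady Bird']
```

Filtering on the href keeps navigation links and other unrelated anchors out of the result, while the changing last part of each movie URL does not matter.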
