Question

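I'm trying to scrape a phrase and its author from the body of a page at a URL. I can scrape the phrases, but I don't know how to find the author and print it together with its phrase. Can you help me?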

import urllib.request
from bs4 import BeautifulSoup

page_url = "https://www.pensador.com/frases/"
page = urllib.request.urlopen(page_url)
soup = BeautifulSoup(page, "html.parser")

for frase in soup.find_all("p", attrs={'class': 'frase fr'}):
    print(frase.text + '\n')

# author = soup.find_all("span", attrs={'class': 'autor'})
# print(author.text)
# this is the author that I need, for each phrase the right author

Answer 1

You can go to the parent of the p.frase.fr tag, which is a div, and then select the span.autor that descends from that div to get the author:

In [1268]: for phrase in soup.select('p.frase.fr'):
      ...:     author = phrase.parent.select_one('span.autor')
      ...:     print(author.text.strip(), ': ', phrase.text.strip())
      ...:
Roberto Shinyashiki :  Tudo o que um sonho precisa para ser realizado é alguém que acredite que ele possa ser realizado.
Paulo Coelho :  Imagine uma nova história para sua vida e acredite nela.
Carlos Drummond de Andrade :  Ser feliz sem motivo é a mais autêntica forma de felicidade.
...
...

Here I've used a CSS selector, phrase.parent.select_one('span.autor'); you can obviously use find here as well.
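For reference, here is a minimal sketch of the find-based variant. It assumes each phrase and its author share the same parent div. The SAMPLE_HTML markup (including the thought-card class name) and the extract_quotes helper are hypothetical, made up so the example runs without a network request; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: each phrase and its
# author are assumed to be children of the same <div>.
SAMPLE_HTML = """
<div class="thought-card">
  <p class="frase fr">Imagine uma nova história para sua vida e acredite nela.</p>
  <span class="autor">Paulo Coelho</span>
</div>
<div class="thought-card">
  <p class="frase fr">Ser feliz sem motivo é a mais autêntica forma de felicidade.</p>
  <span class="autor"> Carlos Drummond de Andrade </span>
</div>
<div class="thought-card">
  <p class="frase fr">Frase sem autor.</p>
</div>
"""


def extract_quotes(html):
    """Return a list of (author, phrase) tuples; author is None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for phrase in soup.select("p.frase.fr"):
        # Search the phrase's parent for the author span.
        author_tag = phrase.parent.find("span", class_="autor")
        author = author_tag.get_text(strip=True) if author_tag else None
        results.append((author, phrase.get_text(strip=True)))
    return results


if __name__ == "__main__":
    for author, phrase in extract_quotes(SAMPLE_HTML):
        print(author or "(unknown author)", ":", phrase)
```

Checking for a missing author tag avoids an AttributeError on any phrase whose parent has no span.autor.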
