I am trying to scrape quotes from Goodreads. I only need the quote text, not the author name.
Here is the HTML source:
<div class="quoteText">
“Don't cry because it's over, smile because it happened.”
<br> ―
<a class="authorOrTitle" href="/author/show/61105.Dr_Seuss">Dr. Seuss</a>
</div>
I tried the following, but the result includes the author information:
quotes = [quote.text.strip() for quote in soup.findAll('div', {'class':'quoteText'})]
I also tried using contents[0], but that fails for multi-line quotes, because it only returns the text before the first <br>. See below:
<div class="quoteText">
“You've gotta dance like there's nobody watching,
<br>
Love like you'll never be hurt,
<br>
Sing like there's nobody listening,
<br>
And live like it's heaven on earth.”
<br> ―
<a class="authorOrTitle" href="/author/show/1744830.William_W_Purkey">William W. Purkey</a>
</div>
Answer 0 (score: 1)
This is a simple problem. When you call quote.text.strip() you get '“Don't cry because it's over, smile because it happened.”\n ―\n Dr. Seuss', so you can split the string on \n and keep only the first piece, which is the quote.
Example:
[quote.text.strip().split("\n")[0] for quote in soup.findAll("div", {"class":"quoteText"})]
If you don't want the quotation marks (i.e. “ and ”), you can replace them with "" using .replace().
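Note that splitting on \n keeps only the first line of a multi-line quote such as the William W. Purkey example in the question. One possible workaround is to cut the text at the "―" that precedes the author instead. The following is a minimal sketch, not a definitive solution: extract_quote and its parameters are hypothetical names introduced here, and it assumes every quoteText div separates quote and author with a "―" that does not appear inside the quote itself.

```python
def extract_quote(div_text, separator="―", strip_marks=False):
    """Return only the quote part of a quoteText div's text.

    div_text is assumed to be what quote.text returns for one div.
    separator is assumed to be the bar Goodreads puts before the author.
    """
    # Keep everything before the last separator; if the separator is
    # missing, rsplit returns the whole text unchanged.
    quote_part = div_text.rsplit(separator, 1)[0]
    # Collapse the newlines and indentation left by the <br> tags
    # into single spaces.
    quote = " ".join(quote_part.split())
    if strip_marks:
        # Optionally drop the curly quotation marks at both ends.
        quote = quote.strip("“”")
    return quote
```

With the list comprehension from the answer, this could be used as [extract_quote(quote.text) for quote in soup.findAll("div", {"class": "quoteText"})]. Lines of a multi-line quote are joined with single spaces; if you want to keep the line breaks, join the stripped lines with "\n" instead.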