Question

I'm trying to scrape quotes from Goodreads. I only need the quote itself, not the author's name.

Here is the HTML source:

<div class="quoteText">
      “Don't cry because it's over, smile because it happened.”
  <br>  ―
    <a class="authorOrTitle" href="/author/show/61105.Dr_Seuss">Dr. Seuss</a>
</div>

I tried the following, but it returns the author information as well:

quotes = [quote.text.strip() for quote in soup.findAll('div', {'class':'quoteText'})]

I also tried using contents[0], but it fails for multi-line quotes. See below:

<div class="quoteText">
      “You've gotta dance like there's nobody watching,
<br>
Love like you'll never be hurt,
<br>
Sing like there's nobody listening,
<br>
And live like it's heaven on earth.”
  <br>  ―
    <a class="authorOrTitle" href="/author/show/1744830.William_W_Purkey">William W. Purkey</a>
</div>

Answer 1

This is simple to solve. quote.text.strip() gives you '“Don't cry because it's over, smile because it happened.”\n ―\n Dr. Seuss', so you can split the string on \n and keep only the first part, which is the quote. For example:

[quote.text.strip().split("\n")[0] for quote in soup.findAll("div", {"class":"quoteText"})]
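A minimal sketch of the splitting step on plain strings, so it runs without fetching the page or installing BeautifulSoup. SINGLE and MULTI are assumptions: they approximate what quote.text returns for the two divs in the question, with whitespace reconstructed by hand from the HTML. first_line is the answer's approach. before_dash is a hypothetical variant (not from the answer) that splits on the "―" (U+2015) separator shown in the HTML, because taking only the first line truncates the multi-line Purkey quote.

```python
# Assumed sample strings shaped like quote.text for the two <div class="quoteText">
# elements in the question (whitespace reconstructed by hand, may differ on the live page).
SINGLE = (
    "\n      “Don't cry because it's over, smile because it happened.”\n"
    "    ―\n    Dr. Seuss\n"
)
MULTI = (
    "\n      “You've gotta dance like there's nobody watching,\n"
    "\nLove like you'll never be hurt,\n"
    "\nSing like there's nobody listening,\n"
    "\nAnd live like it's heaven on earth.”\n"
    "    ―\n    William W. Purkey\n"
)


def first_line(text: str) -> str:
    """The answer's approach: strip, split on newlines, keep the first piece."""
    return text.strip().split("\n")[0]


def before_dash(text: str) -> str:
    """Hypothetical variant: keep everything before the '―' (U+2015) that precedes the author."""
    quote = text.split("\u2015")[0]
    # Drop the blank lines left where the <br> tags were, then re-join the quote lines.
    lines = [line.strip() for line in quote.splitlines() if line.strip()]
    return "\n".join(lines)


print(first_line(SINGLE))  # “Don't cry because it's over, smile because it happened.”
print(first_line(MULTI))   # “You've gotta dance like there's nobody watching,
print(before_dash(MULTI))  # all four lines of the Purkey quote, without the author
```

With BeautifulSoup, either function would be applied to quote.text for each div returned by soup.findAll('div', {'class': 'quoteText'}).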

If you don't want the quotation marks (“ and ”), you can remove them with .replace(), replacing each one with "".
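A small sketch of the .replace() step, assuming the quote text has already been extracted. strip_curly_quotes is a hypothetical helper name; it removes the typographic marks U+201C (“) and U+201D (”).

```python
def strip_curly_quotes(quote: str) -> str:
    """Hypothetical helper: remove every “ (U+201C) and ” (U+201D), then trim whitespace."""
    return quote.replace("\u201c", "").replace("\u201d", "").strip()


print(strip_curly_quotes("“Don't cry because it's over, smile because it happened.”"))
# Don't cry because it's over, smile because it happened.
```

Note that .replace() removes these marks anywhere in the string. If a quote might contain nested curly quotes, quote.strip("“”") removes them only at the two ends.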

Scraping a node's text, excluding the text of its last child
