How do I get the text inside HTML tags?

Asked: 2017-11-09 10:17:34

Tags: python beautifulsoup

<div class=" col-md-8">
  <strong>3.</strong>&nbsp;&nbsp;&nbsp;&nbsp;For 
  <i>ax</i>
  <sup>2</sup> + <i>bx</i> + <i>c</i> = 0, 
  which of the following statement is wrong?
</div>
<div class="row">
  <div class=" col-md-6">
     (a) three zeros
  </div>
  <div class=" col-md-6">
     (b) one zero
  </div>
  <div class=" col-md-6">
     (c) two zeros
  </div>
  <div class=" col-md-6">
     (d) none of these
  </div>
</div>

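The markup above repeats for every question and its answer options. I tried to retrieve the data with BeautifulSoup, but without success.

Can anyone help me retrieve the data with BeautifulSoup (text only, without the HTML tags)?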

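1 answer:

Answer 0 (score: 1)

**Note that I edited the markup to contain what you specified.**

I just put together and ran some code, and I can confirm that it outputs the correct string. See the code below: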

from bs4 import BeautifulSoup

string = """<div class=" col-md-8">
<strong></strong>Every quadratic polynomial can have at most 
</div>
<div class="row">
<div class=" col-md-6">
(a) three zeros
</div>
<div class=" col-md-6">
(b) one zero
</div>
<div class=" col-md-6">
(c) two zeros
</div>
<div class=" col-md-6">
(d) none of these
</div>
</div>"""

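# Parse the markup with Python's built-in HTML parser
soup = BeautifulSoup(string, "html.parser")
# get_text() returns only the text nodes; the newlines between the divs are removed
text = soup.get_text().replace("\n", "")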

print(text)

This outputs:

Every quadratic polynomial can have at most (a) three zeros(b) one zero(c) two zeros(d) none of these
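
In the output above the options run together ("three zeros(b) one zero") because every newline is deleted. One possible adjustment, shown here as a sketch on a shortened copy of the same markup, is to let `get_text()` do the cleanup through its `separator` and `strip` arguments. The " | " separator and the variable names are only illustrative choices.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the answer (only two options kept)
html = """<div class=" col-md-8">
<strong></strong>Every quadratic polynomial can have at most 
</div>
<div class="row">
<div class=" col-md-6">
(a) three zeros
</div>
<div class=" col-md-6">
(b) one zero
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# strip=True trims each text fragment and drops whitespace-only ones;
# separator is placed between the fragments that remain
text = soup.get_text(separator=" | ", strip=True)

print(text)
# Every quadratic polynomial can have at most | (a) three zeros | (b) one zero
```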

I'm not sure exactly what format you want, so you will have to adjust it yourself.
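
If the goal is to keep each question separate from its options, a minimal sketch along the following lines may help. It uses the markup from the question and assumes that every question sits in a `div` with class `col-md-8` and is followed by a sibling `div` with class `row` that holds the options. The helper names `clean` and `extract_questions` are hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = """<div class=" col-md-8">
  <strong>3.</strong>&nbsp;&nbsp;&nbsp;&nbsp;For
  <i>ax</i>
  <sup>2</sup> + <i>bx</i> + <i>c</i> = 0,
  which of the following statement is wrong?
</div>
<div class="row">
  <div class=" col-md-6">
     (a) three zeros
  </div>
  <div class=" col-md-6">
     (b) one zero
  </div>
  <div class=" col-md-6">
     (c) two zeros
  </div>
  <div class=" col-md-6">
     (d) none of these
  </div>
</div>"""


def clean(tag):
    # get_text() joins the text of all nested tags (<strong>, <i>, <sup>, ...).
    # split() with no arguments splits on any whitespace, including the
    # non-breaking spaces produced by &nbsp;, so runs collapse to one space.
    return " ".join(tag.get_text().split())


def extract_questions(markup):
    soup = BeautifulSoup(markup, "html.parser")
    found = []
    for question_div in soup.select("div.col-md-8"):
        # Assumption: the options live in the next sibling <div class="row">
        row = question_div.find_next_sibling("div", class_="row")
        options = [clean(d) for d in row.select("div.col-md-6")] if row else []
        found.append({"question": clean(question_div), "options": options})
    return found


results = extract_questions(html)
for item in results:
    print(item["question"])
    for option in item["options"]:
        print("  " + option)
```

Because only the text is kept, the superscript is flattened: the exponent from `<sup>2</sup>` comes out as a plain "2" after "ax". If that matters, the `<sup>` tags would need separate handling before calling `get_text()`.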