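I have been trying to solve this for a while, but the only way I have managed to do it is with a complicated while loop.
I want to take the following input: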
"<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
and output:
"This is a test (to see if this works) and I really hope it does"
Essentially, I want to remove every "< >" and anything between them. The best I have managed with a few commands is:
"This is a test (<i> to see </i> this works) and I really hope it does"
which still leaves me with these pesky guys: <i></i>
Here is my code:
from bs4 import BeautifulSoup
text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(text, "html.parser")
content = soup.find_all("td","ToEx")
content[0].renderContents()
Answer 0 (score: 2)
Just print the .text attribute of the tag you found; it gives you the text without any markup:
print(content[0].text)
Output:
This is a test ( to see this works) and I really hope it does
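A minimal, self-contained sketch of the same idea, in case it helps. The names html, cell, raw_text and clean_text are made up for illustration, and the whitespace clean-up at the end is an extra step that is not part of the answer above; it only tidies the spaces left behind where the <i> tags were.

```python
from bs4 import BeautifulSoup

html = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"

# Naming the parser explicitly keeps the result the same on every machine.
soup = BeautifulSoup(html, "html.parser")

# A plain string as the second argument is matched against the CSS class.
cell = soup.find("td", "ToEx")

# .text is the text of the tag and all of its descendants, markup removed.
raw_text = cell.text

# Assumption: collapsing runs of whitespace is wanted here.
clean_text = " ".join(raw_text.split())
print(clean_text)
```

Note that the space right after the opening parenthesis survives, because it is part of the text inside <i>, not part of the markup.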
Answer 1 (score: 0)
I would use get_text() - it is designed for exactly this situation:
from bs4 import BeautifulSoup
text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(text, "html.parser")
print(soup.get_text())
This should work as per the documentation.
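As a rough sketch of what get_text() can do beyond the bare call: it also accepts a separator string and a strip flag. The variable names plain and tidy are just for illustration, and whether you want the stripped variant depends on how much you care about the leftover spaces.

```python
from bs4 import BeautifulSoup

html = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(html, "html.parser")

# Bare call: every text node is concatenated exactly as it appears.
plain = soup.get_text()

# With a separator and strip=True, each text node is stripped first
# and the pieces are then joined with a single space.
tidy = soup.get_text(" ", strip=True)

print(repr(plain))
print(repr(tidy))
```

The bare call keeps the double space between "see" and "this", since both surrounding text nodes carry their own space; the stripped variant does not.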
I have never come across .text before; in Beautiful Soup 4 I have used .strings instead - if you would rather use that:
from bs4 import BeautifulSoup
text = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(text, "html.parser")
for string in soup.strings:
    print(str(string), end="")
Both will output:
This is a test ( to see this works) and I really hope it does
Both work equally well, but get_text() is easier to use, especially if you want to save the text to a variable and so on.
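To illustrate the point about saving to a variable, here is a small sketch comparing the two approaches. The names via_get_text, via_strings and pieces are hypothetical; .stripped_strings is added only to show a related generator and is not something the answer itself relies on.

```python
from bs4 import BeautifulSoup

html = "<td colspan='2' class='ToEx'>This is a test (<i> to see </i> this works) and I really hope it does</td>"
soup = BeautifulSoup(html, "html.parser")

# get_text() hands back one string, ready to store.
via_get_text = soup.get_text()

# .strings is a generator of text nodes, so it has to be joined by hand.
via_strings = "".join(soup.strings)

# .stripped_strings yields the same nodes with surrounding whitespace removed.
pieces = list(soup.stripped_strings)

print(via_get_text == via_strings)
print(pieces)
```

Both variables end up holding the same text; get_text() just saves you the join.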