Remove comments from HTML tags

Time: 2016-07-03 08:28:32

Tags: html python-2.7 tags beautifulsoup bs4

Following "How can I strip comment tags from HTML using BeautifulSoup?", I am trying to remove the comments from the tag below:

>>> h
<h4 class="col-sm-4"><!-- react-text: 124 -->52 Week High/Low:<!-- /react-text --><b><!-- react-text: 126 --> ₹ <!-- /react-text --><!-- react-text: 127 -->394.00<!-- /react-text --><!-- react-text: 128 --> / ₹ <!-- /react-text --><!-- react-text: 129 -->252.10<!-- /react-text --></b></h4>

My code:

comments = h.findAll(text=lambda text:isinstance(text, Comment))
[comment.extract() for comment in comments]
print h

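But the search for comments returns nothing. I want to extract two values from the tag above: "52 Week High/Low:" and "₹ 394.00 / ₹ 252.10".

I also tried removing the comments from the entire HTML, using: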
soup = BeautifulSoup(html)
comments = soup.findAll(text=lambda text:isinstance(text, Comment))
[comment.extract() for comment in comments]
print soup

But the comments are still there. Any suggestions?

1 answer:

Answer 0: (score: 1)

Are you using Python 2.7 with BeautifulSoup4? If not the latter, I would install BeautifulSoup4:

pip install beautifulsoup4

The following script works for me. I just copied and pasted from your question above and ran it.

from bs4 import BeautifulSoup, Comment

html = """<h4 class="col-sm-4"><!-- react-text: 124 -->52 Week High/Low:<!-- /react-text --><b><!-- react-text: 126 --> ₹ <!-- /react-text --><!-- react-text: 127 -->394.00<!-- /react-text --><!-- react-text: 128 --> / ₹ <!-- /react-text --><!-- react-text: 129 -->252.10<!-- /react-text --></b></h4>"""

soup = BeautifulSoup(html)
comments = soup.findAll(text=lambda text:isinstance(text, Comment))

# nit: It isn't good practice to use a list comprehension only for its
# side-effects. (Wastes space constructing an unused list)
for comment in comments:
    comment.extract()

print soup

Note: It's a good thing you posted the print statement. Otherwise I wouldn't have known it was Python 2. Posting the Python version also helps.
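
To get the two values the question asks for, something along these lines might work once the comments are removed. This is only a sketch in Python 3 syntax (the question uses Python 2), assuming beautifulsoup4 is installed. The helper name extract_label_and_values and the explicit "html.parser" argument are my own choices for illustration, not something from the original post.

```python
from bs4 import BeautifulSoup, Comment

HTML = """<h4 class="col-sm-4"><!-- react-text: 124 -->52 Week High/Low:<!-- /react-text --><b><!-- react-text: 126 --> ₹ <!-- /react-text --><!-- react-text: 127 -->394.00<!-- /react-text --><!-- react-text: 128 --> / ₹ <!-- /react-text --><!-- react-text: 129 -->252.10<!-- /react-text --></b></h4>"""


def extract_label_and_values(html):
    """Hypothetical helper: drop comments, then read the h4 label and the <b> text."""
    # Assumption: the stdlib-backed "html.parser" is good enough for this fragment.
    soup = BeautifulSoup(html, "html.parser")

    # find_all returns a list, so extracting nodes while looping over it is safe.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()

    h4 = soup.h4
    # First non-empty string under <h4> is the label text.
    label = next(h4.stripped_strings)
    # Join every string inside <b>, then trim the outer whitespace.
    values = h4.b.get_text().strip()
    return label, values


label, values = extract_label_and_values(HTML)
print(label)
print(values)
```

If the comment search still comes back empty on your side, it may be worth checking that Comment and BeautifulSoup are both imported from bs4, since mixing them with an older BeautifulSoup 3 install is one possible cause.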