Question

I am trying to parse specific entries out of the comment section of an HTML file, using BeautifulSoup. After extracting the comments, I got stuck. Here is what I have so far:

from bs4 import BeautifulSoup, Comment

# Parse the file and collect every HTML comment node
with open("test.html") as f:
    soup = BeautifulSoup(f, 'html.parser')
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

Does anyone know how I can get the right information out of these comments? Thanks!

Answer 1

I haven't seen the full contents of the HTML file, but since all the comments share the same format, you can parse them directly as strings, like this:

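# Parse each comment as plain text, one "attribute: value" pair per line
for comment in comments:
    for line in comment.splitlines():
        if ':' in line:
            # Split on the last colon, then strip spaces and commas from the value
            attrib, val = line.strip().rsplit(':', 1)
            print('{} -> {}'.format(attrib, val.strip(' ,')))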
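If you need to look up individual entries rather than print them, the same line-by-line idea can collect the pairs into a dictionary. The following is only a sketch: the helper name `parse_comment_fields` and the sample comment text are made up for illustration, since the real comment format was not shown. It also splits on the first colon instead of the last, on the assumption that values (for example URLs) may contain colons themselves.

```python
def parse_comment_fields(comment_text):
    """Return a dict of 'key: value' pairs found in one comment string."""
    fields = {}
    for line in comment_text.splitlines():
        if ':' not in line:
            continue  # skip blank lines and lines without a key/value pair
        # Split on the first colon so values such as URLs stay intact
        key, value = line.split(':', 1)
        # Trim whitespace and a trailing comma from the value
        fields[key.strip()] = value.strip().rstrip(',').strip()
    return fields


# Hypothetical comment body (assumed format, not taken from the question)
sample = """
    name: widget,
    version: 1.2,
    url: http://example.com/item
"""
fields = parse_comment_fields(sample)
print(fields['version'])
```

Because a BeautifulSoup `Comment` is a subclass of `str`, the helper should also accept each item of `comments` directly, for example `[parse_comment_fields(c) for c in comments]`.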

Finding specific comment entries in HTML code using Python BeautifulSoup
