Question

I'm using lxml to parse HTML that contains a Facebook comments tag:

<fb:comments id="fb_comments"  href="http://example.com" num_posts="5" width="600"></fb:comments>

I'm trying to select it to get the href value, but when I call cssselect('fb:comments') I get the following error:

The pseudo-class Symbol(u'comments', 3) is unknown

Is there a way to do this?

Edit, the code:

from lxml.html import fromstring
html = '...'
parser = fromstring(html)
parser.cssselect('fb:comments')  # raises the exception

Answer 1

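The cssselect() method queries the document using the given CSS selector expression. In your case, the colon character (:), which is the XML namespace prefix separator (as in <namespace:tagname/>), is being read as CSS pseudo-class syntax (as in tagname:pseudo-class). That is why the error complains about an unknown pseudo-class named comments.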

According to the lxml manual, you should use the namespace-prefix|element syntax in cssselect() to find a tag (comments) that has a namespace prefix (fb). So:

from lxml.html import fromstring
html = '...'
parser = fromstring(html)
parser.cssselect('fb|comments')