Question

I want to extract content from a news site's RSS feed, which looks like this:

<item>
  <title>BPS: Kartu Bansos Bantu Turunkan Angka Gini Ratio</title>
  <media:content url="/image.jpg" expression="full" type="image/jpeg"/>
</item>

But I get an error when I try to parse the tag with an XPath expression such as item.xpath('//media:content'):

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/parsel/selector.py", line 183, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "/usr/local/lib/python2.7/site-packages/parsel/selector.py", line 179, in xpath
    smart_strings=self._lxml_smart_strings)
  File "src/lxml/lxml.etree.pyx", line 1587, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:57923)
  File "src/lxml/xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:167084)
  File "src/lxml/xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:166043)
ValueError: XPath error: Undefined namespace prefix in //media:content

Does anyone know what I should do? Thanks :)

Answer 1

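You first need to tell XPath which namespace the media prefix maps to, by calling register_namespace(prefix, namespace) on the selector. The namespace is the URI declared by the xmlns:media attribute in the feed. For example: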

selector.register_namespace('media', 'http://the.namespace.of/media')

Or, if you only want to match on the local name (which matches a content element in any namespace), you can use:

 item.xpath("//*[local-name()='content']")
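
The prefix-to-namespace mapping can be sketched as follows. This is a minimal illustration that uses the standard library's xml.etree.ElementTree rather than parsel, so it runs without Scrapy installed. It assumes the feed uses Media RSS, whose namespace URI is http://search.yahoo.com/mrss/. The FEED string is a made-up sample modelled on the item in the question, and the names MEDIA_NS, FEED and image_urls are only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed uses Media RSS. Check the xmlns:media attribute on
# the feed's root element and use whatever URI is actually declared there.
MEDIA_NS = "http://search.yahoo.com/mrss/"

# Hypothetical feed modelled on the item in the question.
FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>BPS: Kartu Bansos Bantu Turunkan Angka Gini Ratio</title>
      <media:content url="/image.jpg" expression="full" type="image/jpeg"/>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(FEED)

# Map the prefix used in the query to the namespace URI.
namespaces = {"media": MEDIA_NS}

image_urls = []
for item in root.iter("item"):
    # ".//" keeps the search relative to this item instead of the whole feed.
    for content in item.findall(".//media:content", namespaces):
        image_urls.append(content.get("url"))

print(image_urls)
```

With parsel, the same URI would be the second argument to register_namespace. Note also that an expression starting with // searches the whole document even when called on an item, so a relative form such as .//media:content is usually what is wanted inside a loop over items.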
