"Type 'list' cannot be serialized" error when using XPath with lxml etree

Posted: 2014-05-18 22:52:59

Tags: python-2.7 xpath lxml

I'm trying to search an XML document for a string and then print out the whole element (or elements) containing that string. Here is my code so far:

from lxml import etree

post = open('postf.txt', 'r')
postf = str(post.read())

root = etree.fromstring(postf)

e = root.xpath('//article[contains(text(), "stuff")]')

print etree.tostring(e, pretty_print=True)

Here is the XML being searched, from postf.txt:

<stuff>

<article date="2014-05-18 17:14:44" title="Some stuff">More testing
debug
[done]
<tags>Hello and stuff
</tags></article>

</stuff>

And finally, here is my error:

  File "cliassis-1.2.py", line 107, in command
    print etree.tostring(e, pretty_print=True)
  File "lxml.etree.pyx", line 3165, in lxml.etree.tostring (src\lxml\lxml.etree.c:69414)
TypeError: Type 'list' cannot be serialized.

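What I want is to search for all elements that contain the string I searched for, and then print out the tags. So if I have "test and stuff" and I search for 'test', I want it to print out "test and stuff".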

3 answers:

Answer 0 (score: 3)

articles = root.xpath('//article[contains(text(), "stuff")]')

for article in articles:
    print etree.tostring(article, pretty_print=True)

root.xpath returns a Python list, so e is a list. etree.tostring converts a single lxml _Element to a string; it does not convert a list of _Elements. So use a for loop and print each _Element in the list as a string.

Answer 1 (score: 2)

You could also use the built-in str.join method.

Answer 2 (score: 1)

Here is an executable, working solution. It also uses join (but correctly), via a list comprehension:

from lxml import etree

root = etree.fromstring('''<stuff>

<article date="2014-05-18 17:14:44" title="Some stuff">stuff in text
<tags>Hello and stuff</tags>
</article>

<article date="whatever" title="Some stuff">no s_t_u_f_f in text
<tags>Hello and stuff</tags>
</article>

<article date="whatever" title="whatever">More stuff in text
<tags>Hello and stuff</tags>
</article>

</stuff>''')
articles = root.xpath('//article[contains(text(), "stuff")]')

print("".join([etree.tostring(article, encoding="unicode", pretty_print=True) for article in articles]))

(For encoding="unicode", see e.g. http://makble.com/python-why-lxml-etree-tostring-method-returns-bytes)