Question

I am trying to search an XML document for a string and then print the entire element, or elements, that contain that string. This is my code so far:

post = open('postf.txt', 'r')
postf = str(post.read())

root = etree.fromstring(postf)

e = root.xpath('//article[contains(text(), "stuff")]')

print etree.tostring(e, pretty_print=True)

This is the XML from postf.txt that is being searched:

<stuff>

<article date="2014-05-18 17:14:44" title="Some stuff">More testing
debug
[done]
<tags>Hello and stuff
</tags></article>

</stuff>

Finally, this is my error:

  File "cliassis-1.2.py", line 107, in command
    print etree.tostring(e, pretty_print=True)
  File "lxml.etree.pyx", line 3165, in lxml.etree.tostring (src\lxml\lxml.etree.c:69414)
TypeError: Type 'list' cannot be serialized.

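What I want is to find all elements that contain the string I searched for and then print out those tags. So if I have "test and stuff" and I search for 'test', I want it to print out "test and stuff".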

Answer 1

articles = root.xpath('//article[contains(text(), "stuff")]')

for article in articles:
    print etree.tostring(article, pretty_print=True)

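root.xpath returns a Python list, so e is a list. etree.tostring converts a single lxml _Element to a string; it does not convert a list of _Elements to a string. So use a for-loop to print each _Element in the list as a string.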
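As a rough, self-contained sketch of the loop above on Python 3 (the XML is the sample from the question, inlined here as an assumption instead of reading postf.txt, and the variable names are only illustrative): note that in XPath 1.0, contains(text(), "stuff") tests only the first text node of each article, so on that sample the original expression returns an empty list. The second expression is a suggested alternative, not part of the original answer; it selects any element that has a text node containing the string.

```python
from lxml import etree

# Sample XML from the question, inlined so the sketch is self-contained.
XML = """<stuff>

<article date="2014-05-18 17:14:44" title="Some stuff">More testing
debug
[done]
<tags>Hello and stuff
</tags></article>

</stuff>"""

root = etree.fromstring(XML)

# Original expression: contains() converts text() to a string using only
# the FIRST text node of <article>, which here is "More testing...".
first_text_only = root.xpath('//article[contains(text(), "stuff")]')

# Alternative: test every text node of every element individually.
any_text_node = root.xpath('//*[text()[contains(., "stuff")]]')

# xpath() returns a list, so serialize the elements one at a time.
for element in any_text_node:
    print(etree.tostring(element, encoding="unicode", pretty_print=True))
```

With the question's sample this should select only the tags element, because the word "stuff" appears in the article's title attribute and in the nested tags element, not in the article's own leading text.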

Answer 2

You can also use the built-in join function, like this.

print "".join(etree.tostring(el, pretty_print=True) for el in e)

Answer 3

Here is an executable, working solution that also uses join (but correctly), with a list comprehension:

from lxml import etree

root = etree.fromstring('''<stuff>

<article date="2014-05-18 17:14:44" title="Some stuff">stuff in text
<tags>Hello and stuff</tags>
</article>

<article date="whatever" title="Some stuff">no s_t_u_f_f in text
<tags>Hello and stuff</tags>
</article>

<article date="whatever" title="whatever">More stuff in text
<tags>Hello and stuff</tags>
</article>

</stuff>''')
articles = root.xpath('//article[contains(text(), "stuff")]')

print("".join([etree.tostring(article, encoding="unicode", pretty_print=True) for article in articles]))

(For encoding="unicode", see e.g. http://makble.com/python-why-lxml-etree-tostring-method-returns-bytes)
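To illustrate why encoding="unicode" matters for the join, here is a minimal sketch, assuming Python 3 (the one-element document and the variable names are made up for the example). By default etree.tostring returns bytes, which "".join refuses to concatenate; with encoding="unicode" it returns str.

```python
from lxml import etree

# Hypothetical one-element document, just for illustration.
element = etree.fromstring("<tags>Hello and stuff</tags>")

as_bytes = etree.tostring(element)                      # bytes on Python 3
as_text = etree.tostring(element, encoding="unicode")   # str

# Joining bytes with a str separator fails on Python 3.
try:
    "".join([as_bytes])
    join_failed = False
except TypeError:
    join_failed = True

# Joining str values works as expected.
joined = "".join([as_text, as_text])
print(joined)
```

If bytes output is actually what you need, joining with a bytes separator (b"".join(...)) is another option.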

"List cannot be serialized" error when using XPath with lxml etree
