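I'm trying to search an XML document for a string and then print out the entire element, or elements, that contain that string. This is my code so far: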
post = open('postf.txt', 'r')
postf = str(post.read())
root = etree.fromstring(postf)
e = root.xpath('//article[contains(text(), "stuff")]')
print etree.tostring(e, pretty_print=True)
This is the XML from postf.txt that is being searched:
<stuff>
<article date="2014-05-18 17:14:44" title="Some stuff">More testing
debug
[done]
<tags>Hello and stuff
</tags></article>
</stuff>
And finally, this is my error:
File "cliassis-1.2.py", line 107, in command
print etree.tostring(e, pretty_print=True)
File "lxml.etree.pyx", line 3165, in lxml.etree.tostring (src\lxml\lxml.etree.c:69414)
TypeError: Type 'list' cannot be serialized.
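What I want to do is find every element that contains the string I searched for and then print out that element. So if I have "test and stuff" and I search for 'test', I want it to print out "test and stuff".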
Answer 0 (score: 3)
articles = root.xpath('//article[contains(text(), "stuff")]')
for article in articles:
    print etree.tostring(article, pretty_print=True)
root.xpath returns a Python list, so e is a list. etree.tostring converts a single lxml _Element to a string; it does not convert a list of _Elements to a string. So use a for-loop to print each _Element in the list as a string.
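As a small illustration of the point above, the sketch below shows that root.xpath hands back a plain list, and that the list can be empty. It uses a shortened copy of the question's XML (an assumption made for brevity) and Python 3 print syntax. The second expression, //*[text()[contains(., "stuff")]], is not from the answer; it is one possible way to match any element that has a text node containing the string, since contains(text(), ...) only looks at the first text node of each article.

```python
from lxml import etree

# Shortened copy of the question's XML (assumption: trimmed for brevity).
xml = """<stuff>
<article date="2014-05-18 17:14:44" title="Some stuff">More testing
<tags>Hello and stuff</tags></article>
</stuff>"""

root = etree.fromstring(xml)

# xpath() returns a list. Here it is empty, because contains(text(), ...)
# only tests the first text node of <article>, which is "More testing".
first_text_only = root.xpath('//article[contains(text(), "stuff")]')

# Alternative (not from the answer): match any element that has at least
# one text node containing the search string.
any_text_node = root.xpath('//*[text()[contains(., "stuff")]]')

matched_tags = [el.tag for el in any_text_node]

# Serialize the matches one element at a time, as the answer suggests.
for el in any_text_node:
    print(etree.tostring(el, encoding="unicode", pretty_print=True))
```

With this sample, only the tags element matches the second expression, because the word "stuff" appears in the article's title attribute and in the tags text, but not in the article's own text.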
Answer 1 (score: 2)
You can also use the built-in join function for this.
Answer 2 (score: 1)
Here is an executable, working solution that also uses join (but correctly), together with a list comprehension:
from lxml import etree
root = etree.fromstring('''<stuff>
<article date="2014-05-18 17:14:44" title="Some stuff">stuff in text
<tags>Hello and stuff</tags>
</article>
<article date="whatever" title="Some stuff">no s_t_u_f_f in text
<tags>Hello and stuff</tags>
</article>
<article date="whatever" title="whatever">More stuff in text
<tags>Hello and stuff</tags>
</article>
</stuff>''')
articles = root.xpath('//article[contains(text(), "stuff")]')
print("".join([etree.tostring(article, encoding="unicode", pretty_print=True) for article in articles]))
(For encoding="unicode", see e.g. http://makble.com/python-why-lxml-etree-tostring-method-returns-bytes)
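To show why encoding="unicode" matters for the join above, here is a minimal sketch assuming Python 3: by default etree.tostring returns bytes, which "".join cannot concatenate, while encoding="unicode" makes it return str. The element used here is just a stand-in taken from the sample XML.

```python
from lxml import etree

# Stand-in element taken from the sample XML above.
el = etree.fromstring("<tags>Hello and stuff</tags>")

# Default call: returns bytes under Python 3.
as_bytes = etree.tostring(el)

# With encoding="unicode": returns str, which "".join() accepts.
as_text = etree.tostring(el, encoding="unicode")

joined = "".join([as_text, as_text])
print(type(as_bytes).__name__, type(as_text).__name__)
```

Passing the bytes results to "".join would raise a TypeError under Python 3, which is why the answer asks for str output before joining.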