Parsing CDATA elements in XML

Time: 2011-03-01 11:11:23

Tags: python xml cdata

I want to parse XML that contains a CDATA element in the following format:

<showtimes><![CDATA[6:50 PM,https://www.movietickets.com/purchase.asp?afid=rgncom&house_id=6446&language=2&movie_id=87050&perft=18:50&perfd=03012011,9:40 PM,https://www.movietickets.com/purchase.asp?afid=rgncom&house_id=6446&language=2&movie_id=87050&perft=21:40&perfd=03012011]]> </showtimes>

Please help me find a solution.

3 answers:

Answer 0: (score: 4)

This shouldn't be a problem at all, for example with lxml:

from lxml import etree

# Named xml_data rather than input to avoid shadowing the built-in input().
xml_data = '<showtimes><![CDATA[6:50 PM,https://www.movietickets.com/purchase.asp?afid=rgncom&house_id=6446&language=2&movie_id=87050&perft=18:50&perfd=03012011,9:40 PM,https://www.movietickets.com/purchase.asp?afid=rgncom&house_id=6446&language=2&movie_id=87050&perft=21:40&perfd=03012011]]> </showtimes>'

root = etree.fromstring(xml_data)
# The parser returns the CDATA content as the element's ordinary text.
for s in root.xpath("//showtimes"):
    print(s.text)

...prints:

6:50 PM,https://www.movietickets.com/purchase.asp?afid=rgncom&house_id=6446&language=2&movie_id=87050&perft=18:50&perfd=03012011,9:40 PM,https://www.movietickets.com/purchase.asp?afid=rgncom&house_id=6446&language=2&movie_id=87050&perft=21:40&perfd=03012011

Answer 1: (score: 1)

I'm not sure what you're looking for. Here is an answer based on some wild assumptions.

PS: This solution requires lxml.

>>> s = """<showtimes><![CDATA[6:50 PM,https://www.movietickets.com/purchase.asp?afid=rgncom&house_id=6446&language=2&movie_id=87050&perft=18:50&perfd=03012011,9:40 PM,https://www.movietickets.com/purchase.asp?afid=rgncom&house_id=6446&language=2&movie_id=87050&perft=21:40&perfd=03012011]]> </showtimes>"""
>>> from lxml import etree
>>> import urlparse
>>> doc = etree.fromstring(s)
>>> _time, url = doc.text.split(',', 1)
>>> _time # Not sure if you want this
'6:50 PM'
>>> for key, value in urlparse.parse_qs(urlparse.urlsplit(url).query).items():
    print key, value


perfd ['03012011,9:40 PM,https://www.movietickets.com/purchase.asp?afid=rgncom', '03012011 ']
movie_id ['87050', '87050']
language ['2', '2']
perft ['18:50', '21:40']
afid ['rgncom']
house_id ['6446', '6446']
>>> 
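The session above uses Python 2's urlparse module, and because it splits on the first comma only, the second showtime stays glued to the first URL's perfd value. A minimal Python 3 sketch of one way to separate the entries follows. It assumes the CDATA text always alternates time,URL,time,URL and that neither field contains a comma. It uses the standard library's xml.etree.ElementTree in place of lxml so that it has no dependencies. The helper name parse_showtimes is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

XML = (
    "<showtimes><![CDATA[6:50 PM,https://www.movietickets.com/purchase.asp?"
    "afid=rgncom&house_id=6446&language=2&movie_id=87050&perft=18:50&perfd=03012011,"
    "9:40 PM,https://www.movietickets.com/purchase.asp?"
    "afid=rgncom&house_id=6446&language=2&movie_id=87050&perft=21:40&perfd=03012011"
    "]]> </showtimes>"
)


def parse_showtimes(xml_text):
    """Hypothetical helper: return a list of (label, url, query_params) tuples."""
    root = ET.fromstring(xml_text)
    # CDATA arrives as ordinary text; strip the stray space that follows "]]>".
    fields = (root.text or "").strip().split(",")
    # Assumption: fields alternate time label, URL, time label, URL, ...
    pairs = zip(fields[0::2], fields[1::2])
    return [(label, url, parse_qs(urlsplit(url).query)) for label, url in pairs]


showtimes = parse_showtimes(XML)
for label, url, params in showtimes:
    # parse_qs maps each key to a list of values, hence the [0].
    print(label, params["perft"][0], params["perfd"][0])
```

If a URL could ever contain a comma, the pairing by position would break, and a stricter split (for example on the ",https://" boundary) would be needed.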

Answer 2: (score: 0)

As far as I know, the standard Python SAX parser handles CDATA correctly, so you will be able to parse it.
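As a rough illustration of the SAX route, the sketch below uses the standard library's xml.sax and collects the character data reported inside <showtimes>. The handler class ShowtimesHandler and the shortened example.com URL are made up for this example and are not from the question.

```python
import xml.sax


class ShowtimesHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text inside <showtimes>."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.chunks = []

    def startElement(self, name, attrs):
        if name == "showtimes":
            self.inside = True

    def endElement(self, name):
        if name == "showtimes":
            self.inside = False

    def characters(self, content):
        # The parser may call this several times per element, so accumulate.
        if self.inside:
            self.chunks.append(content)


# Shortened, made-up sample in the same shape as the question's data.
XML = b"<showtimes><![CDATA[6:50 PM,https://example.com/a?x=1&y=2]]> </showtimes>"

handler = ShowtimesHandler()
xml.sax.parseString(XML, handler)
text = "".join(handler.chunks).strip()
print(text)
```

The CDATA content reaches characters() like any other text, with the raw & characters intact, so no CDATA-specific callback is needed for this case.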