I have the following XML file:
<tv>
<programme channel="BBC Red Button 1" start="20180422123000 +0000" stop="20180422125500 +0000">
<title lang="en">Live Snooker: The World Championship: Day Two - 2018</title>
<desc lang="en">Coverage of day two at the Crucible Theatre in Sheffield</desc>
<category lang="en">Sport</category>
<icon src="http://images.radiotimes.com/remote/static.radiotimes.com.edgesuite.net/pa/70/26/webANXsnookerlivebbc.jpg?quality=60&amp;mode=crop&amp;width=130&amp;height=100&amp;404=tv" />
</programme>
<programme channel="BBC Red Button 1" start="20180422125500 +0000" stop="20180422150000 +0000">
<title lang="en">Live UEFA Women's Champions League</title>
<desc lang="en">Manchester City v Lyon (Kick-off 1.00pm)</desc>
<category lang="en">Sport</category>
<icon src="http://images.radiotimes.com/assets/images/holding/tv.png?quality=60&amp;mode=crop&amp;width=130&amp;height=100&amp;404=tv" />
</programme>
</tv>
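First, I want to delete the icon element whose src equals:
<icon src="http://images.radiotimes.com/assets/images/holding/tv.png?quality=60&amp;mode=crop&amp;width=130&amp;height=100&amp;404=tv" />
Then, for the remaining icons, I am trying to replace quality=60&amp;mode=crop&amp;width=130&amp;height=100
with quality=100&amp;mode=crop&amp;width=1200&amp;height=723
So once the XML file has been processed, it should look like this: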
<tv>
<programme channel="BBC Red Button 1" start="20180422123000 +0000" stop="20180422125500 +0000">
<title lang="en">Live Snooker: The World Championship: Day Two - 2018</title>
<desc lang="en">Coverage of day two at the Crucible Theatre in Sheffield</desc>
<category lang="en">Sport</category>
<icon src="http://images.radiotimes.com/remote/static.radiotimes.com.edgesuite.net/pa/70/26/webANXsnookerlivebbc.jpg?quality=100&amp;mode=crop&amp;width=1200&amp;height=723&amp;404=tv" />
</programme>
<programme channel="BBC Red Button 1" start="20180422125500 +0000" stop="20180422150000 +0000">
<title lang="en">Live UEFA Women's Champions League</title>
<desc lang="en">Manchester City v Lyon (Kick-off 1.00pm)</desc>
<category lang="en">Sport</category>
</programme>
</tv>
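I need to remove the unwanted icons from the XML file before replacing the other values, so that I don't end up changing the values of the icons I want to delete. So far I have tried the following to remove the icon, without success: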
#!/bin/sh
from xml.etree.ElementTree import ElementTree
t = ElementTree()
t.parse('/volume1/TVMosaic/Freeview-WG++/guide.xml')
programmeList = t.findall('tv/programme/icon')
for programmeEl in programmeList:
    if programmeEl.attrib['src'] in ('http://images.radiotimes.com/assets/images/holding/tv.png?quality=60&amp;mode=crop&amp;width=130&amp;height=100&amp;404=tv') and \
            programmeEl.attrib['src'] == programmeEl.text:
        del programmeEl.attrib['src']
t.write('/volume1/TVMosaic/Freeview-WG++/PhasedGuide.xml')
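Can someone help me delete the icons that have the src I mentioned, and then replace the values in the remaining icons with the ones I mentioned earlier?
Thanks.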
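Answer 0 (score: 0)
The problem is that the string you are looking for is XML-escaped (note the "&amp;"s), whereas when the file is parsed the attribute values are unescaped ("&amp;" is converted to "&", and the same goes for a few other entities). For details, see [Python.Wiki]: Escaping XML.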
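To make the escaping point concrete, here is a minimal sketch of what the parser does to an attribute value. The URL is a shortened, made-up stand-in (not one from the guide), and the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A tiny stand-in for one <icon> element; the URL is illustrative only.
snippet = '<icon src="http://example.com/tv.png?quality=60&amp;mode=crop" />'

icon = ET.fromstring(snippet)
parsed_src = icon.get("src")

print(parsed_src)          # the parser hands back a plain "&"
print(escape(parsed_src))  # escaping it again restores "&amp;"

# Comparing against the escaped literal fails; the plain one matches.
print(parsed_src == "http://example.com/tv.png?quality=60&amp;mode=crop")  # False
print(parsed_src == "http://example.com/tv.png?quality=60&mode=crop")      # True
```

This is why a search string copied verbatim from the file text never equals the value returned by the parser.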
code.py:
#!/usr/bin/env python3

import sys
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape, unescape


INPUT_FILE_NAME = "guide.xml"
OUTPUT_FILE_NAME = "PhasedGuide.xml"

# These constants are kept in their XML-escaped form ("&amp;"), as the text appears in the file.
SRC_ATTR_TEXT = "http://images.radiotimes.com/assets/images/holding/tv.png?quality=60&amp;mode=crop&amp;width=130&amp;height=100&amp;404=tv"
SRC_ATTR_REPLACE_TEXT = "quality=60&amp;mode=crop&amp;width=130&amp;height=100"
SRC_ATTR_REPLACE_WITH_TEXT = "quality=100&amp;mode=crop&amp;width=1200&amp;height=723"


def main():
    tree = ET.parse(INPUT_FILE_NAME)
    tv_node = tree.getroot()
    for programme_node in tv_node.findall("programme"):
        icon_node = programme_node.find("icon")
        if icon_node is not None:
            print(icon_node.get("src", ""))
            # The parser returns the unescaped value, so escape it again before comparing.
            src_attr = escape(icon_node.get("src", ""))
            if src_attr == SRC_ATTR_TEXT:
                # Remove the whole <icon> element through its parent.
                programme_node.remove(icon_node)
            elif src_attr:
                # Replace in the escaped form, then unescape; tree.write() escapes again on output.
                icon_node.set("src", unescape(src_attr.replace(SRC_ATTR_REPLACE_TEXT, SRC_ATTR_REPLACE_WITH_TEXT)))
    tree.write(OUTPUT_FILE_NAME)


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()
Output:
(py35x64_test) e:\Work\Dev\StackOverflow\q049967927>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

(py35x64_test) e:\Work\Dev\StackOverflow\q049967927>type PhasedGuide.xml
<tv>
<programme channel="BBC Red Button 1" start="20180422123000 +0000" stop="20180422125500 +0000">
<title lang="en">Live Snooker: The World Championship: Day Two - 2018</title>
<desc lang="en">Coverage of day two at the Crucible Theatre in Sheffield</desc>
<category lang="en">Sport</category>
<icon src="http://images.radiotimes.com/remote/static.radiotimes.com.edgesuite.net/pa/70/26/webANXsnookerlivebbc.jpg?quality=100&amp;mode=crop&amp;width=1200&amp;height=723&amp;404=tv" />
</programme>
<programme channel="BBC Red Button 1" start="20180422125500 +0000" stop="20180422150000 +0000">
<title lang="en">Live UEFA Women's Champions League</title>
<desc lang="en">Manchester City v Lyon (Kick-off 1.00pm)</desc>
<category lang="en">Sport</category>
</programme>
</tv>
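As a possible variation on the approach above (not the answer's own code), the search strings can be kept in their plain, unescaped form, which avoids the escape/unescape round trip. The sketch below makes that assumption and also loops over every icon in a programme rather than only the first. The helper name clean_icons, the constant names, and the example.com URL in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Plain (unescaped) strings, i.e. what ElementTree returns after parsing.
HOLDING_SRC = ("http://images.radiotimes.com/assets/images/holding/tv.png"
               "?quality=60&mode=crop&width=130&height=100&404=tv")
OLD_PARAMS = "quality=60&mode=crop&width=130&height=100"
NEW_PARAMS = "quality=100&mode=crop&width=1200&height=723"


def clean_icons(tv_node):
    """Drop holding icons and enlarge the rest, in place (hypothetical helper)."""
    for programme_node in tv_node.findall("programme"):
        # findall() returns a list, so removing children inside the loop is safe.
        for icon_node in programme_node.findall("icon"):
            src = icon_node.get("src", "")
            if src == HOLDING_SRC:
                programme_node.remove(icon_node)
            elif src:
                icon_node.set("src", src.replace(OLD_PARAMS, NEW_PARAMS))


# Small in-memory sample; the second URL is made up for the demonstration.
sample = """<tv>
<programme channel="A"><icon src="http://images.radiotimes.com/assets/images/holding/tv.png?quality=60&amp;mode=crop&amp;width=130&amp;height=100&amp;404=tv" /></programme>
<programme channel="B"><icon src="http://example.com/pic.jpg?quality=60&amp;mode=crop&amp;width=130&amp;height=100&amp;404=tv" /></programme>
</tv>"""

root = ET.fromstring(sample)
clean_icons(root)
print(ET.tostring(root, encoding="unicode"))
```

To process the real files, the same helper could be called as clean_icons(tree.getroot()) between ET.parse("guide.xml") and tree.write("PhasedGuide.xml"). The serializer escapes "&" again when writing, so the output file still contains "&amp;".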