What I have: lines in an XML file that contain the tag <xliff:g>, such as:
<string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>
What I need: to read the whole string:
Activity %1$s isn't responding.\n\nDo you want to close it?
Could you please help?
I tried using xml.dom.minidom:
import xml.dom.minidom
dom = xml.dom.minidom.parse(xmlfile)
strings = dom.getElementsByTagName('string')
for string in strings:
    rText = string.childNodes[0].nodeValue
    print(rText)
The result is only: "Activity
Answer 0 (score: 0)
You can use an XML parser such as BeautifulSoup, which is very easy to use (in my opinion):
>>> myxml = "thexmlyouposted"
>>> from bs4 import BeautifulSoup as BS
>>> soup = BS(myxml, 'xml')
>>> print(soup.find('string').text)
"Activity %1$s isn't responding."
"Do you want to close it?"
Answer 1 (score: 0)
I will assume that the element is part of a larger file. For example:
<strings xmlns:xliff="some-name-space">
<string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>
<string name="AAAAAAA" msgid="XXXXXXX">"Another <xliff:g id="BBBBBBB">%1$s</xliff:g>message</string>
</strings>
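minidom works as well as any other framework here. Open the file and iterate over all the string elements. For each element, call the function get_text. get_text, defined below, recursively collects the content (nodeValue) of every text node under the element, including text nested inside child elements such as <xliff:g>.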
import xml.dom.minidom as md

def get_text(el):
    """get_text
    For text nodes, returns the text. For element nodes, recursively calls the
    function to aggregate all the text nodes into a string."""
    msg = ''
    for n in el.childNodes:
        if n.nodeType == n.TEXT_NODE:
            msg += n.nodeValue
        elif n.nodeType == n.ELEMENT_NODE:
            msg += get_text(n)
    return msg

# get_text has to be defined before the loop that calls it.
dom = md.parse('wu.xml')
strings = dom.getElementsByTagName('string')
for string in strings:
    print(get_text(string))
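Note that get_text returns the raw text, so the double quotes and the \' escape from the file are still there. If you want exactly the form shown in the question, a small post-processing step could look like the sketch below. The helper name strip_quoting is made up for illustration, and the quoting rules are only inferred from the single example in the question, so treat it as a starting point rather than a complete unescaper.

```python
import re

def strip_quoting(raw):
    """Hypothetical helper: drop unescaped double quotes and resolve \\' and \\".

    Other escapes, such as the literal \\n sequences, are left untouched.
    """
    # Remove double quotes that are not preceded by a backslash.
    text = re.sub(r'(?<!\\)"', '', raw)
    # Replace \' and \" with the plain quote character.
    return re.sub(r"""\\(['"])""", r'\1', text)

# Raw text as it would come back from get_text for the first <string> element.
raw = r'''"Activity %1$s isn\'t responding."\n\n"Do you want to close it?"'''
print(strip_quoting(raw))
```

The regular expressions do not handle an escaped backslash directly before a quote; that case does not appear in the sample, so it is left out here.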
There are many other ways to do this.
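One such alternative, as a minimal sketch, is the standard library's xml.etree.ElementTree: its itertext() method walks an element and yields the text of the element and all of its descendants in document order, which does the same job as the recursive function. The sample document and the function name all_text are assumptions made for this example, and "some-name-space" is the same placeholder namespace used above.

```python
import xml.etree.ElementTree as ET

# Sample input; the namespace URI is a placeholder, not a real one.
SAMPLE = r'''<strings xmlns:xliff="some-name-space">
<string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>
</strings>'''

def all_text(xml_source):
    """Return the full text of every <string> element as a list of strings."""
    root = ET.fromstring(xml_source)
    # itertext() yields each piece of text inside the element, nested tags
    # included; joining the pieces gives the whole string.
    return ["".join(el.itertext()) for el in root.iter("string")]

for text in all_text(SAMPLE):
    print(text)
```

To read from a file instead of a string, ET.parse(path).getroot() can replace ET.fromstring(xml_source).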