How to read a string containing <xliff:g> tags from an XML file with Python 3

Asked: 2013-06-17 08:17:50

Tags: python

What I have: lines in an XML file that contain the <xliff:g> tag, such as:

<string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>

What I need: to read the whole string:

Activity %1$s isn't responding.\n\nDo you want to close it?

Could you please help?

I tried using xml.dom.minidom:

import xml.dom.minidom

dom = xml.dom.minidom.parse(xmlfile)
strings = dom.getElementsByTagName('string')
for string in strings:
    rText = string.childNodes[0].nodeValue
    print(rText)

The result is only: "Activity

2 answers:

Answer 0: (score: 0)

You can use an XML parser such as BeautifulSoup, which is very easy to use (in my opinion):

>>> myxml = "thexmlyouposted"
>>> from bs4 import BeautifulSoup as BS
>>> soup = BS(myxml, 'xml')
>>> print(soup.find('string').text)
"Activity %1$s isn't responding."

"Do you want to close it?"

Answer 1: (score: 0)

I will assume the element is part of a larger file. For example:

<strings xmlns:xliff="some-name-space">
  <string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>
  <string name="AAAAAAA" msgid="XXXXXXX">"Another <xliff:g id="BBBBBBB">%1$s</xliff:g>message</string>
</strings>

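minidom is as good as any other framework for this. Open the file and iterate over all the string elements. For each element, call the function get_text. get_text, defined below, recursively collects the content (nodeValue) of every text node under the element, including text nested inside child elements.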

import xml.dom.minidom as md

def get_text(el):
    """get_text
    For text nodes, returns the text. For element nodes, recursively call the
    function to aggregate all the text nodes into a string"""
    msg = ''
    for n in el.childNodes:
        if n.nodeType == n.TEXT_NODE:
            msg += n.nodeValue
        elif n.nodeType == n.ELEMENT_NODE:
            msg += get_text(n)
    return msg

# get_text must be defined before the loop that calls it
dom = md.parse('wu.xml')
strings = dom.getElementsByTagName('string')
for string in strings:
    print(get_text(string))

There are many other ways to do it.
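
One such alternative, as a minimal sketch rather than a definitive solution: the standard library's xml.etree.ElementTree has Element.itertext(), which yields the text of an element and all of its descendants in document order. The sample XML is inlined here so the snippet runs on its own, and the namespace URI "some-name-space" is the same placeholder used above. The helper names full_text and clean are made up for illustration. clean assumes the only post-processing wanted is to drop the double quotes and turn \' into ', which is what the expected output in the question suggests.

```python
import xml.etree.ElementTree as ET

# Sample document modeled on the one above; "some-name-space" is a
# placeholder URI, not the real XLIFF namespace.
XML = r'''<strings xmlns:xliff="some-name-space">
  <string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>
</strings>'''


def full_text(el):
    """Join the text of el and all of its descendants, in document order."""
    return "".join(el.itertext())


def clean(text):
    """Hypothetical cleanup: remove double quotes, unescape the apostrophe."""
    return text.replace('"', "").replace("\\'", "'")


root = ET.fromstring(XML)
# The <string> elements carry no namespace, so the plain tag name matches.
for string in root.iter("string"):
    print(clean(full_text(string)))
```

To read from a file instead, ET.parse(path).getroot() can replace ET.fromstring(XML). The \n sequences in the resource are literal backslash-n characters and are left untouched by this sketch.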