Question

What I have: lines in an XML file that contain the <xliff:g> tag, such as:

<string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>

What I need: to read the whole string:

Activity %1$s isn't responding.\n\nDo you want to close it?

Could you help, please?

I tried using xml.dom.minidom.

import xml.dom.minidom

dom = xml.dom.minidom.parse(xmlfile)
strings = dom.getElementsByTagName('string')
for string in strings:
    # childNodes[0] is only the first child: the text before <xliff:g>
    rText = string.childNodes[0].nodeValue
    print(rText)

The result is only "Activity

Answer 1

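You can use an XML parser such as BeautifulSoup, which is very easy to use (in my opinion). Note that its 'xml' parser needs lxml to be installed: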

>>> myxml = "thexmlyouposted"
>>> from bs4 import BeautifulSoup as BS
>>> soup = BS(myxml, 'xml')
>>> print(soup.find('string').text)
"Activity %1$s isn't responding."

"Do you want to close it?"

Answer 2

I'll assume the element is part of a larger file. For example:

<strings xmlns:xliff="some-name-space">
  <string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>
  <string name="AAAAAAA" msgid="XXXXXXX">"Another <xliff:g id="BBBBBBB">%1$s</xliff:g>message</string>
</strings>

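minidom works as well as any other framework here. Open the file and iterate over all the string elements. For each element, call the function get_text. get_text, defined below, recursively collects the content (nodeValue) of every text node beneath the element, including text nested inside child tags such as <xliff:g>.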

import xml.dom.minidom as md

def get_text(el):
    """Return the concatenated text of el and all of its descendants.

    Text nodes contribute their text; element nodes are processed
    recursively so that nested tags such as <xliff:g> are included."""
    msg = ''
    for n in el.childNodes:
        if n.nodeType == n.TEXT_NODE:
            msg += n.nodeValue
        elif n.nodeType == n.ELEMENT_NODE:
            msg += get_text(n)
    return msg

# get_text must be defined before the loop that calls it
dom = md.parse('wu.xml')
strings = dom.getElementsByTagName('string')
for string in strings:
    print(get_text(string))
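
The text collected by get_text still carries the double quotes and the \' escape from the resource file, whereas the question asks for them to be gone. The following is a minimal sketch of a clean-up step, assuming that stripping every double quote and unescaping \' is all that is wanted; clean_resource_text is a hypothetical helper name, not part of any library, and the literal \n sequences are left untouched because the desired output in the question keeps them.

```python
def clean_resource_text(text):
    """Strip the wrapping double quotes and unescape \\' (illustrative helper).

    Assumption: every double quote in the text is a wrapper to be dropped,
    and the only escape to undo is backslash-apostrophe.
    """
    return text.replace('"', '').replace("\\'", "'")


# Raw string so that \' and \n stay as the literal characters found in the file
raw = r'''"Activity %1$s isn\'t responding."\n\n"Do you want to close it?"'''
print(clean_resource_text(raw))
```

If the strings can contain escaped double quotes (\") that should survive, this simple replace is not enough and a more careful pass would be needed.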

There are plenty of other ways to do it.
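
One such alternative, sketched here with the standard library's xml.etree.ElementTree rather than minidom: Element.itertext() yields the text of an element and of everything nested inside it, in document order, so joining the pieces gives the same result as get_text. The sample document and the helper name string_texts are illustrative assumptions; the namespace URI is the placeholder used in the example above.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the example file; raw string keeps \' and \n literal.
SAMPLE = r"""<strings xmlns:xliff="some-name-space">
  <string name="AAAAAAA" msgid="XXXXXXX">"Activity <xliff:g id="BBBBBBB">%1$s</xliff:g> isn\'t responding."\n\n"Do you want to close it?"</string>
</strings>"""


def string_texts(xml_source):
    """Return the full text of every <string> element (hypothetical helper)."""
    root = ET.fromstring(xml_source)
    # itertext() walks the element's own text, child text and tails in order
    return [''.join(el.itertext()) for el in root.iter('string')]


for text in string_texts(SAMPLE):
    print(text)
```

To read from a file instead of a string, ET.parse(path).getroot() can replace ET.fromstring(xml_source).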

How to read strings tagged with <xliff:g> from an XML file in Python 3
