Parsing a special XML file in Python

Time: 2012-11-09 10:38:28

Tags: python xml parsing shell

I have a special XML file that looks like this:

<alarm-dictionary source="DDD" type="ProxyComponent">

    <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
    <message>Database memory usage low threshold crossed</message>
    <description>dnKinds = database
    type = quality_of_service
    perceived_severity = minor
    probable_cause = thresholdCrossed
    additional_text = Database memory usage low threshold crossed
    </description>
    </alarm>

        ...
</alarm-dictionary>

I know that in Python I can get the "code" and "severity" attributes from the alarm tag:

for alarm_tag in dom.getElementsByTagName('alarm'):
    if alarm_tag.hasAttribute('code'):
        alarmcode = str(alarm_tag.getAttribute('code'))

I can get the text inside the message tag like this:

for messages_tag in dom.getElementsByTagName('message'):
    messages = ""
    for message_tag in messages_tag.childNodes:
        if message_tag.nodeType in (message_tag.TEXT_NODE, message_tag.CDATA_SECTION_NODE):
            messages += message_tag.data

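But I also want to get, for example, dnKinds (database), type (quality_of_service), perceived_severity (minor), and probable_cause (thresholdCrossed) from the description tag.

That is, I also want to parse the content inside a tag in the XML.

Can anyone help me? Thanks a lot!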

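3 Answers:

Answer 0: (score: 4)

Once you have the text from the description tag, this is no longer an XML parsing problem. You only need simple string parsing to turn key/value strings such as type = quality_of_service into something nicer to work with in Python, such as a dictionary.

With slightly simpler parsing via ElementTree, it would look like this: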

messages = """
<alarm-dictionary source="DDD" type="ProxyComponent">

    <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
    <message>Database memory usage low threshold crossed</message>
    <description>dnKinds = database
    type = quality_of_service
    perceived_severity = minor
    probable_cause = thresholdCrossed
    additional_text = Database memory usage low threshold crossed
    </description>
    </alarm>

        ...
</alarm-dictionary>
"""

import xml.etree.ElementTree as ET

# Parse XML
tree = ET.fromstring(messages)

# Iterating over an element yields its children
# (getchildren() was removed in Python 3.9)
for alarm in tree:
    # Get code and severity
    print(alarm.get("code"))
    print(alarm.get("severity"))

    # Grab description text
    descr = alarm.find("description").text

    # Parse "thing=other" into dict like {'thing': 'other'}
    info = {}
    for dl in descr.splitlines():
        if len(dl.strip()) > 0:
            key, _, value = dl.partition("=")
            info[key.strip()] = value.strip()
    print(info)
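
If the alarms need to be looked up later rather than printed, the same idea can be wrapped in a small helper. The following is only a sketch: the names parse_description and load_alarms are made up for illustration, and it assumes every alarm element has a code attribute and at most one description child.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<alarm-dictionary source="DDD" type="ProxyComponent">
    <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
    <message>Database memory usage low threshold crossed</message>
    <description>dnKinds = database
    type = quality_of_service
    perceived_severity = minor
    probable_cause = thresholdCrossed
    additional_text = Database memory usage low threshold crossed
    </description>
    </alarm>
</alarm-dictionary>
"""


def parse_description(text):
    """Turn 'key = value' lines into a dict (hypothetical helper)."""
    info = {}
    for line in (text or "").splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank lines and lines without "="
            info[key.strip()] = value.strip()
    return info


def load_alarms(xml_text):
    """Map each alarm code to its attributes plus the parsed description."""
    root = ET.fromstring(xml_text)
    alarms = {}
    for alarm in root.iter("alarm"):
        entry = dict(alarm.attrib)
        # findtext() returns None when there is no <description> child
        entry["description"] = parse_description(alarm.findtext("description"))
        alarms[alarm.get("code")] = entry
    return alarms


alarms = load_alarms(SAMPLE)
print(alarms["402"]["severity"])
print(alarms["402"]["description"]["probable_cause"])
```

Keying the result by alarm code is just one possible layout; a list of dicts would work equally well if codes are not unique.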

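Answer 1: (score: 2)

I'm not very familiar with Python, but after some quick research:

Since you can already get everything out of the description tag in the XML, couldn't you split the text on newlines and then split each line on the equals sign with str.split() to separate the name from the value?

e.g.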

for description_tag in dom.getElementsByTagName('description'):
    text = ""
    for node in description_tag.childNodes:
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
            text += node.data
    # Split the collected text on "=" (only the first name/value pair is usable)
    tag = text.split('=')
    tagName = tag[0]
    tagValue = tag[1]

(I haven't accounted for splitting the text into lines and looping over them.)

But this should put you on the right track :)
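
The missing per-line loop could look roughly like this with xml.dom.minidom, which the question already uses. This is a sketch rather than the answerer's code: the helper name description_pairs and the inline SAMPLE string are assumptions made for illustration.

```python
from xml.dom import minidom

SAMPLE = """<alarm-dictionary source="DDD" type="ProxyComponent">
    <alarm code="402" severity="Alarm" name="DDM_Alarm_402">
    <message>Database memory usage low threshold crossed</message>
    <description>dnKinds = database
    type = quality_of_service
    perceived_severity = minor
    probable_cause = thresholdCrossed
    additional_text = Database memory usage low threshold crossed
    </description>
    </alarm>
</alarm-dictionary>"""


def description_pairs(description_tag):
    """Yield (name, value) tuples from one <description> element."""
    text = ""
    for node in description_tag.childNodes:
        if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
            text += node.data
    for line in text.splitlines():
        if "=" not in line:
            continue  # blank or malformed line
        name, value = line.split("=", 1)  # split on the first "=" only
        yield name.strip(), value.strip()


dom = minidom.parseString(SAMPLE)
pairs = []
for description_tag in dom.getElementsByTagName("description"):
    pairs.extend(description_pairs(description_tag))

for name, value in pairs:
    print(name, "=", value)
```

Passing 1 as the second argument to split() keeps any further "=" characters inside the value instead of cutting it short.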

Answer 2: (score: 2)

AFAIK there is no library that treats this kind of text as DOM elements.

However, once you have the text in the message variable, you can do:

description = {}
messageParts = message.split("\n")
for part in messageParts:
    descInfo = part.split("=")
    description[descInfo[0].strip()] = descInfo[1].strip()

You will then find the information you need as key-value pairs in the description map.

You should also add error handling to my code...
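
One way such error handling might look is sketched below. The function name parse_key_values, its strict flag, and the two extra sample lines ("note without equals sign" and "filter = a=b") are hypothetical and only serve to exercise the edge cases: blank lines, lines without "=", and values that themselves contain "=".

```python
def parse_key_values(text, strict=False):
    """Parse 'key = value' lines into a dict.

    Blank lines are always ignored. A non-blank line without "=" (or with
    an empty key) raises ValueError when strict is True and is skipped
    otherwise.
    """
    result = {}
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        # partition() splits on the first "=" only, so "a=b" survives as a value
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            if strict:
                raise ValueError("line %d is not 'key = value': %r" % (number, line))
            continue
        result[key] = value.strip()
    return result


# Hypothetical input: the last two non-blank lines are invented edge cases
message = """dnKinds = database
    type = quality_of_service
    note without equals sign
    filter = a=b
    """

description = parse_key_values(message)
print(description)
```

Whether malformed lines should be skipped or rejected depends on how much the input is trusted, which is why the sketch leaves that choice to the caller.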