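Parsing an XML file only when an element contains certain text (Python)

Date: 2018-11-14 11:41:51

Tags: python xml

I want to parse an XML file and write some parts of it to a CSV file, using Python. I am new to both programming and XML. I have read a lot, but could not find a useful example that solves this problem.

My XML file looks like this: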

<Host name="1.1.1.1">
   <Properties>
      <tag name="id">1</tag>
      <tag name="os">windows</tag>
      <tag name="ip">1.11.111.1</tag>
   </Properties>
   <Report id="123">
      <output>
         Host is configured to get updates from another server.

         Update status:
            last detected: 2015-12-02 18:48:28
            last downloaded: 2015-11-17 12:34:22
            last installed: 2015-11-23 01:05:32

         Automatic settings:.....
       </output>
    </Report>
    <Report id="123">
       <output>
          Host is configured to get updates from another server.

          Environment Options:

          Automatic settings:.....
       </output>
    </Report>
</Host>

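My XML file contains 500 entries! I only want to parse the XML blocks whose output contains "Update status", because I want to write the 3 dates (last detected, last downloaded and last installed) to a CSV file. I also want to add the id, OS and IP.

I tried the ElementTree library, but I cannot filter for the elements whose element.text contains "Update status". At the moment I can extract all text and attributes from the whole file, but I cannot filter for the blocks whose output contains "Update status", "last detected", "last downloaded" or "last installed".

Can anyone suggest how to achieve this?

Desired output: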

id:1
os:windows
ip:1.11.111.1
last detected: 2015-12-02 18:48:28
last downloaded: 2015-11-17 12:34:22
last installed: 2015-11-23 01:05:32

All of this information should be written to a .csv file.

My code so far is as follows:

#!/usr/bin/env python
import xml.etree.ElementTree as ET
import csv

tree = ET.parse("file.xml")
root = tree.getroot()

# open csv file for writing
data = open('test.csv', 'w')

# create csv writer object
csvwriter = csv.writer(data)

# filter xml file
for tag in root.findall("./Host/Properties/tag[@name='ip']"):
    print(tag.text)  # prints every IP in the whole XML file
for output in root.iter('output'):
    print(output.text)  # prints every output in the whole XML file
data.close()

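Best regards

1 answer:

Answer 0 (score: 0)

This is relatively simple when you start from the <Host> element and work your way down.

Iterate over all nodes, but only produce output when the substring "Update status:" appears in the text of an <output> element: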

for host in tree.iter("Host"):
    host_id = host.find('./Properties/tag[@name="id"]')
    host_os = host.find('./Properties/tag[@name="os"]')
    host_ip = host.find('./Properties/tag[@name="ip"]')

    for output in host.iter("output"):
        if output.text is not None and "Update status:" in output.text:
            print("id:" + host_id.text)
            print("os:" + host_os.text)
            print("ip:" + host_ip.text)

            for line in output.text.splitlines():
                if ("last detected:" in line or
                    "last downloaded" in line or
                    "last installed"  in line):
                    print(line.strip())

For your sample XML, this prints:

id:1
os:windows
ip:1.11.111.1
last detected: 2015-12-02 18:48:28
last downloaded: 2015-11-17 12:34:22
last installed: 2015-11-23 01:05:32

Minor point: this output is not really CSV, so writing it as-is to a *.csv file would not be very clean.
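To get actual CSV, one option is to collect the same values into one row per matching <output> and hand them to csv.DictWriter. The following is only a sketch under some assumptions: the helper names extract_rows and write_csv, the column order, and the shortened SAMPLE document are made up for illustration, and it assumes every date line has the form "label: value" as in the sample.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Labels searched for inside each <output> text; they double as CSV columns.
DATE_LABELS = ("last detected", "last downloaded", "last installed")

# Shortened copy of the sample XML from the question (illustration only).
SAMPLE = """<Host name="1.1.1.1">
   <Properties>
      <tag name="id">1</tag>
      <tag name="os">windows</tag>
      <tag name="ip">1.11.111.1</tag>
   </Properties>
   <Report id="123">
      <output>
         Update status:
            last detected: 2015-12-02 18:48:28
            last downloaded: 2015-11-17 12:34:22
            last installed: 2015-11-23 01:05:32
      </output>
   </Report>
   <Report id="123">
      <output>
         Environment Options:
      </output>
   </Report>
</Host>"""


def extract_rows(root):
    """Yield one dict per <output> whose text contains "Update status:"."""
    # Element.iter() also visits the element itself, so this works whether
    # <Host> is the root or nested below a wrapper element.
    for host in root.iter("Host"):
        base = {}
        for name in ("id", "os", "ip"):
            tag = host.find('./Properties/tag[@name="%s"]' % name)
            base[name] = tag.text if tag is not None else ""
        for output in host.iter("output"):
            text = output.text or ""
            if "Update status:" not in text:
                continue
            row = dict(base)
            for line in text.splitlines():
                # Split "last detected: 2015-..." at the first colon only.
                label, sep, value = line.strip().partition(":")
                if sep and label in DATE_LABELS:
                    row[label] = value.strip()
            yield row


def write_csv(root, stream):
    """Write a header line plus one CSV line per matching <output>."""
    fieldnames = ["id", "os", "ip"] + list(DATE_LABELS)
    # restval fills in an empty cell when a date line is missing.
    writer = csv.DictWriter(stream, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for row in extract_rows(root):
        writer.writerow(row)


# Demo on the in-memory sample; a real file would be used the same way.
buffer = io.StringIO()
write_csv(ET.fromstring(SAMPLE), buffer)
print(buffer.getvalue())
```

For the real data, the root would come from ET.parse("file.xml").getroot() and the stream from open("test.csv", "w", newline=""), ideally inside a with block so the file is closed automatically.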