Parsing an XML file to get a folder structure?

Time: 2019-04-27 10:51:33

Tags: python xml xml-parsing

I need to retrieve a folder structure from my XML file.

My folder structure:

[image: folder structure]

The XML file (capturing the folder structure above) is as follows:

<?xml version="1.0" encoding="utf-8"?>
<serverfiles name="Test">
  <serverfiles name="Fail">
    <serverfiles name="Cam1">
      <serverfiles name="Mod1">
        <serverfiles name="2019-03-07" />
        <serverfiles name="2019-03-08" />
      </serverfiles>
      <serverfiles name="Mod2">
        <serverfiles name="2019-03-07" />
        <serverfiles name="2019-03-08" />
      </serverfiles>
    </serverfiles>
  </serverfiles>
  <serverfiles name="Pass">
    <serverfiles name="Cam1">
      <serverfiles name="Mod1">
        <serverfiles name="2019-03-07" />
        <serverfiles name="2019-03-08" />
      </serverfiles>
      <serverfiles name="Mod2">
        <serverfiles name="2019-03-07" />
        <serverfiles name="2019-03-08" />
      </serverfiles>
    </serverfiles>
  </serverfiles>
</serverfiles>

My Python script (not included here) produces the following result:

  

[{'name': 'Test'}, {'name': 'Fail'}, {'name': 'Cam1'}, {'name': 'Mod1'},
 {'name': '2019-03-07'}, {'name': '2019-03-08'}, {'name': 'Mod2'},
 {'name': '2019-03-07'}, {'name': '2019-03-08'}, {'name': 'Pass'},
 {'name': 'Cam1'}, {'name': 'Mod1'}, {'name': '2019-03-07'},
 {'name': '2019-03-08'}, {'name': 'Mod2'}, {'name': '2019-03-07'},
 {'name': '2019-03-08'}]

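The problem is that the whole folder structure is lost: the result is a flat list, and the parent-child relationships are gone. How can I modify the script so that the result reflects the folder structure?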
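For illustration, a flat traversal along the following lines would yield a list of this shape. This is only a sketch under assumptions: it is not the asker's script, and the inlined `XML` string is a shortened, hypothetical stand-in for the real file. It shows why the hierarchy disappears: `Element.iter()` yields every element in document order and carries no nesting information.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the XML in the question (assumption, for a
# self-contained example).
XML = """<serverfiles name="Test">
  <serverfiles name="Fail">
    <serverfiles name="Cam1" />
  </serverfiles>
  <serverfiles name="Pass" />
</serverfiles>"""

root = ET.fromstring(XML)

# iter() visits the root and all of its descendants in document order,
# so collecting the attributes this way flattens the tree.
flat = [elem.attrib for elem in root.iter('serverfiles')]
print(flat)
```

Every element ends up at the same level of the list, regardless of how deeply it was nested in the XML.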

1 answer:

Answer 0: (score: 1)

Here is one possible solution using recursion:

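from pprint import pprint
import xml.etree.ElementTree as ET

def walk(e):
    """Recursively convert an element into a nested dict."""
    name = e.attrib['name']
    # Recurse only into child elements that represent folders.
    children = [walk(c) for c in e if c.tag == 'serverfiles']
    struct = {'name': name}
    if children:
        struct['children'] = children
    return struct

# Parse the XML file and convert it, starting from the root element.
path_file = ET.parse(r'folder_structure.xml')
r = path_file.getroot()
s = walk(r)
pprint(s)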

Output:

{'children': [{'children': [{'children': [{'children': [{'name': '2019-03-07'},
                                                        {'name': '2019-03-08'}],
                                           'name': 'Mod1'},
                                          {'children': [{'name': '2019-03-07'},
                                                        {'name': '2019-03-08'}],
                                           'name': 'Mod2'}],
                             'name': 'Cam1'}],
               'name': 'Fail'},
              {'children': [{'children': [{'children': [{'name': '2019-03-07'},
                                                        {'name': '2019-03-08'}],
                                           'name': 'Mod1'},
                                          {'children': [{'name': '2019-03-07'},
                                                        {'name': '2019-03-08'}],
                                           'name': 'Mod2'}],
                             'name': 'Cam1'}],
               'name': 'Pass'}],
 'name': 'Test'}
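A nested dict like the one above can be turned back into a readable folder listing. The following is a minimal sketch, not part of the original answer: the helper name `render` and the four-space indent are assumptions, and the small `tree` literal stands in for the full output.

```python
def render(node, depth=0):
    """Return indented lines for a {'name': ..., 'children': [...]} node."""
    lines = ['    ' * depth + node['name']]
    # Leaf nodes have no 'children' key, so fall back to an empty list.
    for child in node.get('children', []):
        lines.extend(render(child, depth + 1))
    return lines

# Shortened stand-in for the structure returned by walk().
tree = {'name': 'Test', 'children': [
    {'name': 'Fail', 'children': [{'name': 'Cam1'}]},
    {'name': 'Pass'},
]}

print('\n'.join(render(tree)))
```

Because every node has the same shape, the rendering code needs no type checks; only the optional `children` key has to be handled.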

Edit: updated the code to simplify the output (based on the comments):

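from pprint import pprint
import xml.etree.ElementTree as ET

def walk(e):
    """Return {name: [children]} for a folder with subfolders, else the name."""
    name = e.attrib['name']
    # Recurse only into child elements that represent folders.
    children = [walk(c) for c in e if c.tag == 'serverfiles']
    return {name: children} if children else name

# Parse the XML file and convert it, starting from the root element.
path_file = ET.parse(r'folder_structure.xml')
r = path_file.getroot()
s = walk(r)
pprint(s)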

Output:

{'Test': [{'Fail': [{'Cam1': [{'Mod1': ['2019-03-07', '2019-03-08']},
                              {'Mod2': ['2019-03-07', '2019-03-08']}]}]},
          {'Pass': [{'Cam1': [{'Mod1': ['2019-03-07', '2019-03-08']},
                              {'Mod2': ['2019-03-07', '2019-03-08']}]}]}]}

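The structure is simpler, but you now have to handle two possible types: a node is a dict when the folder has subfolders, and a str when it is a leaf node (no subfolders).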
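Code that consumes the simplified structure therefore needs a type check at each node. As a rough sketch (the helper name `iter_paths` and the `/` separator are assumptions, not part of the answer), the following flattens such a structure into one path per leaf folder:

```python
def iter_paths(node, prefix=''):
    """Yield one 'a/b/c' path per leaf of the simplified structure."""
    if isinstance(node, str):
        # Leaf folder: the node is just its name.
        yield prefix + node
        return
    # Folder with subfolders: a single-key dict {name: [children]}.
    for name, children in node.items():
        for child in children:
            yield from iter_paths(child, prefix + name + '/')

# Shortened stand-in for the structure returned by the simplified walk().
tree = {'Test': [{'Fail': [{'Cam1': ['2019-03-07', '2019-03-08']}]}, 'Pass']}

paths = list(iter_paths(tree))
print(paths)
```

The `isinstance` check is the cost of the more compact output; with the first version of `walk`, every node is a dict and no such branch is needed.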